Using sbatch instead of srun #
Currently, Enroot / Pyxis only integrates with srun.
You can, however, create your sbatch script as you normally would
and have it internally call srun with the required
container-specific options:
Make sure that no actual command appears within the commented #SBATCH preamble: Slurm stops parsing #SBATCH directives at the first executable command!
#!/bin/bash
# let's set the following defaults (can be overridden on the command line):
#SBATCH --job-name sbatch_test
#SBATCH --partition batch
# put your srun command with args here
srun \
--container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
--container-workdir="$(pwd)" \
--container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"$(pwd)":"$(pwd)" \
echo "hello world!"
Finally, submit your batch job (sbatch [script]).
Output is by default saved to a slurm-JOBID.out file in the directory from which you submitted the job.
See sbatch documentation for more details.
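Any default set in the #SBATCH preamble can be overridden at submission time. For example, assuming the script above is saved as sbatch_test.sh (an illustrative name):
sbatch --job-name another_name sbatch_test.sh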
Job Arrays #
One very handy feature that sbatch supports which isn’t available for srun is job arrays.
These are especially useful if you want to run a whole series of jobs / experiments (e.g., for different
hyperparameters / one run per input file).
#!/bin/bash
# let's set the following defaults (can be overridden on the command line):
#SBATCH --array 0-4%3
#SBATCH --job-name sbatch_array_test
#SBATCH --partition batch
srun \
--container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
--container-workdir="$(pwd)" \
--container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"$(pwd)":"$(pwd)" \
echo "hello world! array index: $SLURM_ARRAY_TASK_ID"
Finally, submit your batch job (sbatch [script]).
The example runs 5 jobs in total (indices 0-4), with at most 3 running in parallel (the %3 suffix).
(Arrays can also use a step size, e.g., --array 0-42:7 runs indices 0, 7, 14, and so on.)
As you can see, the job can access the array index via the $SLURM_ARRAY_TASK_ID env var.
Output is by default saved to a slurm-JOBID_TASKID.out file in the directory from which you submitted the job.
See sbatch documentation for more details.
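Beyond echoing the index, $SLURM_ARRAY_TASK_ID can also select a hyperparameter per task. A minimal sketch, assuming a hypothetical train.py and placeholder learning-rate values (container options as above):
#!/bin/bash
#SBATCH --array 0-2
# one learning rate per array task (placeholder values)
LRS=(0.1 0.01 0.001)
srun \
--container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
--container-workdir="$(pwd)" \
--container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"$(pwd)":"$(pwd)" \
python train.py --lr "${LRS[$SLURM_ARRAY_TASK_ID]}"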
One run per input file #
When using job arrays, the following bash pattern might be useful to run a script once per input file:
#!/bin/bash
#SBATCH --array=0-42
# collect the input files into a bash array
# (the --array range above must match the number of files)
FILES=(
somedir/*.csv.gz
)
srun ... my_script.py "${FILES[$SLURM_ARRAY_TASK_ID]}"
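Since command-line options override the preamble, you can also compute the range at submit time instead of hard-coding it. A sketch, assuming the script above is saved as per_file.sh (an illustrative name):
sbatch --array "0-$(( $(ls somedir/*.csv.gz | wc -l) - 1 ))" per_file.sh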
Wrap #
If you don’t want to create a separate script file, you can also
use the --wrap option for a (somewhat lengthy and cumbersome)
command-line-only version:
sbatch \
--array "0-4%3" --job-name sbatch_test --partition batch \
--wrap "srun \
--container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
--container-workdir=\"$(pwd)\" \
--container-mounts=/netscratch:/netscratch,/ds:/ds:ro,\"$(pwd)\":\"$(pwd)\" \
echo \"hello world! array index: \$SLURM_ARRAY_TASK_ID\""
You need to be a bit more careful with parameter expansion, though: note the escaped \$SLURM_ARRAY_TASK_ID, which prevents it from being expanded at submit time.
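If nothing in the wrapped command needs to expand at submit time, single quotes avoid most of the escaping. A sketch of the same call: since sbatch runs the wrapped command from the submission directory by default, $(pwd) still resolves to the same path, only at run time:
sbatch \
--array "0-4%3" --job-name sbatch_test --partition batch \
--wrap 'srun \
--container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
--container-workdir="$(pwd)" \
--container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"$(pwd)":"$(pwd)" \
echo "hello world! array index: $SLURM_ARRAY_TASK_ID"'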