Batch jobs

Using sbatch instead of srun

Currently, Enroot / Pyxis only integrates with srun. You can, however, create your sbatch script as you normally would and have it internally call srun with the required container-specific options for you:

Make sure that no actual command appears within the commented #SBATCH preamble: Slurm stops parsing #SBATCH directives at the first non-comment line of the script!
#!/bin/bash
# let's set the following defaults (can be overridden on the command line):
#SBATCH --job-name sbatch_test
#SBATCH --partition batch

# put your srun command with args here
srun \
  --container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
  --container-workdir="`pwd`" \
  --container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"`pwd`":"`pwd`" \
  echo "hello world!"

Finally, submit your batch job (sbatch [script]).
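
Since the #SBATCH lines only set defaults, they can be overridden at submission time; a quick sketch (my_script.sh stands in for your script's filename):

# command-line options take precedence over the script's #SBATCH defaults
sbatch --job-name other_test my_script.sh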

Output is by default saved to a slurm-JOBID.out file in the current directory.
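
If you prefer a different name or location, you can set it in the preamble yourself; a minimal sketch (the logs/ directory is an assumption and must already exist, since Slurm will not create it):

# %x expands to the job name, %j to the numeric job ID
#SBATCH --output logs/%x-%j.out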

See sbatch documentation for more details.

Job Arrays

One very handy feature that sbatch supports but srun doesn't is job arrays. These are especially useful if you want to run a whole series of jobs / experiments (e.g., for different hyperparameters / one run per input file).

#!/bin/bash
# let's set the following defaults (can be overridden on the command line):
#SBATCH --array 0-4%3
#SBATCH --job-name sbatch_array_test
#SBATCH --partition batch

srun \
  --container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
  --container-workdir="`pwd`" \
  --container-mounts=/netscratch:/netscratch,/ds:/ds:ro,"`pwd`":"`pwd`" \
  echo "hello world! array index: $SLURM_ARRAY_TASK_ID"

Finally, submit your batch job (sbatch [script]).

The example runs 5 jobs in total (indices 0-4), with at most 3 running in parallel (the %3 suffix). (One can also run arrays with a step size, e.g., 0-42:7; see the sketch below.)
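
For example, the following would run only every 7th index, i.e., tasks 0, 7, 14, 21, 28, 35, and 42:

# step size 7: one task for every 7th index in the range
#SBATCH --array 0-42:7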

As you can see, the job can access its array index via the $SLURM_ARRAY_TASK_ID env var. Output is by default saved to a slurm-JOBID_TASKID.out file in the current directory.
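
If you want to control that naming yourself, the %A (job ID of the array) and %a (array task ID) filename patterns are available; a minimal sketch that just reproduces the default:

# one output file per array task, named after array job ID and task ID
#SBATCH --output slurm-%A_%a.out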

See sbatch documentation for more details.

One run per input file

When using job arrays, the following bash pattern can be useful to run a script once per input file:

#SBATCH --array=0-42
# collect all matching input files into a bash array
FILES=(
    somedir/*.csv.gz
)
# each task processes the file at its own array index
srun ... my_script.py "${FILES[$SLURM_ARRAY_TASK_ID]}"
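
Note that the array range has to match the number of files. One way to keep the two from drifting apart (a sketch, with job_script.sh as a placeholder for the script above) is to derive the range from the file count at submission time:

# one task per matching file; indices run from 0 to count-1
FILES=( somedir/*.csv.gz )
sbatch --array "0-$(( ${#FILES[@]} - 1 ))" job_script.sh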

Wrap

If you don't want to create a separate script file, you can also use the --wrap option to get a (kind of lengthy and cumbersome) command-line-only version:

sbatch \
  --array "0-4%3" --job-name sbatch_test --partition batch \
  --wrap "srun \
    --container-image=/enroot/nvcr.io_nvidia_pytorch_23.12-py3.sqsh \
    --container-workdir=\"`pwd`\" \
    --container-mounts=/netscratch:/netscratch,/ds:/ds:ro,\"`pwd`\":\"`pwd`\" \
    echo \"hello world! array index: \$SLURM_ARRAY_TASK_ID\""

You need to be a bit more careful with parameter expansion, though: note the escaping of \$SLURM_ARRAY_TASK_ID, which prevents it from being expanded at submit time.
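
A minimal illustration of the difference (hypothetical, without the container options): the unescaped `pwd` is expanded by your shell at submit time, whereas the escaped \$SLURM_ARRAY_TASK_ID only gets expanded when each array task actually runs:

# `pwd` (unescaped) runs at submit time;
# \$SLURM_ARRAY_TASK_ID survives and is expanded per array task
sbatch --array "0-1" \
  --wrap "echo \"submitted from `pwd`, running task \$SLURM_ARRAY_TASK_ID\""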