Partitions #

Our cluster is organized into partitions by GPU type. Add -p [name] or --partition [name] to your srun command:

  • A100-40GB
  • A100-80GB
  • A100-PCI
  • H100
  • H200
  • L40S
  • RTX3090
  • RTXA6000
  • V100-16GB
  • V100-32GB
  • batch

You can use the partition to specify which kind of GPU, if any, your job requires.

There are also sub-partitions for the groups that contributed nodes to the cluster. Only users from the contributing group can use these sub-partitions, and jobs scheduled there have a higher priority.
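For example, a minimal sketch of requesting a single GPU on the RTX3090 partition (the --gpus flag is the standard Slurm way of requesting GPUs and may differ in your setup; train.py is a placeholder for your own script):

$ srun -p RTX3090 --gpus=1 python train.py

Jobs submitted without a partition run on the default batch partition.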

Resources in partitions #

The values below, such as CPUs and memory per GPU, are based on evenly distributing all available resources of a node across its GPUs. This is a reasonable starting point, but keep in mind that some jobs may require more. If everyone requests only the CPUs and memory they actually need, more demanding jobs can run as well. Monitor your jobs to find out what they really require and add a safety margin. The resources dashboard in particular will tell you what is currently available.

Legend: 🚌 GPU, 🚗 CPU, 🚚 Mem
| Partition | 🚌 Name | 🚌 Arch | 🚌 Mem (GB) | 🚌 per Node | 🚗 per 🚌 | 🚚 per 🚌 (GB) | TimeLimit (Default - Max) |
|---|---|---|---|---|---|---|---|
| A100-40GB | A100-SXM4 | Ampere | 40 | 8 | 32 | 112 | 1 - 3 days |
| A100-80GB | A100-SXM4 | Ampere | 80 | 8 | 32 | 224 | 1 - 3 days |
| A100-PCI | A100-PCIE | Ampere | 40 | 8 | 12 | 48 | 1 - 3 days |
| H100 | H100-SXM5 | Hopper | 80 | 8 | 28 | 224 | 1 - 1 days |
| H200 | H200-SXM5 | Hopper | 141 | 8 | 28 | 224 | 1 - 1 days |
| L40S | L40S | Ada Lovelace | 48 | 8 | 16 | 125 | 1 - 3 days |
| RTX3090 | RTX 3090 | Ampere | 24 | 8 | 12 | 64 | 1 - 3 days |
| RTXA6000 | RTX A6000 | Ampere | 48 | 8 | 12 | 108 | 1 - 3 days |
| V100-16GB | V100-SXM2 | Volta | 16 | 8 | 10 | 64 | 1 - 3 days |
| V100-32GB | V100-SXM2 | Volta | 32 | 8 | 10 | 64 | 1 - 3 days |
| batch | RTX 6000 | Turing | 24 | 10 | 7 | 64 | 1 - 3 days |
Group sub-partitions have the same default time limit, but no maximum.
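Building on the table above, a request for one RTX3090 GPU with its even share of resources would look roughly like this (a sketch using the standard Slurm options --cpus-per-task and --mem; python train.py is a placeholder). Remember to trim the numbers down to what your job actually needs:

$ srun -p RTX3090 --gpus=1 --cpus-per-task=12 --mem=64G python train.py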

Use cases & connectivity #

Here is some more info on how GPUs and nodes are connected in each partition. NVLink or NVSwitch speeds up communication between GPUs in the same node. InfiniBand improves communication between nodes.

Some jobs work perfectly fine with PCIe / Ethernet as the interconnect, while others may run significantly slower. As a rule of thumb, the more GPUs and the more iterations/s (i.e., the higher the synchronization overhead), the bigger the impact.
| Partition | 🚌 Link | InfiniBand | Comments |
|---|---|---|---|
| A100-40GB / A100-80GB | NVSwitch | yes | best for multi-GPU, InfiniBand for multi-node jobs, needs image version 20.06 or newer |
| A100-PCI | PCIe | no | good for single-node multi-GPU, needs image version 20.10 or newer |
| H100 | NVSwitch | no | latest architecture, best for multi-GPU, needs image version 22.09 or newer |
| H200 | NVSwitch | no | latest architecture, best for multi-GPU, needs image version 22.09 or newer |
| L40S | PCIe | no | good for single-node multi-GPU, needs image version 22.09 or newer |
| RTX3090 | PCIe | no | good for single-node multi-GPU, needs image version 20.10 or newer |
| RTXA6000 | PCIe | no | good for single-node multi-GPU, needs image version 20.10 or newer |
| V100-16GB / V100-32GB | NVLink | yes | NVLink for multi-GPU, InfiniBand for multi-node jobs |
| batch | PCIe | no | default partition, good for single-GPU jobs, OK otherwise |
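As an illustrative sketch of how this plays out in practice, a multi-node job belongs on a partition with InfiniBand, e.g. two full A100-40GB nodes with one task per GPU (--nodes, --ntasks-per-node and --gpus-per-node are standard Slurm options; python train.py stands in for a distributed training script):

$ srun -p A100-40GB --nodes=2 --ntasks-per-node=8 --gpus-per-node=8 python train.py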

Performance #

Here are some benchmarks for ImageNet training. Throughput is measured in images/s using one entire node in each partition and the PyTorch 20.10 container (for H100/H200 and GH200: the PyTorch 24.05 container).

| Partition | 🚌 | Batch size | Throughput (images/s) |
|---|---|---|---|
| batch | 8 | 192 | 2250 |
| batch | 10 | 192 | 2835 |
| V100 | 8 | 256 | 2900 |
| V100 (node gera) | 16 | 256 | 6234 |
| A100 | 8 | 256 | 6500 |
| A100 | 4 | 160 | 1162 |
| H100 | 8 | 256 | 12879 |
| H200 | 8 | 256 | 13472 |
| L40S | 8 | 256 | 4500 |
| GH200 | 1 | 256 | 2044 |
| RTX3090 | 8 | 192 | 3300 |
| RTXA6000 | 8 | 192 | 3450 |
| RTXA6000 | 8 | 384 | 3550 |

Benchmark results for Transformer network training. Throughput is measured in tokens/s using one entire node in each partition and the PyTorch 21.05 container.

| Partition | 🚌 | Batch size | Throughput (tokens/s) | Data type |
|---|---|---|---|---|
| batch | 10 | 5120 | 63000 | FP32 |
| V100 | 8 | 5120 | 65000 | FP32 |
| RTX3090 | 8 | 5120 | 29500 | FP32 |
| RTX3090 | 8 | 5120 | 32500 | TF32 |
| RTXA6000 | 8 | 10240 | 80000 | FP32 |
| RTXA6000 | 8 | 10240 | 150000 | TF32 |
| A100-40GB | 8 | 10240 | 90000 | FP32 |
| A100-40GB | 8 | 10240 | 326000 | TF32 |
| A100-80GB | 8 | 10240 | 90000 | FP32 |
| A100-80GB | 8 | 20480 | 92000 | FP32 |
| A100-80GB | 8 | 10240 | 341000 | TF32 |
| A100-80GB | 8 | 20480 | 360000 | TF32 |

Partition status #

Check the resources dashboard for the most up-to-date info on available resources.

On head nodes, you can also run sinfo to see a list of available partitions. This is just an example of what its output looks like. The current situation is almost certainly different.

$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
batch*         up   infinite      3    mix kent,kersey,kusel
V100-32GB      up   infinite      4    mix garda,gera,gifu,glendale
V100-16GB      up   infinite      1    mix glasgow
A100-40GB      up   infinite      5    mix serv-[3328-3332]
Note: Use this extended sinfo command to get a bit more info:
sinfo -o "| %15P | %15f | %8c | %10m | %15G | %10O | %8t | %N"