DFKI / MADM Deep Learning Slurm Cluster
We use Slurm to manage compute resources in the cluster and to schedule and run jobs on worker nodes. For a more in-depth description of how Slurm works and how to use it, read the quickstart guide. What follows is a very brief introduction to Slurm, with some additions specific to our cluster.
Hello world
If you are new to Slurm, here is a simple command that you can execute to run your first job:
srun echo "hello world!"
It may not look like much, but this simple command actually did a lot of things in the background. Slurm created a job with a unique ID, reserved some resources on a worker node, executed the echo command on said node, and sent the output back to your shell.
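To see the job ID and the worker node for yourself, you can have the job print them. The following is a minimal sketch, not part of the cluster's official tooling: it relies on the SLURM_JOB_ID environment variable that Slurm sets inside a job, and the fallback branch for machines without srun is an assumption added only so the snippet also runs outside the cluster.

```shell
# Command to run inside the job: print the job ID and the node name.
# SLURM_JOB_ID is set by Slurm inside a job; outside a job it is unset,
# so "none" is printed instead.
report='echo "job ${SLURM_JOB_ID:-none} ran on $(uname -n)"'

if command -v srun >/dev/null 2>&1; then
    # On the cluster: Slurm runs the command on a worker node.
    output=$(srun sh -c "$report")
else
    # Assumption for illustration: without Slurm, run the command locally.
    output=$(sh -c "$report")
fi

echo "$output"
```

On the cluster, the printed node name should be that of a worker node rather than the login node you typed the command on.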
To make it do useful things, add more options to your srun command, as outlined in the rest of this guide.