=== SLURM Commands ===
* There are 8 nodes of that type, all currently in <code>idle</code> state
* The node names are things like <code>compute-permanent-node-68</code>
 
To launch a job, use [https://slurm.schedmd.com/srun.html srun] or [https://slurm.schedmd.com/sbatch.html sbatch].
 
* '''srun''' runs a command directly with the requested resources
* '''sbatch''' queues the job to run when resources become available
 
For example, suppose I have the following shell script:
 
<syntaxhighlight lang="bash">
#!/bin/bash
 
echo "Hello, world!"
 
nvidia-smi
</syntaxhighlight>
 
I can use <code>srun</code> to run this script with the following result:
 
<syntaxhighlight lang="bash">
$ srun --gpus 8 ./test.sh
Hello, world!
Sat May 25 00:02:23 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
 
... truncated
</syntaxhighlight>
 
Alternatively, I can queue the job using <code>sbatch</code>, which gives me the following result:
 
<syntaxhighlight lang="bash">
$ sbatch test.sh
Submitted batch job 461
</syntaxhighlight>
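 
When submitting with <code>sbatch</code>, resource requests can also be embedded in the script itself as <code>#SBATCH</code> directives instead of being passed on the command line. A minimal sketch (the directive values below are illustrative, not cluster defaults):
 
<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --job-name=test          # name shown in squeue
#SBATCH --gpus=8                 # same GPU request as `srun --gpus 8`
#SBATCH --output=slurm-%j.out    # %j expands to the job ID
#SBATCH --time=00:10:00          # walltime limit (illustrative)

echo "Hello, world!"

nvidia-smi
</syntaxhighlight>
 
With the directives in place, the job can be submitted with a plain <code>sbatch test.sh</code> and SLURM will apply the requests automatically.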
 
After launching the job, we can see it running using our original <code>squeue</code> command:
 
<syntaxhighlight lang="bash">
$ squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               461   compute  test.sh      ben  R       0:37      1 compute-permanent-node-285
</syntaxhighlight>
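 
Since the job ran under <code>sbatch</code> rather than <code>srun</code>, its output does not appear in the terminal. By default it is written to a file named after the job ID in the submission directory, so for job 461 the script's output can be inspected with:
 
<syntaxhighlight lang="bash">
$ cat slurm-461.out
Hello, world!
... truncated
</syntaxhighlight>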
==== Reserving a GPU ====