Suggestion to use `cpus-per-gpu` instead of `ntasks` in slurm docs #425

WPoelman · 2024-08-09T18:34:38Z

In the documentation, all examples use `--ntasks` or `-n` to specify the number of CPUs needed per GPU. This generally works fine, but external tools (such as submitit) have a specific interpretation of `ntasks`, which can lead to issues. It might be better to use the `--cpus-per-gpu` Slurm option explicitly in the examples to avoid such issues. Both options behaved identically in my tests requesting GPUs on the debug node and on wice.


moravveji · 2024-08-13T10:20:51Z

This is an interesting one. A few remarks:

As always, there are different ways to request the same resources (TRES and GRES) in Slurm using the supported directives. In the VSC docs, we have opted for the most obvious and generic ones, which cover the majority of use cases on our clusters.
A black-belt user knows best how to exploit the resource specifications using more fine-grained options, e.g. from the official sbatch documentation. So we leave such cases out of the official VSC docs, because the Slurm docs already cover them. The audience for such specialized use cases is also a minority.
We are not going to tune the VSC docs and our Slurm configurations because of one package's specific interpretation (submitit in this case). It is actually the other way around: third-party software that uses the underlying scheduler needs to align its interpretation of the Slurm job submission options/directives with the original ones as documented in the Slurm docs.
To me, `--cpus-per-gpu` is a useful option for multi-GPU jobs, when an advanced user wants to take full control over process distribution. For single-GPU jobs, it does not offer much added value. Take the following two-node GPU example:
```
srun -A <account> -M genius --nodes=2 --ntasks=8 --cpus-per-gpu=1 --gpus-per-node=4 --pty bash -l
```
And you immediately see how transparent it is to specify the --cpus-per-gpu option.
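
To make the arithmetic behind that command explicit, here is a minimal sketch. The helper name `total_cpus` is hypothetical and not part of Slurm; it simply multiplies nodes, GPUs per node, and CPUs per GPU, assuming no other option (such as `--ntasks` or `--cpus-per-task`) pushes the allocation higher.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): CPU cores implied by
# --nodes, --gpus-per-node and --cpus-per-gpu, assuming nothing else
# in the request raises the CPU count.
total_cpus() {
    local nodes=$1 gpus_per_node=$2 cpus_per_gpu=$3
    echo $(( nodes * gpus_per_node * cpus_per_gpu ))
}

# Values from the srun example: 2 nodes, 4 GPUs per node, 1 CPU per GPU.
total_cpus 2 4 1   # prints 8
```

With these values the result is 8, which lines up with `--ntasks=8` in the command, so each task gets one core and one GPU.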

If you have another comment or question, please let us know. Otherwise, we can perhaps close this issue.
