Exx Slurm Setup
This guide gives a quick overview of how to install Slurm and Docker on compute servers.
To manage jobs on the Exx server, we use Slurm Workload Manager. It schedules jobs and manages resources such as GPUs.
Here's a short introduction to Slurm by Bull.
To install Slurm, we follow the instructions from slothparadise. Note that you may need to change this line to a free UID/GID number:
MUNGEUSER=991
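Before creating the munge user, you can check that the chosen id is actually free; a small sketch using getent (991 is the value from the line above, adjust if it is taken):

```shell
# getent exits non-zero when no passwd/group entry exists for the id,
# so this reports whether 991 is already taken on this machine.
MUNGEUSER=991
if getent passwd "$MUNGEUSER" >/dev/null || getent group "$MUNGEUSER" >/dev/null; then
  echo "id $MUNGEUSER is in use, pick another free number"
else
  echo "id $MUNGEUSER is free"
fi
```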
Use the Slurm Configuration Tool to generate the configuration files. Here are the parameters we used:
Put the generated configurations into /etc/slurm.
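For GPUs to be schedulable as a generic resource, slurm.conf and gres.conf need matching GRES entries. A minimal sketch, assuming a single node named exx with two NVIDIA GPUs (the node name, device paths, and counts are placeholders; adjust them to your hardware):

```
# /etc/slurm/gres.conf (hypothetical device paths)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1

# In /etc/slurm/slurm.conf (hypothetical node line)
GresTypes=gpu
NodeName=exx Gres=gpu:2 State=UNKNOWN
```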
Generate a cgroup configuration file from the example:
cp cgroup.conf.example cgroup.conf
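A cgroup.conf sketch that constrains jobs to their allocated cores, memory, and GPU devices (these are standard cgroup.conf options; enable only what your setup needs):

```
# /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes    # needed for GPU isolation via gres.conf
```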
Configurations for Exx are stored here: willGuimont/exx_slurm_config
References for setting up Slurm accounting:
https://gist.github.com/DaisukeMiyamoto/d1dac9483ff0971d5d9f34000311d312
https://slurm.schedmd.com/accounting.html#mysql-configuration
https://slurm.schedmd.com/accounting.html#database-configuration
sacctmgr add user brian Account=norlab
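The accounting setup described in the links above typically boils down to registering the cluster and an account before adding users; a sketch (the cluster and organization names are placeholders for this guide's setup):

```shell
sacctmgr add cluster exx
sacctmgr add account norlab Description="NorLab account" Organization=norlab
sacctmgr add user <username> Account=norlab
```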
https://www.mail-archive.com/[email protected]/msg04744.html
Check NVML linkage with ldd.
Install Docker Engine and follow the post-installation steps for Linux to allow non-sudo users to use docker.
Start docker on boot:
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
Install the NVIDIA Container Toolkit so containers can access the GPUs:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum clean expire-cache
sudo yum install nvidia-container-toolkit -y
Verify that you can see GPUs from docker containers:
docker run --rm --gpus all -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
Add these lines to the crontab with crontab -e to prune unused Docker networks and stopped containers weekly (the -f flag skips the confirmation prompt so the jobs can run unattended):
0 9 * * 3 docker network prune -f
30 9 * * 3 docker container prune -f
useradd -c 'Full name' -m <username> -G docker
Add the following line to the new user's .bashrc:
PATH=$PATH:/usr/local/bin
Add the user to account manager:
sacctmgr add user <username> Account=norlab
Add an example job example_job.sh:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --time=4-00:00
#SBATCH --job-name=ExampleJob
#SBATCH --output=%x-%j.out
docker run --rm bash -c "echo 'working from slurm'"
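Once example_job.sh is in place, it can be submitted and monitored with the standard Slurm commands:

```shell
sbatch example_job.sh   # queue the job; prints "Submitted batch job <id>"
squeue -u $USER         # list your pending and running jobs
scancel <jobid>         # cancel the job if needed
```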