Python with GPUs and Jupyter notebooks on Ginsburg (HPC)

Ginsburg has the following GPU resources:

  • 18 GPU Nodes with two Nvidia RTX 8000 GPU modules
  • 4 GPU Nodes with two Nvidia V100S GPU modules, suitable for computationally expensive tasks such as training ML models.

However, using these resources with the latest ML libraries (TensorFlow, PyTorch, etc.) and data science tools (like Jupyter notebooks) requires us to set up the proper computational environment.

Here we list the installation instructions for getting started with GPU-enabled ML on Ginsburg:

Step 1: Install miniconda

We need to install miniconda.

Note 1: There is already a conda installation on Ginsburg, but since it is controlled by root we don't have the freedom required to customize it, so we install our own.

Note 2: We install miniconda in the scratch directory (e.g. /burg/abernathey/users/db3157/) because Ginsburg limits the number of files in the home directory, and the large number of files generated by conda environments fills it up quickly.

cd <personal scratch dir>

# create the target directory first, since the wget below writes into it
mkdir -p ./miniconda3

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ./miniconda3/miniconda.sh

bash ./miniconda3/miniconda.sh -b -u -p ./miniconda3

rm -rf ./miniconda3/miniconda.sh

./miniconda3/bin/conda init bash
./miniconda3/bin/conda init zsh

Step 2: Install mamba

conda's dependency solver has become extremely slow in the last few years; mamba is a much faster drop-in replacement. Install mamba with:

conda install mamba -n base -c conda-forge

In practice mamba works just like conda: we simply replace the word conda with mamba in our commands.
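
For example (using a hypothetical environment name myenv, just for illustration), the following two commands are interchangeable, with the mamba form resolving dependencies much faster:

# classic conda form
conda install -n myenv xarray

# equivalent mamba form
mamba install -n myenv xarray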

Step 3: Install jupyter on base

We will install jupyter and jupyter lab in the base environment, which will be used to run notebooks.

mamba install jupyter jupyterlab -n base

Step 4: Install the environment

Create a new environment with the dependencies that are needed. Here we do this using an environment.yml file with the following contents:

name: tf_gpu_ml
channels:
  - conda-forge
dependencies:
  - python
  - numpy
  - scipy
  - jupyter
  - jupyterlab
  - xarray
  - tensorflow-gpu

This environment can be created as

mamba env create --file environment.yml

Note 1: More details on many of these steps can be found at https://github.com/ocean-transport/guides/blob/master/Setting_up_conda_on_clusters.md

Note 2: You might need to use pip if you want the absolute newest version of tensorflow, but a similar procedure would apply.
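
Once the environment has been created, a quick optional sanity check (assuming the build finished without errors) is to activate it and confirm that tensorflow imports:

conda activate tf_gpu_ml

python -c "import tensorflow as tf; print(tf.__version__)"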

Step 5: Point kernel to new environment

Often on HPC systems we need to explicitly point a Jupyter kernel at a new environment (like the one created above). We can do this as follows.

First, activate the new environment:

conda activate tf_gpu_ml

Then create the kernelspec

jupyter kernelspec install-self --user

Rename the Python kernel you just created, so that it does not override the default kernel.

cd ~/.local/share/jupyter/kernels
vim python3/kernel.json

Change the display_name, for example to:

"display_name": "tf_gpu_ml",

Then rename the kernel directory as well:

mv python3 tf_gpu_ml
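
You can confirm the kernel was registered by listing the available kernelspecs; the output should now include a tf_gpu_ml entry:

jupyter kernelspec list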

Now the new kernel should be ready to use in Jupyter, so next we will look at how to run Jupyter notebooks.

Note: More details on this can be found at https://github.com/ocean-transport/guides/blob/master/adding_jupyter_kernels.md
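
Note also that jupyter kernelspec install-self is deprecated and may not be available on newer Jupyter versions. In that case, a roughly equivalent approach (assuming ipykernel is installed in the environment, which the jupyter dependency above provides) is to register the kernel directly, which creates and names the kernelspec in one step:

conda activate tf_gpu_ml

python -m ipykernel install --user --name tf_gpu_ml --display-name "tf_gpu_ml"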

Step 6: Start jupyter on an interactive node

This can be done by first requesting an interactive node with GPU support:

srun --pty -t 60:00 -A abernathey --gres=gpu:1 /bin/bash

Then we mostly follow the steps from the Ginsburg documentation to start and access Jupyter notebooks.

unset XDG_RUNTIME_DIR
hostname -i 

The hostname -i command outputs the node's IP address, something like 10.43.4.206. Use this IP to start Jupyter:

jupyter notebook --no-browser --ip=10.43.4.206

This command will start Jupyter and print the address where it is running, e.g. http://10.43.4.206:8888/.

You can then access this notebook server from your local machine by opening an SSH tunnel:

ssh -L 8080:10.43.4.206:8888 [email protected]

Then open localhost:8080 in a browser.
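
Optionally, adding the -N flag keeps the tunnel open without starting a remote shell, which is convenient when the connection is only used for port forwarding:

ssh -N -L 8080:10.43.4.206:8888 [email protected]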

Note: by default this will open the classic Jupyter Notebook interface, but you can switch to JupyterLab by changing the word tree to lab in the URL.

Step 7: Check if gpu support is present

You can check whether GPU support was properly activated by running the following commands in Python or in a Jupyter notebook (using the tf_gpu_ml kernel created above).

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

This should report that 1 GPU is available, since we requested 1 GPU in Step 6.
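
As an additional check outside Python, you can run nvidia-smi in a shell on the interactive node (it should be available on the GPU nodes); it should list the GPU that Slurm allocated to your job:

nvidia-smi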