Get an account on the DGX. IT will give you the DNS name or IP address, along with a username and password. Log in via SSH (or PuTTY) and transfer files with sftp (or FileZilla, or another graphical client if you prefer).
Find a DockerHub image that has the stuff you want in it. I chose an image with jupyter and tensorflow-gpu. Pull it from DockerHub using something like this:
docker pull tensorflow/tensorflow:latest-gpu-jupyter
Note that you could just pick a vanilla Ubuntu image and then step inside it and install everything that you require. Using a pre-built image is convenient in some ways (you don't have to install everything) and inconvenient in others (it decides which folder it initialises in and doesn't seem to want to change it).
Run the image with a command line, something like this:
docker run -ti --name pamela --gpus=all -v /home/user/dev:/tf/mydev -p 8888:8888 -w /home/user/dev tensorflow/tensorflow:latest-gpu-jupyter
You used to need nvidia-docker to enable GPU support, but now you pass either --gpus=all or --runtime=nvidia to tell the container about the GPUs.
I use --name pamela to give the container the name "pamela" so I can identify it later.
I use -v to map a volume. My folder "/home/user/dev" exists on the host machine, in my home directory. The folder "/tf/mydev" exists inside the Docker container. Mapping a volume like this lets me get files from inside the container onto the host machine and vice versa. I've used "/tf/mydev" because the particular image I used initialises the Jupyter server in the "/tf" directory, so I've mounted a directory of mine as a subdirectory of that.
I use -p to map the ports. By default, Jupyter will initialise on port 8888 inside the container. You can map it to port 8888 on the DGX, but check first whether that port is already in use. If it is, map a different host port instead, such as 8080 (-p 8080:8888).
I set the working directory with -w, but it has no effect on where Jupyter starts with this particular image: the image's startup command passes --notebook-dir=/tf, so the notebook server always starts in "/tf". I figured out which folder the Jupyter server is in by running pwd on the notebook command line.
You can also add bash to the end of the command line to get a shell inside the container instead of starting Jupyter. If you don't set the working directory, it'll take you to /tf.
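The port check mentioned above can be sketched roughly as follows. This assumes the ss utility is available on the DGX; port_listed is a hypothetical helper name, and falling back to 8080 is just the alternative port suggested above. The script only prints the docker run command rather than running it, so you can look it over first.

```shell
# Succeeds if the given port appears as a local listening port in
# `ss -ltn` style output read from stdin (column 4 is "address:port").
port_listed() {
  awk -v p=":$1" '$4 ~ p"$" { found = 1 } END { exit !found }'
}

HOST_PORT=8888
# Fall back to 8080 if something on the host already listens on 8888.
if command -v ss >/dev/null 2>&1 && ss -ltn | port_listed "$HOST_PORT"; then
  HOST_PORT=8080
fi

# Print the command instead of running it, so it can be checked first.
echo "docker run -ti --name pamela --gpus=all -v /home/user/dev:/tf/mydev -p ${HOST_PORT}:8888 tensorflow/tensorflow:latest-gpu-jupyter"
```

Whichever host port ends up on the left of the -p mapping is the one you use in the browser; the container side stays at 8888.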
Access the notebooks. On your own machine, open http://<DGX address>:<port>/?token=<some token> in a browser. When the container starts up on the DGX, it prints the address to visit, including a token. The token is the password protection provided by the image, so include it when you open the notebook in your browser.
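If the startup output has scrolled away, the token can usually be recovered from the container's log. Here is a minimal sketch, assuming the container is named pamela as above and that the log contains a URL with a token=<hex> query parameter; extract_token is a hypothetical helper name.

```shell
# Pull the first token=<hex> value out of text read from stdin.
extract_token() {
  sed -n 's/.*[?&]token=\([0-9a-f]*\).*/\1/p' | head -n 1
}

# Only query Docker where it is actually installed (i.e. on the DGX).
if command -v docker >/dev/null 2>&1; then
  docker logs pamela 2>&1 | extract_token
fi
```

Paste the printed value after ?token= in the address you open from your own machine.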
Save your files. Remember to save any ipynb files to the directory that you mapped with -v when you ran the image (Step 3). In my case, that's /tf/mydev inside the container, which is /home/user/dev on the DGX. This means that your notebooks are independent of your container and you can pull them off the DGX using sftp.
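Pulling the notebooks back to your own machine could look something like this. I've used scp here instead of an interactive sftp session because it fits in one line. The username and host name are placeholders (use whatever IT gave you), and remote_notebooks and fetch_notebooks are hypothetical helper names.

```shell
DGX_USER="user"               # hypothetical username
DGX_HOST="dgx.example.org"    # hypothetical host name; use what IT gave you

# Build the remote path spec for scp, e.g. user@host:/home/user/dev/*.ipynb
remote_notebooks() {
  echo "${DGX_USER}@${DGX_HOST}:/home/user/dev/*.ipynb"
}

# Copy all notebooks from the mapped host folder into ./notebooks.
fetch_notebooks() {
  mkdir -p notebooks && scp "$(remote_notebooks)" notebooks/
}
```

Run fetch_notebooks on your own machine, not on the DGX.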
Tidy up. If you are (completely) done with your container, remember to use docker stop <container name> and docker rm <container name> to remove the stopped container.
You can use Ctrl+P followed by Ctrl+Q to step out of the container without stopping it, so you can look at the host machine without logging in a second time. Use docker attach <container name> to step back in. Use nvidia-smi on the host to check that your code is running on a GPU. In my case, it ran on all the GPUs, which you can tell because the memory in use on every GPU was more than 0 and every GPU had a process associated with it.
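To make that check less of an eyeball exercise, nvidia-smi can print just the fields of interest. This is a rough sketch; busy_gpus is a hypothetical helper name, and "busy" here simply means more than 0 MiB in use.

```shell
# Print the indices of GPUs with non-zero memory in use, given
# "index, memory.used" CSV lines (no header, no units) on stdin.
busy_gpus() {
  awk -F', *' '$2 + 0 > 0 { print $1 }'
}

# Only run the queries where nvidia-smi exists (i.e. on the DGX host).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | busy_gpus
  # List the compute processes and how much GPU memory each one holds.
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
fi
```

If every index from 0 to 7 is printed, the notebook has grabbed memory on all of the GPUs.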
So, if you follow the above commands, as soon as you start running a python notebook it takes up all the GPUs on the DGX. This is ok if nobody is using it, but it is a shared resource, so it's something to avoid. Here is a way around it.
docker inspect -f '{{.Config.Cmd}}' tensorflow/tensorflow:latest-gpu-jupyter
This command allows you to find out the commands that the image actually runs when you start up the container (replace "tensorflow/tensorflow:latest-gpu-jupyter" with any other image name). In this case, it runs:
bash -c source /etc/bash.bashrc && jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root
This means that you can get the same effect by running:
docker run -ti --name pamela --gpus=all -v /home/user/dev:/tf/mydev -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter bash
And then running:
jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root
from inside the container. This lets you change the Jupyter settings (e.g. a different port number, although the port you connect to on the DGX is still determined by the -p mapping in docker run). For example, you could cd to a specific directory first (if you've mounted a directory and want to run the Jupyter server from there), or install extra packages in the container (say, OpenCV or other Python packages that the image doesn't include).
Most importantly, this means that you can restrict the notebook server to a single GPU by running (inside the container):
CUDA_VISIBLE_DEVICES=7 jupyter notebook --notebook-dir=/tf --ip 0.0.0.0 --no-browser --allow-root
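In this case, I've opted for GPU number 7, but values 0 to 6 are also available on the DGX. Note that inside the container, the GPUs will have their own names: with CUDA_VISIBLE_DEVICES set, the visible devices are renumbered from 0, so the code in the notebook sees GPU 7 as device 0. The Python notebook test.ipynb has some code to look at that.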
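If you do this often, the command can be wrapped in a small shell function so that a mistyped GPU number gets caught before Jupyter starts. This is only a sketch, meant to be pasted into the shell inside the container; valid_gpu_index and jupyter_on_gpu are hypothetical names, and the 0 to 7 range assumes the eight-GPU DGX described here.

```shell
# The DGX described here exposes GPUs 0 through 7.
valid_gpu_index() {
  case "$1" in
    [0-7]) return 0 ;;
    *) return 1 ;;
  esac
}

# Start Jupyter on one GPU; run this inside the container.
# Usage: jupyter_on_gpu 7 [notebook-dir]
jupyter_on_gpu() {
  gpu="$1"
  dir="${2:-/tf}"
  if ! valid_gpu_index "$gpu"; then
    echo "GPU index must be between 0 and 7, got: $gpu" >&2
    return 1
  fi
  CUDA_VISIBLE_DEVICES="$gpu" jupyter notebook --notebook-dir="$dir" \
    --ip 0.0.0.0 --no-browser --allow-root
}
```

For example, jupyter_on_gpu 7 /tf/mydev would start the server in the mounted folder on GPU 7 only.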
Remember that Ctrl+P followed by Ctrl+Q lets you step out of the container without stopping it, and docker attach <container name> lets you step back inside.