Split release and source dockerfile #1178

Merged
10 changes: 5 additions & 5 deletions .github/workflows/docker.yml
@@ -5,17 +5,17 @@ on:
heat_version:
description: 'Heat version'
required: true
default: '1.2.2'
default: 'latest'
type: string
pytorch_img:
description: 'Base PyTorch Img'
required: true
default: '23.03-py3'
default: '23.05-py3'
type: string
name:
description: 'Output Image name'
required: true
default: 'heat:1.2.2_torch1.13_cu12.1'
default: 'heat:1.3.0_torch2.0.0_cu12.1'
type: string
jobs:
build-and-push-img:
@@ -43,7 +43,7 @@ jobs:
name: Build
uses: docker/build-push-action@v4
with:
context: docker/
file: docker/Dockerfile.release
build-args: |
HEAT_VERSION=${{ inputs.heat_version }}
PYTORCH_IMG=${{ inputs.pytorch_img}}
@@ -59,7 +59,7 @@ jobs:
name: Build and push
uses: docker/build-push-action@v4
with:
context: docker/
file: docker/Dockerfile.release
build-args: |
HEAT_VERSION=${{ inputs.heat_version }}
PYTORCH_IMG=${{ inputs.pytorch_img}}
21 changes: 0 additions & 21 deletions docker/Dockerfile

This file was deleted.

18 changes: 18 additions & 0 deletions docker/Dockerfile.release
@@ -0,0 +1,18 @@
ARG HEAT_VERSION=latest
ARG PYTORCH_IMG=23.05-py3

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS release-install
ARG HEAT_VERSION
RUN pip install --upgrade pip
RUN pip install mpi4py --no-binary :all:
RUN echo ${HEAT_VERSION}
RUN if [[ ${HEAT_VERSION} =~ ^([1-9][0-9]*|0)(\.(([1-9][0-9]*)|0)){2}$ ]]; then \
pip install heat[hdf5,netcdf]==${HEAT_VERSION}; \
else \
pip install heat[hdf5,netcdf]; \
fi
13 changes: 13 additions & 0 deletions docker/Dockerfile.source
@@ -0,0 +1,13 @@
ARG PYTORCH_IMG=23.05-py3
ARG HEAT_BRANCH=main

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS source-install
ARG HEAT_BRANCH
RUN pip install --upgrade pip
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git
RUN pip install mpi4py --no-binary :all: && pushd heat && pip install .[hdf5,netcdf] && popd && rm -rf heat
24 changes: 16 additions & 8 deletions docker/README.md
@@ -2,23 +2,27 @@

There is some flexibility to building the Docker images of Heat.

Firstly, one can build from the released version taken from PyPI. This will either be
the latest release or the version set through the `--build-arg=HEAT_VERSION=X.Y.Z`
Firstly, one can build from the released version taken from PyPI using `Dockerfile.release`. This will either be
the latest release or the version set through the `--build-arg HEAT_VERSION=X.Y.Z`
argument.

Secondly one can build a docker image from the GitHub sources, selected through
`--build-arg=INSTALL_TYPE=source`. The default branch to be built is main, other
branches can be specified using `--build-arg=HEAT_BRANCH=<branch-name>`.
Secondly, one can build a Docker image from the GitHub sources using `Dockerfile.source`. The default branch to be built is `main`; other
branches can be specified using `--build-arg HEAT_BRANCH=<branch-name>`.
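
How `HEAT_VERSION` selects the package to install can be sketched as a small shell function. This is only an illustration under stated assumptions: `heat_requirement` is a hypothetical helper that is not part of the repository, and it uses `grep -E` in a POSIX shell, whereas `Dockerfile.release` performs the check with a bash `[[ ... =~ ... ]]` test.

```shell
#!/bin/sh
# Hypothetical helper (not in the Heat repository): prints the pip requirement
# that corresponds to a given HEAT_VERSION value.
heat_requirement() {
    # A value of the form X.Y.Z (no leading zeros) pins that exact release.
    if printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){2}$'; then
        printf 'heat[hdf5,netcdf]==%s\n' "$1"
    else
        # Anything else, e.g. "latest", installs the newest release from PyPI.
        printf 'heat[hdf5,netcdf]\n'
    fi
}

heat_requirement 1.3.0    # prints heat[hdf5,netcdf]==1.3.0
heat_requirement latest   # prints heat[hdf5,netcdf]
```

The fallback branch means that a mistyped version such as `1.3` silently installs the latest release instead of failing the build.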

## General build

### Docker

The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located in this
directory. It is typically most convenient to `cd` over here and run the Docker build as:
The Dockerfiles guiding the build of the Docker image, [Dockerfile.release](./Dockerfile.release) and [Dockerfile.source](./Dockerfile.source), are located in this directory. It is typically most convenient to `cd` to the `docker` directory and run the build command as:

```console
$ docker build --build-args HEAT_VERSION=X.Y.Z --PYTORCH_IMG=<nvcr-tag> -t heat .
$ docker build -t heat:latest -f Dockerfile.source .
```

Or optionally, using a particular version and pytorch base image:

```console
$ docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG=<nvcr-tag> -t heat:X.Y.Z -f Dockerfile.release .
```
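
For a source build of a specific branch, the command line can be assembled in the same way. The sketch below only prints the command instead of running it; `source_build_cmd` is a hypothetical helper, the branch name `features/my-branch` is made up for illustration, and the replacement of `/` in the tag is an assumption made here because image tags cannot contain slashes.

```shell
#!/bin/sh
# Hypothetical helper (not in the Heat repository): prints the docker build
# command for Dockerfile.source without executing it.
# $1: branch name (defaults to main, like ARG HEAT_BRANCH)
# $2: NVIDIA PyTorch image tag (defaults to 23.05-py3, like ARG PYTORCH_IMG)
source_build_cmd() {
    branch="${1:-main}"
    img="${2:-23.05-py3}"
    # Replace '/' in the branch name so the result is a valid image tag.
    tag="heat:$(printf '%s' "$branch" | tr '/' '-')"
    printf 'docker build --build-arg HEAT_BRANCH=%s --build-arg PYTORCH_IMG=%s -t %s -f Dockerfile.source .\n' \
        "$branch" "$img" "$tag"
}

source_build_cmd                      # build command for the main branch
source_build_cmd features/my-branch   # build command for a made-up branch
```

The printed command is meant to be run from the `docker` directory, since the Dockerfiles copy `tzdata.seed` from the build context.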

The heat image is based on the NVIDIA PyTorch container. You can find existing tags in the [NVIDIA container catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags).
@@ -89,3 +93,7 @@ The following file can be used as an example to use the apptainer file together

srun --mpi="pmi2" apptainer exec --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py"
```

## Scripts

The scripts folder has a small collection of helper scripts to automate certain tasks, primarily meant for heat developers. Explanations are given at the top of each script.
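
As an illustration, the image tag that `build_and_push.sh` assembles from its arguments can be sketched as follows. `heat_image_tag` is a hypothetical function written for this example, and the version numbers are illustrative values rather than a tested combination.

```shell
#!/bin/sh
# Hypothetical helper mirroring the tag format used in build_and_push.sh.
# $1: heat version, $2: torch version, $3: CUDA version, $4: Python version
heat_image_tag() {
    printf 'ghcr.io/helmholtz-analytics/heat:%s_torch%s_cu%s_py%s\n' "$1" "$2" "$3" "$4"
}

heat_image_tag 1.3.0 2.0.0 12.1 3.10
# prints ghcr.io/helmholtz-analytics/heat:1.3.0_torch2.0.0_cu12.1_py3.10
```

A matching invocation of the script, with the same illustrative values, would be `./build_and_push.sh --heat-version 1.3.0 --pytorch-img 23.05-py3 --torch-version 2.0.0 --cuda-version 12.1 --python-version 3.10`, optionally followed by `--upload`.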
68 changes: 68 additions & 0 deletions docker/scripts/build_and_push.sh
@@ -0,0 +1,68 @@
#!/bin/bash
### As the name suggests, this script is meant for the HeAT developers to quickly build a new Docker image with the specified HeAT version and PyTorch image version. The arguments TORCH_VERSION, CUDA_VERSION, and PYTHON_VERSION should indicate the versions of those libraries found in the PyTorch image from NVIDIA, and are used only to create the image tag.
# If you want to upload the image to the GitHub package registry, use the '--upload' option. You need to be logged in to the registry. Instructions here: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry

GHCR_UPLOAD=false

while [[ $# -gt 0 ]]; do
case $1 in
--heat-version)
HEAT_VERSION="$2"
shift # past argument
shift # past value
;;
--pytorch-img)
PYTORCH_IMG="$2"
shift # past argument
shift # past value
;;
--torch-version)
TORCH_VERSION="$2"
shift # past argument
shift # past value
;;
--cuda-version)
CUDA_VERSION="$2"
shift # past argument
shift # past value
;;
--python-version)
PYTHON_VERSION="$2"
shift # past argument
shift # past value
;;
--upload)
GHCR_UPLOAD=true
shift # past argument (the flag takes no value)
;;
-*|--*)
echo "Unknown option $1"
exit 1
;;
*)
shift # skip positional arguments, otherwise the loop never terminates
;;
esac
done

echo "HEAT_VERSION=$HEAT_VERSION"
echo "PYTORCH_IMG=$PYTORCH_IMG"
echo "TORCH_VERSION=$TORCH_VERSION"
echo "CUDA_VERSION=$CUDA_VERSION"
echo "PYTHON_VERSION=$PYTHON_VERSION"


ghcr_tag="ghcr.io/helmholtz-analytics/heat:${HEAT_VERSION}_torch${TORCH_VERSION}_cu${CUDA_VERSION}_py${PYTHON_VERSION}"

echo "Building image $ghcr_tag"

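# Assuming the script is run from docker/scripts: the build context must be
# the docker/ directory, because Dockerfile.release copies tzdata.seed from it.
docker build --file ../Dockerfile.release \
--build-arg HEAT_VERSION="$HEAT_VERSION" \
--build-arg PYTORCH_IMG="$PYTORCH_IMG" \
--tag "$ghcr_tag" \
..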

if [ "$GHCR_UPLOAD" = true ]; then
echo "Push image"
echo "You might need to log in to ghcr.io (https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry)"
docker push "$ghcr_tag"
fi
21 changes: 21 additions & 0 deletions docker/scripts/install_print_test.sh
@@ -0,0 +1,21 @@
#!/bin/bash
# Script to quickly obtain all relevant information from a new NVIDIA PyTorch container. Run it inside a PyTorch container from NVIDIA and it will first print the software stack (CUDA version, torch version, ...), then install heat from source, and run the heat unit tests. Useful to quickly check whether a container is compatible with heat.

# Container setup
apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

# Print environment
pip list | grep torch
python --version
nvcc --version
mpirun --version

# Install heat from source.
git clone https://github.com/helmholtz-analytics/heat.git
cd heat
pip install --upgrade pip
pip install mpi4py --no-binary :all:
pip install .[netcdf,hdf5,dev]

# Run tests
HEAT_TEST_USE_DEVICE=gpu mpirun -n 1 pytest heat/
19 changes: 19 additions & 0 deletions docker/scripts/test_nvidia_image_haicore_enroot.sh
@@ -0,0 +1,19 @@
#!/bin/bash
# Example SLURM/ENROOT script. It will mount the container using enroot, and then run the test script to test the compatibility of the image with the source version of heat.

# Clear environment, else mpi4py will fail to install.
ml purge

SBATCH_PARAMS=(
--partition normal
--time 00:10:00
--nodes 1
--tasks-per-node 1
--gres gpu:1
--container-image ~/containers/nvidia+pytorch+23.05-py3.sqsh
--container-writable
--container-mounts /etc/slurm/task_prolog.hk:/etc/slurm/task_prolog.hk,/scratch:/scratch
--container-mount-home
)

sbatch "${SBATCH_PARAMS[@]}" ./install_print_test.sh
2 changes: 1 addition & 1 deletion docker/singularity-dockerfile.sample
@@ -1,2 +1,2 @@
# This is a sample file to use with the Singularity image builder
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.11_cuda11.5_py3.9
FROM ghcr.io/helmholtz-analytics/heat:1.3.0_torch1.12_cuda11.7_py3.8
2 changes: 1 addition & 1 deletion quick_start.md
@@ -46,7 +46,7 @@ or build it from our Dockerfile
```
git clone https://github.com/helmholtz-analytics/heat.git
cd heat/docker
docker build --build-args HEAT_VERSION=X.Y.Z --PYTORCH_IMG=<nvcr-tag> -t heat:latest .
docker build --build-arg HEAT_VERSION=X.Y.Z --build-arg PYTORCH_IMG=<nvcr-tag> -t heat:X.Y.Z .
```

`<nvcr-tag>` should be replaced with an existing tag of the official NVIDIA PyTorch container image. Information and existing tags can be found [here](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)