chore: 0.38.0 environment images #10197

Merged
merged 1 commit into from
Nov 6, 2024
14 changes: 7 additions & 7 deletions .circleci/real_config.yml
@@ -291,7 +291,7 @@ commands:
- when:
condition: <<parameters.tf2>>
steps:
- run: docker pull determinedai/pytorch-ngc-dev:0736b6d
- run: docker pull determinedai/pytorch-ngc:0.38.0

login-docker:
parameters:
@@ -2479,7 +2479,7 @@ jobs:

test-unit-harness-gpu-tf:
docker:
- image: determinedai/tensorflow-ngc-dev:0736b6d
- image: determinedai/tensorflow-ngc:0.38.0
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2506,7 +2506,7 @@ jobs:

test-unit-harness-pytorch2-gpu:
docker:
- image: determinedai/pytorch-ngc-dev:0736b6d
- image: determinedai/pytorch-ngc:0.38.0
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2533,7 +2533,7 @@ jobs:

test-unit-harness-pytorch2-cpu:
docker:
- image: determinedai/pytorch-ngc-dev:0736b6d
- image: determinedai/pytorch-ngc:0.38.0
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
- checkout
@@ -2559,7 +2559,7 @@ jobs:

test-unit-harness-gpu-parallel:
docker:
- image: determinedai/pytorch-ngc-dev:0736b6d
- image: determinedai/pytorch-ngc:0.38.0
resource_class: determined-ai/container-runner-multi-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2586,7 +2586,7 @@ jobs:

test-unit-harness-gpu-deepspeed:
docker:
- image: determinedai/pytorch-ngc-dev:0736b6d
- image: determinedai/pytorch-ngc:0.38.0
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -3648,7 +3648,7 @@ jobs:
type: string
default: "1"
environment-image:
default: determinedai/pytorch-ngc-dev:0736b6d
default: determinedai/pytorch-ngc:0.38.0
type: string
accel-node-taints:
type: string
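Every file in this PR receives the same mechanical substitution: each `determinedai/<name>-dev:0736b6d` reference becomes `determinedai/<name>:0.38.0`, while the ROCm images (whose tags merely end in `0736b6d` or `622d512`) are left unchanged. A minimal sketch of that rewrite follows; `bump_dev_image` is a hypothetical helper written for illustration, not the tooling the maintainers actually used.

```python
import re


def bump_dev_image(line: str, old_tag: str, new_tag: str) -> str:
    """Rewrite 'determinedai/<name>-dev:<old_tag>' to 'determinedai/<name>:<new_tag>'.

    Hypothetical helper. Only references of the form '<name>-dev:<old_tag>' match,
    so ROCm tags such as '...-rocm-mpich-0736b6d' are left untouched.
    """
    pattern = re.compile(
        r"determinedai/([A-Za-z0-9_.-]+?)-dev:" + re.escape(old_tag) + r"\b"
    )
    # Drop the '-dev' suffix from the repository name and swap in the release tag.
    return pattern.sub(lambda m: f"determinedai/{m.group(1)}:{new_tag}", line)


if __name__ == "__main__":
    for text in (
        "- image: determinedai/pytorch-ngc-dev:0736b6d",
        'rocmImage: "determinedai/environments:rocm-5.6-pytorch-1.3-tf-2.10-rocm-mpich-0736b6d"',
    ):
        print(bump_dev_image(text, "0736b6d", "0.38.0"))
```

Anchoring the match on the `-dev:` separator, rather than on the bare commit hash, is what keeps the ROCm images out of the rewrite, mirroring the unchanged `rocmImage` and `ROCMImage` lines in this diff.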
2 changes: 1 addition & 1 deletion .circleci/scripts/pull_image_daemonset.yaml
@@ -13,7 +13,7 @@ spec:
spec:
containers:
- name: pull-docker-daemonset
image: determinedai/pytorch-ngc-dev:0736b6d
image: determinedai/pytorch-ngc:0.38.0
command: ["/bin/bash"]
args: ["echo", "test"]
resources:
8 changes: 4 additions & 4 deletions docs/model-dev-guide/prepare-container/custom-env.rst
@@ -114,9 +114,9 @@ Default Images
- - Environment
- File Name
- - CPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - NVIDIA GPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - AMD GPUs
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4``

@@ -155,7 +155,7 @@ Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based
.. code:: bash

# Determined Image
FROM determinedai/tensorflow-ngc-dev:0736b6d
FROM determinedai/tensorflow-ngc:0.38.0

# Custom Configuration
RUN apt-get update && \
@@ -216,7 +216,7 @@ environments using :ref:`custom images <custom-docker-images>`:
.. code:: bash

# Determined Image
FROM determinedai/pytorch-ngc-dev:0736b6d
FROM determinedai/pytorch-ngc:0.38.0

# Create a virtual environment
RUN conda create -n myenv python=3.8
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ Determined supports both TensorFlow 1 and 2. The version of TensorFlow used for
experiment is controlled by the configured container image. Determined provides prebuilt Docker
images that include TensorFlow 2+, 1.15, and 2.8, respectively:

- ``determinedai/tensorflow-ngc-dev:0736b6d``
- ``determinedai/tensorflow-ngc:0.38.0``
- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1``

4 changes: 2 additions & 2 deletions docs/reference/deploy/helm-config-reference.rst
@@ -239,13 +239,13 @@

- ``cpuImage``: Sets the default Docker image for all non-GPU tasks. If a Docker image is
specified in the :ref:`experiment config <exp-environment-image>` this default is overriden.
Defaults to: ``determinedai/pytorch-ngc-dev:0736b6d``.
Defaults to: ``determinedai/pytorch-ngc:0.38.0``.

- ``startupHook``: An optional inline script that will be executed as part of task set up.

- ``gpuImage``: Sets the default Docker image for all GPU tasks. If a Docker image is specified
in the :ref:`experiment config <exp-environment-image>` this default is overriden. Defaults
to: ``determinedai/pytorch-ngc-dev:0736b6d``.
to: ``determinedai/pytorch-ngc:0.38.0``.

- ``logPolicies``: Sets log policies for trials. For details, visit :ref:`log_policies
<config-log-policies>`.
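The precedence documented for `cpuImage` and `gpuImage` (an image set in the experiment config overrides the chart default, otherwise GPU and non-GPU tasks get their respective defaults) can be sketched as below. This only illustrates the documented rule; `resolve_task_image` and `DEFAULT_IMAGES` are hypothetical names, not Determined's actual resolution code.

```python
from typing import Optional

# Hypothetical mirror of the helm values.yaml defaults after this PR.
DEFAULT_IMAGES = {
    "cpuImage": "determinedai/pytorch-ngc:0.38.0",
    "gpuImage": "determinedai/pytorch-ngc:0.38.0",
}


def resolve_task_image(uses_gpus: bool, experiment_image: Optional[str] = None) -> str:
    """Return the image for a task under the documented override rule."""
    # An image given in the experiment config always wins.
    if experiment_image:
        return experiment_image
    # Otherwise fall back to the chart default for the task's hardware type.
    key = "gpuImage" if uses_gpus else "cpuImage"
    return DEFAULT_IMAGES[key]


if __name__ == "__main__":
    print(resolve_task_image(uses_gpus=False))
    print(resolve_task_image(uses_gpus=True, experiment_image="determinedai/tensorflow-ngc:0.38.0"))
```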
4 changes: 2 additions & 2 deletions docs/reference/deploy/master-config-reference.rst
@@ -89,12 +89,12 @@ configure different container images for NVIDIA GPU tasks using the ``cuda`` key
Determined 0.17.6), CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using the ``rocm`` key.
Default values:

- ``determinedai/pytorch-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/pytorch-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.

For TensorFlow users, we provide an image that must be referenced in the experiment configuration:

- ``determinedai/tensorflow-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/tensorflow-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.

``environment_variables``
=========================
4 changes: 2 additions & 2 deletions docs/reference/experiment-config-reference.rst
@@ -1294,12 +1294,12 @@ Optional. The Docker image to use when executing the workload. This image must b
container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6), CPU tasks using
``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/pytorch-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/pytorch-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.

For TensorFlow users, we provide an image that must be referenced in the experiment configuration:

- ``determinedai/tensorflow-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/tensorflow-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.
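The `image` setting accepts either one image or a map keyed by `cpu`, `cuda`, and `rocm`. The sketch below expands such a value into one image per accelerator type using the defaults listed above. It rests on assumptions not stated in this diff: that a plain string applies to every accelerator type, that unspecified keys fall back to the defaults, and that the pre-0.17.6 `gpu` key behaves as an alias for `cuda`. `normalize_image` is a hypothetical helper.

```python
from typing import Dict, Optional, Union

# Defaults as listed in experiment-config-reference.rst after this PR.
DEFAULTS = {
    "cpu": "determinedai/pytorch-ngc:0.38.0",
    "cuda": "determinedai/pytorch-ngc:0.38.0",
    "rocm": "determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4",
}


def normalize_image(image: Optional[Union[str, Dict[str, str]]]) -> Dict[str, str]:
    """Expand an `environment.image` value into one image per accelerator type."""
    if image is None:
        return dict(DEFAULTS)
    if isinstance(image, str):
        # Assumption: a single string applies to every accelerator type.
        return {key: image for key in DEFAULTS}
    resolved = dict(DEFAULTS)  # Assumption: unspecified keys keep their defaults.
    # Assumption: the legacy "gpu" key (used before 0.17.6) maps onto "cuda".
    if "gpu" in image and "cuda" not in image:
        resolved["cuda"] = image["gpu"]
    for key in ("cpu", "cuda", "rocm"):
        if key in image:
            resolved[key] = image[key]
    return resolved


if __name__ == "__main__":
    # Shape used by the iris_tf_keras examples in this PR.
    print(normalize_image({"cpu": "determinedai/tensorflow-ngc:0.38.0",
                           "gpu": "determinedai/tensorflow-ngc:0.38.0"}))
```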

When the cluster is configured with :ref:`resource_manager.type: slurm
<cluster-configuration-slurm>` and ``container_run_type: singularity``, images are executed using
4 changes: 2 additions & 2 deletions docs/reference/job-config-reference.rst
@@ -45,13 +45,13 @@ The following configuration settings are supported:
different container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6),
CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/pytorch-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/pytorch-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.

For TensorFlow users, we provide an image that must be referenced in the experiment
configuration:

- ``determinedai/tensorflow-ngc-dev:0736b6d`` for NVIDIA GPUs and for CPUs.
- ``determinedai/tensorflow-ngc:0.38.0`` for NVIDIA GPUs and for CPUs.

- ``force_pull_image``: Forcibly pull the image from the Docker registry and bypass the Docker
cache. Defaults to ``false``.
4 changes: 2 additions & 2 deletions docs/setup-cluster/deploy-cluster/slurm/singularity.rst
@@ -30,9 +30,9 @@ by default in this version of Determined are described below.
- - Environment
- File Name
- - CPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - NVIDIA GPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - AMD GPUs
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512``

4 changes: 2 additions & 2 deletions docs/setup-cluster/gcp/install-gcp.rst
@@ -406,5 +406,5 @@ This command line will spin up a cluster of up to 2 A100s in the ``us-central1-c
--compute-agent-instance-type a2-highgpu-1g --gpu-num 1 \
--gpu-type nvidia-tesla-a100 \
--region us-central1 --zone us-central1-c \
--gpu-env-image determinedai/pytorch-ngc-dev:0736b6d \
--cpu-env-image determinedai/pytorch-ngc-dev:0736b6d
--gpu-env-image determinedai/pytorch-ngc:0.38.0 \
--cpu-env-image determinedai/pytorch-ngc:0.38.0
4 changes: 2 additions & 2 deletions docs/setup-cluster/slurm/singularity.rst
@@ -30,9 +30,9 @@ by default in this version of Determined are described below.
- - Environment
- File Name
- - CPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - NVIDIA GPUs
- ``determinedai/pytorch-ngc-dev:0736b6d``
- ``determinedai/pytorch-ngc:0.38.0``
- - AMD GPUs
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512``

2 changes: 1 addition & 1 deletion docs/setup-cluster/slurm/slurm-requirements.rst
@@ -438,7 +438,7 @@ platform. There may be additional per-user configuration that is required.

.. code:: bash

image=determinedai/pytorch-ngc-dev:0736b6d
image=determinedai/pytorch-ngc:0.38.0
cd /shared/enroot/images
enroot import docker://$image
enroot create /shared/enroot/images/${image//[\/:]/\+}.sqsh
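The `${image//[\/:]/\+}` expansion in the Slurm snippet replaces every `/` and `:` in the image reference with `+`, so the squashfs file for the new default image is `determinedai+pytorch-ngc+0.38.0.sqsh`. A small Python equivalent, useful when scripting the same path outside bash; `enroot_sqsh_path` is a hypothetical helper and the default directory is taken from the snippet.

```python
import re
from pathlib import PurePosixPath


def enroot_sqsh_path(image: str, image_dir: str = "/shared/enroot/images") -> str:
    """Build the .sqsh path the way the bash snippet does."""
    # Equivalent of ${image//[\/:]/\+}: replace every '/' and ':' with '+'.
    name = re.sub(r"[/:]", "+", image)
    return str(PurePosixPath(image_dir) / f"{name}.sqsh")


if __name__ == "__main__":
    print(enroot_sqsh_path("determinedai/pytorch-ngc:0.38.0"))
```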
12 changes: 6 additions & 6 deletions e2e_tests/tests/config.py
@@ -14,12 +14,12 @@
MAX_TRIAL_BUILD_SECS = 90


DEFAULT_TF2_CPU_IMAGE = "determinedai/tensorflow-ngc-dev:0736b6d"
DEFAULT_TF2_GPU_IMAGE = "determinedai/tensorflow-ngc-dev:0736b6d"
DEFAULT_PT_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:0736b6d"
DEFAULT_PT_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda-dev:0736b6d"
DEFAULT_PT2_CPU_IMAGE = "determinedai/pytorch-ngc-dev:0736b6d"
DEFAULT_PT2_GPU_IMAGE = "determinedai/pytorch-ngc-dev:0736b6d"
DEFAULT_TF2_CPU_IMAGE = "determinedai/tensorflow-ngc:0.38.0"
DEFAULT_TF2_GPU_IMAGE = "determinedai/tensorflow-ngc:0.38.0"
DEFAULT_PT_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu:0.38.0"
DEFAULT_PT_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda:0.38.0"
DEFAULT_PT2_CPU_IMAGE = "determinedai/pytorch-ngc:0.38.0"
DEFAULT_PT2_GPU_IMAGE = "determinedai/pytorch-ngc:0.38.0"

TF2_CPU_IMAGE = os.environ.get("TF2_CPU_IMAGE") or DEFAULT_TF2_CPU_IMAGE
TF2_GPU_IMAGE = os.environ.get("TF2_GPU_IMAGE") or DEFAULT_TF2_GPU_IMAGE
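In `e2e_tests/tests/config.py` the new `0.38.0` defaults apply only when the matching environment variable is unset or empty, because the code uses `or` rather than a `get` default. The sketch below reproduces that rule with an injectable mapping so it can be exercised without touching the real environment; `image_from_env` is a hypothetical helper, not part of the test suite.

```python
import os
from typing import Mapping, Optional

DEFAULT_TF2_CPU_IMAGE = "determinedai/tensorflow-ngc:0.38.0"


def image_from_env(var: str, default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Same rule as `os.environ.get(VAR) or DEFAULT`: unset or empty falls back."""
    source = os.environ if env is None else env
    return source.get(var) or default


if __name__ == "__main__":
    print(image_from_env("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE))
```

Because of `or`, exporting `TF2_CPU_IMAGE=` (empty) still selects the pinned default, whereas `os.environ.get("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE)` would return the empty string.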
2 changes: 1 addition & 1 deletion e2e_tests/tests/fixtures/ports-proxy/config.yaml
@@ -22,7 +22,7 @@ max_restarts: 0

# Hardcode the image because the new image has a bug. TODO fix this when the image bug is fixed.
environment:
image: determinedai/pytorch-tensorflow-cpu-dev:0736b6d
image: determinedai/pytorch-tensorflow-cpu:0.38.0
proxy_ports:
- proxy_port: 8000
proxy_tcp: false
4 changes: 2 additions & 2 deletions examples/computer_vision/iris_tf_keras/adaptive.yaml
@@ -1,8 +1,8 @@
name: iris_tf_keras_adaptive_search
environment:
image:
cpu: determinedai/tensorflow-ngc-dev:0736b6d
gpu: determinedai/tensorflow-ngc-dev:0736b6d
cpu: determinedai/tensorflow-ngc:0.38.0
gpu: determinedai/tensorflow-ngc:0.38.0
hyperparameters:
learning_rate:
type: log
4 changes: 2 additions & 2 deletions examples/computer_vision/iris_tf_keras/const.yaml
@@ -1,8 +1,8 @@
name: iris_tf_keras_const
environment:
image:
cpu: determinedai/tensorflow-ngc-dev:0736b6d
gpu: determinedai/tensorflow-ngc-dev:0736b6d
cpu: determinedai/tensorflow-ngc:0.38.0
gpu: determinedai/tensorflow-ngc:0.38.0
hyperparameters:
learning_rate: 1.0e-4
learning_rate_decay: 1.0e-6
4 changes: 2 additions & 2 deletions examples/computer_vision/iris_tf_keras/distributed.yaml
@@ -1,8 +1,8 @@
name: iris_tf_keras_distributed
environment:
image:
cpu: determinedai/tensorflow-ngc-dev:0736b6d
gpu: determinedai/tensorflow-ngc-dev:0736b6d
cpu: determinedai/tensorflow-ngc:0.38.0
gpu: determinedai/tensorflow-ngc:0.38.0
hyperparameters:
learning_rate: 1.0e-4
learning_rate_decay: 1.0e-6
2 changes: 1 addition & 1 deletion examples/deepspeed/dcgan/mnist.yaml
@@ -13,7 +13,7 @@ environment:
environment_variables:
- NCCL_DEBUG=INFO
- NCCL_SOCKET_IFNAME=ens,eth,ib
image: determinedai/pytorch-ngc-dev:0736b6d
image: determinedai/pytorch-ngc:0.38.0
bind_mounts:
- host_path: /tmp
container_path: /data
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ environment:
# You may need to modify this to match your network configuration.
- NCCL_SOCKET_IFNAME=ens,eth,ib
image:
gpu: determinedai/pytorch-ngc-dev:0736b6d
gpu: determinedai/pytorch-ngc:0.38.0
resources:
slots_per_trial: 2
searcher:
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ environment:
# You may need to modify this to match your network configuration.
- NCCL_SOCKET_IFNAME=ens,eth,ib
image:
gpu: determinedai/pytorch-ngc-dev:0736b6d
gpu: determinedai/pytorch-ngc:0.38.0
resources:
slots_per_trial: 2
searcher:
Original file line number Diff line number Diff line change
@@ -39,8 +39,8 @@
},
"force_pull_image": false,
"image": {
"cpu": "determinedai/tensorflow-ngc-dev:0736b6d",
"cuda": "determinedai/tensorflow-ngc-dev:0736b6d",
"cpu": "determinedai/tensorflow-ngc:0.38.0",
"cuda": "determinedai/tensorflow-ngc:0.38.0",
"rocm": "determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512"
},
"pod_spec": null,
Original file line number Diff line number Diff line change
@@ -38,8 +38,8 @@
},
"force_pull_image": false,
"image": {
"cpu": "determinedai/tensorflow-ngc-dev:0736b6d",
"cuda": "determinedai/tensorflow-ngc-dev:0736b6d",
"cpu": "determinedai/tensorflow-ngc:0.38.0",
"cuda": "determinedai/tensorflow-ngc:0.38.0",
"rocm": "determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512"
},
"pod_spec": null,
4 changes: 2 additions & 2 deletions harness/tests/fixtures/checkpoint.json
@@ -69,8 +69,8 @@
},
"force_pull_image":false,
"image":{
"cpu":"determinedai/pytorch-ngc-dev:0736b6d",
"cuda":"determinedai/pytorch-ngc-dev:0736b6d",
"cpu":"determinedai/pytorch-ngc:0.38.0",
"cuda":"determinedai/pytorch-ngc:0.38.0",
"rocm":"determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512"
},
"pod_spec":null,
4 changes: 2 additions & 2 deletions helm/charts/determined/values.yaml
@@ -27,8 +27,8 @@ defaultImages:
kubeScheduler: "k8s.gcr.io/scheduler-plugins/kube-scheduler:v0.18.9"

# default images for CPU and GPU environments
cpuImage: "determinedai/pytorch-ngc-dev:0736b6d"
gpuImage: "determinedai/pytorch-ngc-dev:0736b6d"
cpuImage: "determinedai/pytorch-ngc:0.38.0"
gpuImage: "determinedai/pytorch-ngc:0.38.0"
rocmImage: "determinedai/environments:rocm-5.6-pytorch-1.3-tf-2.10-rocm-mpich-0736b6d"

# Install Determined enterprise edition.
4 changes: 2 additions & 2 deletions master/pkg/schemas/expconf/const.go
@@ -8,8 +8,8 @@ const (

// Default task environment docker image names.
const (
CPUImage = "determinedai/pytorch-ngc-dev:0736b6d"
CUDAImage = "determinedai/pytorch-ngc-dev:0736b6d"
CPUImage = "determinedai/pytorch-ngc:0.38.0"
CUDAImage = "determinedai/pytorch-ngc:0.38.0"
ROCMImage = "determinedai/environments:rocm-5.6-pytorch-1.3-tf-2.10-rocm-mpich-0736b6d"
)

4 changes: 2 additions & 2 deletions schemas/test_cases/v0/experiment.yaml
@@ -47,8 +47,8 @@
environment_variables: {}
force_pull_image: false
image:
cpu: determinedai/pytorch-ngc-dev:0736b6d
cuda: determinedai/pytorch-ngc-dev:0736b6d
cpu: determinedai/pytorch-ngc:0.38.0
cuda: determinedai/pytorch-ngc:0.38.0
rocm: determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512
pod_spec: null
ports: