21.03

@supertetelman supertetelman released this 11 Mar 22:33
· 1 commit to release-21.03 since this release

DeepOps 21.03 Release Notes

What's New

General

  • Rsyslog client/server for K8s & Slurm deployments
  • Examples for running Ansible and configuring Inventory file
  • Improved support for Ubuntu 20.04 and CentOS 8
  • Docker login convenience playbook
  • Marked air-gap as "experimental"
  • Vagrant/virtual 2.2.14 (previously 2.2.3)

Slurm

  • Slurm version 20.11.3 (previously 20.02.4)
  • HPC SDK 21.2 (previously 2020_207)

K8s

  • Helm version v3.4.1 (previously v3.1.2)
  • NFS Client Provisioner as K8s Default StorageClass
  • GPU Operator v1.5.2 (previously v1.1.7)
  • GPU Device Plugin v0.8.2 (previously v0.7.0)
  • GPU Feature Discovery v0.4.1 (previously v0.2.0)
  • Example NGC Dockerfiles bumped to 20.12 with improved documentation
  • New example YAML files for launching single-node/multi-node training and Jupyter notebooks
  • RoCE performance playbook

Changes

  • Deprecation of Rook-Ceph deployment script
  • Removed default MPI Operator install for K8s
  • NFS server is now deployed on kube-master[0] by default with path /export/deepops_nfs
  • New log bundling tool (debug.sh) for K8s
  • Enroot marked as "not fully automated" for CentOS (simple workaround is to bump enroot Ansible Galaxy role from v0.3.2 to v0.4.0 and re-run setup.sh)

Bugs/Enhancements

  • K8s monitoring metrics now persist by default using NFS-backed PVs.
  • Additional testing for Ubuntu 20.04, CentOS 8, GPU Operator, enroot, mpi, and testing.md
  • Addressed firewall issues in CentOS
  • Added vGPU support for GPU Operator installs
  • Addressed intermittent download failures in Slurm install

Upgrade steps

If you are upgrading to this version of DeepOps from a previous release, follow the upgrade section of the Slurm or Kubernetes Deployment Guides. In addition, re-run the ./scripts/setup.sh script and add any new variables from the config.example files to your existing config. For a full diff from release 20.12, run git diff 20.12 21.03 -- config.example/. Most of the config changes relate to new functionality such as nfs-client-provisioner, rsyslog, and persistent monitoring metrics in K8s. If you encounter a problem, please open a GitHub issue. See the update guide for additional guidance.
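
As a rough aid for the config step above, the following sketch lists top-level variables that appear in an example config file but not in your existing copy. It is not part of DeepOps: the function names top_level_keys and missing_keys are hypothetical, the file paths in the usage comment are assumptions about your checkout, and it only handles flat "key: value" YAML lines (nested or commented-out keys are ignored).

```shell
# Sketch only: helper names and paths are illustrative assumptions, not DeepOps tooling.

# Print the uncommented top-level keys of a YAML file, one per line, sorted.
top_level_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | cut -d: -f1 | sort -u
}

# Print keys present in the example file ($1) but absent from the existing file ($2).
missing_keys() {
  top_level_keys "$1" | while IFS= read -r key; do
    grep -q -E "^${key}:" "$2" || printf '%s\n' "$key"
  done
}

# Possible usage from the repository root (paths are assumptions):
#   git diff 20.12 21.03 -- config.example/
#   missing_keys config.example/group_vars/all.yml config/group_vars/all.yml
```

Any key this prints is only a candidate to review. The git diff output remains the more complete view, since it also shows changed defaults and nested settings.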

Notes