Skip to content

v0.2.3

Compare
Choose a tag to compare
@terrytangyuan terrytangyuan released this 19 May 14:16
· 188 commits to master since this release
aa96794

Enhancements

  • Added support for RH OCP4.1 and RH OCP4.2
  • Added additional installation methods
  • Added support for Go Modules and removed vendor directories
  • Added default ephemeral storage for init container
  • Overwrite NVIDIA env vars to avoid using GPUs on launcher
  • Added health check and callbacks around various leader election phases
  • Honor user-specified worker command
  • Exposed main container name as a configurable field
  • Added RunPolicy to MPIJobSpec that reuses kubeflow/common spec
  • Allow to specify the name of the gang scheduler and priority for pod group
  • Added error log when pod spec does not have any containers
  • Switched to use distroless images
  • Refactored the kubectl-delivery to improve the launcher performance
  • Added Prometheus metrics for job monitoring
  • Added experimental version of v1 MPIJob controller and APIs
  • Support Volcano as a scheduler
  • Switched to use pods for launcher job and statefulset workers
  • Switched to use klog for logging
  • More consistent labels with other Kubeflow operators

Fixes

  • Fixed nil pointer exceptions that could accidentally restart the pod
  • Updated status to running only when launcher is active and all workers are ready
  • Fixed the incorrect namespace for initializing informers and endpoints of leader election
  • Fixed issue in v1 controller's CRD existence check

Documentation