The PyTorch operator resource can already be used in the current Kubeflow deployment, so there is no need to pause development while we migrate the deployment to the Training Operator.
Use Case
We want to use multi-node GPU scaling for PyTorch, both for a benchmark and for potential larger-scale model training.
Ideas for Implementation
Implement the Kubeflow Training Operator, which as an added benefit should unlock all the other relevant training frameworks (TensorFlow, XGBoost, MPI, etc.) as well. A sketch of what a job manifest could look like follows below.
https://github.com/kubeflow/training-operator
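For reference, a minimal sketch of a multi-node PyTorchJob manifest that the Training Operator would accept once deployed. The job name, container image, and replica/GPU counts here are placeholders for illustration, not part of this proposal; the operator itself injects the distributed environment variables (MASTER_ADDR, WORLD_SIZE, RANK, etc.) that PyTorch DDP expects.

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-multinode-benchmark   # hypothetical job name
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch           # the operator requires this container name
              image: benchmark-image:latest   # placeholder training image
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 3                     # scale out across additional GPU nodes
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: benchmark-image:latest
              resources:
                limits:
                  nvidia.com/gpu: 1
```

With a manifest like this, a 4-process DDP job (1 master + 3 workers, one GPU each) can be launched with a plain `kubectl apply -f`, which is what makes the operator attractive for the benchmark use case above.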
Message from the maintainers:
Excited about this feature? Give it a 👍. We factor engagement into prioritization.