Proposal: add priority and queue in scheduling for the common operator #46
Comments
/cc @gaocegege |
/cc @richardsliu @johnugeorge @hougangliu Thanks for the proposal! |
Are we going to inline |
Maybe all operators should add SchedulingPolicy, so we can add SchedulingPolicy to this common package. |
Yeah, we should add SchedulingPolicy to common. But pytorch-operator and tf-operator do not use common now, so we should re-implement the logic in those operators, too. |
Yes, we should implement the logic in MXNet-Operator too. |
Do you have any suggestion? |
Retire wg-machine-learning? It is so bad. |
Not for now. I'll help maintain ML-WG for a while; if there are still no working items, we'll retire it :) |
Hi, is there any update? |
hm... are we going to do this feature? |
Yes, it's part of our roadmap, so contributions are welcome. |
I think we can close this; refer to common/pkg/apis/common/v1/types.go, lines 204 to 209 in 21f5ba8.
|
Problem
1. Currently, kube-batch has a PodGroupSpec that carries scheduling-policy settings such as MinAvailable, Queue, and PriorityClassName, but the Kubeflow operators do not yet provide these parameters to kube-batch.
2. MPI-operator and tf-operator do not use the common operator, while pytorch-operator and mxnet-operator use the tf-operator/pkg/common package.
Proposed Solution
1. Add these attributes to the type RunPolicy.SchedulingPolicy, so that when Kubeflow is used together with kube-batch, Kubeflow can pass the parameters to kube-batch.
2. Make all operators use the common operator, since tf, pytorch, and mxnet are similar. The bad news is that mpi may need more changes.
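To make the first point concrete, here is a minimal Go sketch of how the three fields named in the Problem section could sit on RunPolicy.SchedulingPolicy and be copied into a pod-group spec. Everything in it is an assumption for illustration: the JSON tags, the helper ToPodGroupSpec, the fallback of MinAvailable to the total replica count, and the local PodGroupSpec stand-in, which is not the real kube-batch type and only mirrors the field names as this issue lists them.

```go
package main

import "fmt"

// SchedulingPolicy is a hypothetical sketch of the fields this proposal
// asks for; it is not the definition that ended up in kubeflow/common.
type SchedulingPolicy struct {
	MinAvailable      *int32 `json:"minAvailable,omitempty"`
	Queue             string `json:"queue,omitempty"`
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// RunPolicy shows only the part relevant to scheduling.
type RunPolicy struct {
	SchedulingPolicy *SchedulingPolicy `json:"schedulingPolicy,omitempty"`
}

// PodGroupSpec is a local stand-in for the kube-batch type, reduced to
// the three fields named in the issue (assumed names, not the real API).
type PodGroupSpec struct {
	MinAvailable      int32
	Queue             string
	PriorityClassName string
}

// ToPodGroupSpec (hypothetical helper) copies the scheduling fields from
// a RunPolicy into a PodGroupSpec. When MinAvailable is unset it falls
// back to the total replica count, which is an assumed default.
func ToPodGroupSpec(rp RunPolicy, totalReplicas int32) PodGroupSpec {
	spec := PodGroupSpec{MinAvailable: totalReplicas}
	sp := rp.SchedulingPolicy
	if sp == nil {
		return spec
	}
	if sp.MinAvailable != nil {
		spec.MinAvailable = *sp.MinAvailable
	}
	spec.Queue = sp.Queue
	spec.PriorityClassName = sp.PriorityClassName
	return spec
}

func main() {
	minAvailable := int32(2)
	rp := RunPolicy{SchedulingPolicy: &SchedulingPolicy{
		MinAvailable:      &minAvailable,
		Queue:             "training",
		PriorityClassName: "high-priority",
	}}
	fmt.Printf("%+v\n", ToPodGroupSpec(rp, 4))
}
```

Keeping MinAvailable as a pointer lets an operator distinguish "not set" from an explicit value, so each framework operator could apply its own default without changing the shared type.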
Advantages
Unify how all operators handle runPolicy and which packages they import.
Frameworks Support
pytorch
mxnet
mpi
tensorflow
Rough API Spec (pytorch-operator)