What happened: When the job controller submits a worker pod without any tolerations, the scheduler's taint check is skipped because we use nodeName to place the pod on a specific node, but the NoExecute taint evicts the pod as soon as it is scheduled.
What you expected to happen:
forensics-controller should submit a bare pod and check its status to decide whether it needs to re-submit another pod in case of any failures.
Provide an option in the PodCheckpoint spec for a user to submit a list of tolerations.
How to reproduce it (as minimally and precisely as possible):
Add taints to all nodes
Run a pod with tolerations on the tainted node
Try to get a PodCheckpoint of the pod.
Other debugging information (if applicable):
Example:
I have an IG named appikstesting which has the following taints on all nodes
I think using nodeName is problematic here because it bypasses the scheduler by declaring which node you want to schedule on.
Whereas if the pod had gone through the scheduler, it would have stayed Pending since it is missing the required toleration.
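To make the difference concrete, here is a minimal Python sketch of the two paths. It is a simplified model of the taint and toleration matching rules, not the actual scheduler or kube-controller-manager code. The dict fields mirror the Kubernetes taint and toleration fields; the function names, the taint key `dedicated`, and the node name `node-1` are made up for illustration.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified rules)."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    # An empty key is only valid with operator Exists and matches every key.
    if not toleration.get("key"):
        return operator == "Exists"
    if toleration["key"] != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(tolerations, node_taints, effects):
    """List the taints with one of the given effects that nothing tolerates."""
    return [
        taint for taint in node_taints
        if taint["effect"] in effects
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


def outcome(pod, node_taints):
    """Model what happens to a pod targeted at a tainted node."""
    tolerations = pod.get("tolerations", [])
    if not pod.get("nodeName"):
        # Without nodeName the scheduler filters the node on its taints,
        # so the pod never lands there and stays Pending.
        if untolerated(tolerations, node_taints, {"NoSchedule", "NoExecute"}):
            return "Pending"
    # With nodeName the scheduler is bypassed, but NoExecute taints
    # still evict a pod that does not tolerate them.
    if untolerated(tolerations, node_taints, {"NoExecute"}):
        return "Evicted"
    return "Running"


# Hypothetical taint, standing in for the ones on the instance group.
taints = [{"key": "dedicated", "value": "example", "effect": "NoExecute"}]
print(outcome({"nodeName": "node-1"}, taints))
print(outcome({}, taints))
```

In this model the pod with nodeName set ends up evicted, while the same pod without nodeName just stays Pending, which is the behavior described above.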
Maybe the fix should be to:
Affinitize to a node instead of setting it in nodeName (or use nodeSelector).
Get the node spec prior to scheduling the job, find any taints, and add matching tolerations to your job.
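Both suggestions could be combined roughly as follows. This is only a sketch in Python over plain dicts shaped like a pod spec, not forensics-controller code: `tolerations_for` and `pin_with_affinity` are hypothetical helper names, and it assumes the node's `kubernetes.io/hostname` label matches the node name, which is not guaranteed in every environment.

```python
def tolerations_for(node_taints):
    """Build one matching toleration per taint found on the node."""
    tolerations = []
    for taint in node_taints:
        toleration = {"key": taint["key"], "effect": taint["effect"]}
        if taint.get("value"):
            # Taint carries a value: tolerate exactly that key/value pair.
            toleration["operator"] = "Equal"
            toleration["value"] = taint["value"]
        else:
            # Taint has no value: tolerate any taint with this key.
            toleration["operator"] = "Exists"
        tolerations.append(toleration)
    return tolerations


def pin_with_affinity(pod_spec, node_name, node_taints):
    """Return a copy of pod_spec that targets node_name through the scheduler."""
    spec = dict(pod_spec)
    # Drop nodeName so the pod is no longer bound behind the scheduler's back.
    spec.pop("nodeName", None)
    # Note: this overwrites any affinity already present in pod_spec.
    spec["affinity"] = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": [node_name],
                            }
                        ]
                    }
                ]
            }
        }
    }
    # Keep user-supplied tolerations and append the ones derived from the node.
    spec["tolerations"] = list(pod_spec.get("tolerations", [])) + tolerations_for(node_taints)
    return spec
```

The taint list would come from the node's spec.taints, read just before the worker pod is created. Taints added after that point would still need handling, for example by re-reading the node when the pod is re-submitted.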
Is this a BUG REPORT or FEATURE REQUEST?: BUG
Resulting in the following:
The above fight between the job controller and kube-controller-manager managed to schedule and evict about 23000 pods in 30 minutes.
I removed the NoExecute taint on the node in question; the pods then got scheduled and everything worked as expected.