Use kubernetes labels to exclude instances from the upgrade() cycle #333
Comments
I even think this could work well using |
@preflightsiren are you referring to usage with native upgrade strategy? i.e. not using upgrade-manager/crd strategy? |
Thanks @eytan-avisror we're actually using the custom resource for upgrades (workflows). The flow looks like this
If the workflow is allowed to exit at step 5, instance-manager will check whether all nodes are running the latest version and, finding that they are not, try to resolve it by upgrading again. This issue is an attempt to find a mechanism that lets instance-manager mark an upgrade as complete. Custom labels were my first thought; this issue tries to reuse the existing patterns. |
You should be able to make this work with the way things are today in
instance-manager and upgrade-manager.
Upgrade manager uses the eviction API to drain pods from a node. If those
pods aren’t managed by a replication controller (deployment, replicaset
etc) or if they have a relevant PDB, they should block the node from
getting drained until the pods terminate or the PDB criteria are met.
On Sat, Oct 9, 2021 at 8:01 PM Sebastian Cole wrote:
Thanks @eytan-avisror we're actually
using the custom resource for upgrades (workflows). The flow looks like this
1. patch the InstanceGroup; usually the image id.
2. the launch template is updated (by instance-manager)
3. Workflow is created
4. Workflow taints/cordons nodes
5. Workflow enters sleep/wait until all nodes are reaped
6. An external process restarts the workloads running on the old nodes
7. Now the old nodes are empty, cluster autoscaler will scale down the
old nodes
8. Fin.
If the workflow is allowed to exit at step 5. Instance-manager will check
that all nodes are running the latest version and try to resolve it. This
issue is to try and find a mechanism to allow instance-manager to mark an
upgrade as complete.
Custom labels were my first thought; this issue tries to reuse the
existing patterns.
|
@backjo not sure I'm following your point. The pods are backed by deployments and statefulsets, but they can only be restarted at particular times (customer maint windows). I'm not sure how that fits with the checks that all nodes are running the latest launch template, could you expand your thoughts? |
Ah sorry @preflightsiren - didn't see this reply. Basically, where I was going is that Pods that can't handle disruption can define PodDisruptionBudgets, which are respected by the upgrade controller. So if the disruption budget prevents a given pod from being evicted, the node will not end up getting terminated until the disruption budget allows the eviction or the pod exits naturally. |
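To make the PodDisruptionBudget point concrete, here is a minimal sketch of the gating logic, assuming a simplified model. The `Budget` type and the `CanDrain` helper are hypothetical illustrations, not upgrade-manager or Kubernetes API types; in a real cluster the eviction API performs this check against the PodDisruptionBudget status.

```go
package main

// Budget is a hypothetical, simplified stand-in for the status of a
// PodDisruptionBudget. It is not a Kubernetes or upgrade-manager type.
type Budget struct {
	MinAvailable   int // minimum number of healthy pods the budget requires
	CurrentHealthy int // number of pods that are currently healthy
}

// DisruptionsAllowed returns how many pods covered by the budget may be
// evicted right now without dropping below MinAvailable.
func (b Budget) DisruptionsAllowed() int {
	if d := b.CurrentHealthy - b.MinAvailable; d > 0 {
		return d
	}
	return 0
}

// CanDrain reports whether every pod on a node could be evicted.
// budgets holds one entry per pod on the node; nil means the pod has no
// budget. Simplification: each pod is checked on its own, so several pods
// sharing one budget are not counted together.
func CanDrain(budgets []*Budget) bool {
	for _, b := range budgets {
		if b != nil && b.DisruptionsAllowed() < 1 {
			return false // this pod's budget blocks eviction, so the drain waits
		}
	}
	return true
}
```

Under this model, a workload that may only restart in a maintenance window would keep its budget at zero allowed disruptions outside the window, which holds the node in place until the window opens.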
Is this a BUG REPORT or FEATURE REQUEST?: Feature
We have a workflow that allows workloads to run on a node much longer than the upgrade window of the node. There's an external process that will reap these workloads and nodes. Currently, when upgrading these instancegroups, instance-manager will detect that there are still nodes that need to be upgraded and rerun the upgrade process.
I would like to be able to label nodes instancemgr.keikoproj.io/exclude-upgrades: true or similar to skip evaluating nodes.
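A rough sketch of what the requested behaviour could look like, assuming a simplified node model. The `Node` type, its `LaunchTemplateVersion` field, and both helper functions are hypothetical and do not reflect instance-manager's actual code; only the label key comes from this issue, and it is a proposal rather than an existing feature.

```go
package main

// ExcludeUpgradesLabel is the label key proposed in this issue. It is a
// proposal, not an existing instance-manager feature.
const ExcludeUpgradesLabel = "instancemgr.keikoproj.io/exclude-upgrades"

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
type Node struct {
	Name                  string
	Labels                map[string]string
	LaunchTemplateVersion string // assumed field: version the node was launched with
}

// NodesToEvaluate returns the nodes that are not labelled for exclusion.
func NodesToEvaluate(nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		// Reading from a nil map is safe in Go and yields the empty string.
		if n.Labels[ExcludeUpgradesLabel] == "true" {
			continue
		}
		out = append(out, n)
	}
	return out
}

// UpgradeComplete reports whether every non-excluded node already runs the
// latest launch template version.
func UpgradeComplete(nodes []Node, latest string) bool {
	for _, n := range NodesToEvaluate(nodes) {
		if n.LaunchTemplateVersion != latest {
			return false
		}
	}
	return true
}
```

With a check like this, old nodes that are cordoned and waiting to be reaped could carry the label, so the upgrade would count as complete once the remaining nodes are on the latest version.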