Bug 1902963: templates: add After=ostree-finalize-staged.service to kubelet.service #2414
Conversation
We've seen occasional issues during upgrades caused by what seems to be the kubelet still being active during OSTree finalization. If the kubelet changes things in `/etc`, it'll confuse OSTree, which is trying to do the `/etc` merge. We want to be sure that the kubelet is done modifying everything it needs to in `/etc` and that it has exited before we finalize the deployment. Add an `After=ostree-finalize-staged.service` for this. The way this works is that ostree-finalize-staged.service does its real work in its `ExecStop`, and shutdown ordering is the reverse of startup ordering, so the kubelet will exit before `ostree-finalize-staged.service` is stopped. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1902963
(Not tested.)
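A minimal sketch of the resulting `[Unit]` section (illustrative only; the rendered template may differ slightly), with comments on how the ordering plays out at startup vs. shutdown:

```
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
# Startup: kubelet is ordered after ostree-finalize-staged.service, which is
# effectively a no-op since that unit only does its real work in its ExecStop.
# Shutdown: ordering is reversed, so kubelet is stopped (and has fully exited)
# before ostree-finalize-staged.service's ExecStop runs the finalization.
After=ostree-finalize-staged.service
```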
This is probably related to the "shutdown timeout" issue the kubelet folks are hitting sometimes, where systemd times out waiting for containers and goes on a SIGTERM spree. Hmm, you know, I bet we could move the …
@jlebon: This pull request references Bugzilla bug 1902963, which is valid. 3 validation(s) were run on this bug
Worth noting of course that #1190 would also fix this. But the more I think about this, the more this feels like a pure ostree-level bug: if systemd is in the "SIGTERM all the things" phase, we shouldn't try to do the ostree finalization at all.
I presume we'll want to backport this to 4.6?
Hmm, but even so the root issue is still that there's no ordering between those services, right? So even in the graceful case, we need some strong ordering.
Doesn't that imply writing to the bootloader at shutdown still? Also not sure about adding more fallible code into the post-reboot path. It's nice that right now, if it fails, we just go back into the same deployment. Hmm, another easier approach is to have …
So in one RHBZ this was hit going from 4.5 to 4.6; in another, going from 4.6 to 4.7. So ideally we'd backport all the way to 4.5? (It doesn't seem like a super common issue, though, and at least it doesn't require new bootimages!) Mostly first looking to confirm this does work, though. Will we be able to see the journal logs from the …
```diff
@@ -5,6 +5,7 @@ contents: |
       Description=Kubernetes Kubelet
       Wants=rpc-statd.service network-online.target crio.service
       After=network-online.target crio.service
+      After=ostree-finalize-staged.service
```
Is it worth putting a condition in the template so that this is only used for RHCOS nodes?
good point, this will work on OSTree-based nodes but break RHEL nodes
That's a good idea!
Though hmm... AFAICT there doesn't seem to be a variable already exposed for this. So I think this would require wiring it through the render code first. Meh... maybe not worth the trouble in that case? (It's totally harmless on non-RHCOS nodes.)
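For illustration, a hypothetical sketch of what such a template condition could look like, assuming a variable like `.IsOSTree` were wired through the render code (no such variable exists today):

```
After=network-online.target crio.service
# Hypothetical Go-template conditional; ".IsOSTree" is an invented placeholder,
# not an existing template field, and would need to be plumbed through the
# render code first.
{{ if .IsOSTree }}
After=ostree-finalize-staged.service
{{ end }}
```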
> good point, this will work on OSTree-based nodes but break RHEL nodes

systemd just ignores `Before=`/`After=` constraints if the target unit is missing.
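Concretely (an illustrative sketch, not the exact rendered unit):

```
[Unit]
# After=/Before= only affect ordering; they never pull the named unit in and
# are silently ignored when that unit doesn't exist (e.g. on non-OSTree RHEL
# nodes). Only Wants=/Requires= would try to pull the unit in.
After=ostree-finalize-staged.service
```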
cool, RHEL nodes are safe then 😆
/retest
OK cool, looks like this works! Looking at the journal logs from one of the nodes in ci/prow/e2e-agnostic-upgrade, I see: […]
/retest
@jlebon both of the remaining (but not required) tests are super flaky/red, so I wouldn't expect them to pass on any PR at this point ;) /assign @sinnykumari @darkmuggle
/retest
Anyone want to …
@sinnykumari PTAL
ah, missed getting to this PR earlier. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlebon, kikisdeliveryservice, sinnykumari. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing …
@jlebon shall we cherry-pick this, or do we want to wait for a while to see how it goes in master?
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
I think this is fairly safe to backport, but maybe to start with, let's just backport it to 4.7?
/cherrypick release-4.7
@sinnykumari: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.
@jlebon: The following test failed, say …

Full PR test history. Your PR dashboard.
/retest Please review the full test history for this PR and help us cut down flakes.
@jlebon: All pull requests linked via external trackers have merged: Bugzilla bug 1902963 has been moved to the MODIFIED state.
@sinnykumari: new pull request created: #2455
- What I did

We've seen occasional issues during upgrades caused by what seems to be the kubelet still being active during OSTree finalization. If the kubelet changes things in `/etc`, it'll confuse OSTree, which is trying to do the `/etc` merge.

We want to be sure that the kubelet is done modifying everything it needs to in `/etc` and that it has exited before we finalize the deployment.

Add an `After=ostree-finalize-staged.service` for this. The way this works is that ostree-finalize-staged.service does its real work in its `ExecStop`, and shutdown ordering is the reverse of startup ordering, so the kubelet will exit before `ostree-finalize-staged.service` is stopped.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1902963

- How to verify it

It's tricky to verify because it's a flake. So verification would be getting this in and not seeing that bug pop up anymore. (Could still at least verify from the journal logs that the kubelet exited before `ostree-finalize-staged.service`.)

- Description for the changelog
During upgrades, we now wait for the kubelet to exit before finalizing the staged deployment to ensure no conflicts occur.