Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1902963: templates: add After=ostree-finalize-staged.service to kubelet.service #2414

Merged
merged 1 commit into from
Mar 5, 2021

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Feb 17, 2021

- What I did

We've seen occasional issues during upgrades caused by what seems to be
the kubelet still being active during OSTree finalization. If the
kubelet changes things in /etc, it'll confuse OSTree which is trying
to do the /etc merge.

We want to be sure that the kubelet is done modifying everything it
needs to in /etc and that it exited before we finalize the deployment.

Add a After=ostree-finalize-staged.service for this. The way this
works is that that service runs in its ExecStop, and shutdown ordering
is the reverse of startup. So this will cause the kubelet to exit before
ostree-finalize-staged.service is stopped.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1902963

- How to verify it

It's tricky to verify because it's a flake. So verification would be getting this in and not seeing that bug pop up anymore. (Could still at least verify though from the journal logs that the kubelet exited before ostree-finalize-staged.service.)

- Description for the changelog

During upgrades, we now wait for the kubelet to exit before finalizing the staged deployment to ensure no conflicts occur.

We've seen occasional issues during upgrades caused by what seems to be
the kubelet still being active during OSTree finalization. If the
kubelet changes things in `/etc`, it'll confuse OSTree which is trying
to do the `/etc` merge.

We want to be sure that the kubelet is done modifying everything it
needs to in `/etc` and that it exited before we finalize the deployment.

Add a `After=ostree-finalize-staged.service` for this. The way this
works is that that service runs in its `ExecStop`, and shutdown ordering
is the reverse of startup. So this will cause the kubelet to exit before
`ostree-finalize-staged.service` is stopped.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1902963
@jlebon
Copy link
Member Author

jlebon commented Feb 17, 2021

(Not tested.)

@cgwalters
Copy link
Member

This is probably related to the "shutdown timeout" issue the kubelet folks are hitting sometimes where systemd times out waiting for containers and goes on a SIGTERM spree.

Hmm you know, I bet we could move the /etc merge into the initramfs on next boot or so. That would close off all races.

@kikisdeliveryservice kikisdeliveryservice changed the title templates: add After=ostree-finalize-staged.service to kubelet.service Bug 1902963: templates: add After=ostree-finalize-staged.service to kubelet.service Feb 17, 2021
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Feb 17, 2021
@openshift-ci-robot
Copy link
Contributor

@jlebon: This pull request references Bugzilla bug 1902963, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1902963: templates: add After=ostree-finalize-staged.service to kubelet.service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 17, 2021
@cgwalters
Copy link
Member

Worth noting of course that #1190 would also fix this.

But the more I think about this the more this feels like a pure ostree-level bug - if systemd is in the "SIGTERM all the things" phase, we shouldn't try to do the ostree finalization at all.

@darkmuggle
Copy link
Contributor

I presume we'll want to backport this to 4.6?

@jlebon
Copy link
Member Author

jlebon commented Feb 17, 2021

But the more I think about this the more this feels like a pure ostree-level bug - if systemd is in the "SIGTERM all the things" phase, we shouldn't try to do the ostree finalization at all.

Hmm, but even so the root issue is still that there's no ordering between those services, right? So even in the graceful case, we need some strong ordering.

Hmm you know, I bet we could move the /etc merge into the initramfs on next boot or so. That would close off all races.

Doesn't that imply writing to the bootloader at shutdown still? Also not sure about adding more fallible code into the post-reboot path. It's nice that right now if it fails, we just go back into the same deployment.

Hmm, another easier approach is to have ostree-finalize-staged.service remount /etc read-only before starting the /etc merge. (That doesn't really fix any race, but at least it makes the operation more reliable.)

@jlebon
Copy link
Member Author

jlebon commented Feb 17, 2021

I presume we'll want to backport this to 4.6?

So in one RHBZ, this was hit in 4.5 going to 4.6, in another one, it was hit going from 4.6 to 4.7. So ideally, we'd backport all the way to 4.5? (It doesn't seem like a super common issue, though at least it doesn't require new bootimages so!)

Mostly first looking to confirm this does work though. Will we be able to see the journal logs from the e2e-agnostic-upgrade test once it completes?

@@ -5,6 +5,7 @@ contents: |
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service
After=ostree-finalize-staged.service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth putting a condition in the template so that this is only used for RHCOS nodes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, this will work on OSTree based nodes but break RHEL nodes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea!

Though hmm... AFAICT there doesn't seem to be a variable already exposed for this. So I think this would require wiring it through the render code first. Meh... maybe not worth the trouble in that case? (It's totally harmless on non-RHCOS nodes.)

Copy link
Member Author

@jlebon jlebon Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, this will work on OSTree based nodes but break RHEL nodes

systemd just ignores Before=/After= constraints if the target unit is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, RHEL nodes are safe then 😆

@jlebon
Copy link
Member Author

jlebon commented Feb 22, 2021

/retest

@jlebon
Copy link
Member Author

jlebon commented Feb 22, 2021

OK cool, looks like this works! Looking at the journal logs from one of the nodes in ci/prow/e2e-agnostic-upgrade, I see:

   8916 Feb 22 16:25:27.917657 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 root[81740]: machine-config-daemon[70704]: Rebooting node
   8917 Feb 22 16:25:27.921197 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 root[81741]: machine-config-daemon[70704]: initiating reboot: Node will reboot into config rendered-worker-251b5c4768cfdff11d117502b97d1508
   8918 Feb 22 16:25:27.934014 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-worker-251b5c4768cfdff11d117502b97d1508.
   8919 Feb 22 16:25:27.948589 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopping Kubernetes Kubelet...
   8920 Feb 22 16:25:27.948842 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 hyperkube[1537]: I0222 16:25:27.948627    1537 dynamic_cafile_content.go:182] Shutting down client-ca-bundle::/etc/kubernetes/kubelet-ca.crt
   8921 Feb 22 16:25:27.970705 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: kubelet.service: Succeeded.
   8922 Feb 22 16:25:27.971695 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopped Kubernetes Kubelet.
   8923 Feb 22 16:25:27.971891 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: kubelet.service: Consumed 8min 7.090s CPU time
   8924 Feb 22 16:25:27.986455 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd-logind[1199]: System is rebooting.
   8925 Feb 22 16:25:27.994245 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: machine-config-daemon-reboot.service: Succeeded.
   8926 Feb 22 16:25:27.994948 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-worker-251b5c4768cfdff11d117502b97d1508.
   8927 Feb 22 16:25:27.995596 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: machine-config-daemon-reboot.service: Consumed 19ms CPU time
...
   9073 Feb 22 16:25:28.681767 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopping OSTree Finalize Staged Deployment...
   9074 Feb 22 16:25:28.688061 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopped target Paths.
   9075 Feb 22 16:25:28.694123 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: console-login-helper-messages-issuegen.path: Succeeded.
   9076 Feb 22 16:25:28.696019 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopped Monitor console-login-helper-messages runtime issue snippets directory for changes.
   9077 Feb 22 16:25:28.791278 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 ostree[81828]: Finalizing staged deployment
   9078 Feb 22 16:25:31.146325 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 ostree[81828]: Copying /etc changes: 14 modified, 0 removed, 168 added
   9079 Feb 22 16:25:31.146846 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 ostree[81828]: Copying /etc changes: 14 modified, 0 removed, 168 added
   9080 Feb 22 16:25:31.222965 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 kernel: EXT4-fs (sda3): re-mounted. Opts: (null)
   9081 Feb 22 16:25:32.033315 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 ostree[81828]: Bootloader updated; bootconfig swap: yes; deployment count change: 0
   9082 Feb 22 16:25:32.033832 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 ostree[81828]: Bootloader updated; bootconfig swap: yes; deployment count change: 0
   9083 Feb 22 16:25:33.187729 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: ostree-finalize-staged.service: Succeeded.
   9084 Feb 22 16:25:33.188550 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: Stopped OSTree Finalize Staged Deployment.
   9085 Feb 22 16:25:33.196203 ci-op-i1ivpc34-57c36-7nfmp-worker-b-k7qs2 systemd[1]: ostree-finalize-staged.service: Consumed 2.464s CPU time

@jlebon
Copy link
Member Author

jlebon commented Feb 22, 2021

/retest

@kikisdeliveryservice
Copy link
Contributor

@jlebon both of the remaining (but unrequired) tests are super flaky/red so I wouldn't' expect them to pass on any pr at this point ;)

/assign @sinnykumari @darkmuggle

@jlebon
Copy link
Member Author

jlebon commented Mar 2, 2021

/retest

@jlebon
Copy link
Member Author

jlebon commented Mar 3, 2021

Anyone want to /lgtm this?

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@sinnykumari
Copy link
Contributor

ah, missed to get to this PR earlier.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlebon, kikisdeliveryservice, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sinnykumari
Copy link
Contributor

@jlebon shall we cherrypick this or we want to wait for a while to see how it goes in master?

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@jlebon
Copy link
Member Author

jlebon commented Mar 5, 2021

@jlebon shall we cherrypick this or we want to wait for a while to see how it goes in master?

I think this is fairly safe to backport, but maybe to start with let's just backport it to 4.7?

@sinnykumari
Copy link
Contributor

/cherrypick release-4.7

@openshift-cherrypick-robot

@sinnykumari: once the present PR merges, I will cherry-pick it on top of release-4.7 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 5, 2021

@jlebon: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-workers-rhel7 0369c47 link /test e2e-aws-workers-rhel7

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 82868e6 into openshift:master Mar 5, 2021
@openshift-ci-robot
Copy link
Contributor

@jlebon: All pull requests linked via external trackers have merged:

Bugzilla bug 1902963 has been moved to the MODIFIED state.

In response to this:

Bug 1902963: templates: add After=ostree-finalize-staged.service to kubelet.service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@sinnykumari: new pull request created: #2455

In response to this:

/cherrypick release-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants