Bug 1927041: daemon: safer signal handling for shutdown #2395
The MCD is using the standard 600 grace period to end its work (5min). However, we have seen cases where this is insufficient and the node is rebooted under the MCD. The MCD has SIGTERM handling, but if the grace period times out, Kubernetes sends a SIGKILL.
I dropped the systemd inhibit functionality to stop reboots. During the team discussion today, the prevailing view was that the MCO cannot block a reboot.
@@ -634,16 +637,16 @@ func (dn *Daemon) Run(stopCh <-chan struct{}, exitCh <-chan error) error {

	go wait.Until(dn.worker, time.Second, stopCh)

	for {
The `for` loop is superfluous since once we get the `sigterm` we should finish the work and then get out of the way. A `sigkill` could/likely follow and there's literally nothing we can do about that.
This really looks good and handles interruption during the update process well. Colin, can you please also take a look to make sure we are not missing anything here?
/retest
Would you say the core thing this is fixing here is that currently we were ignoring `SIGTERM` if caught during an update, but we didn't then exit after the update had finished? And that was causing systemd to time out and go on the `SIGKILL` spree?
Overall I think this looks improved; one optional comment.
I find it hard to review all of this for correctness though - I'd reiterate that I think we really want to make this whole thing transactional w/ostree support. I hope at some point in the next few months to land some infrastructure for that.
pkg/daemon/daemon.go
		return nil
	case sig := <-signaled:
		glog.Warningf("shutdown of machine-config-daemon triggered via signal %d", sig)
		return fmt.Errorf("shutdown of the machine-config-daemon trigger via syscall signal %d", sig)
I think getting `SIGTERM` isn't an error, it's a normal condition. Which relates to one of the original goals I had here in that in the "idle" aka "not applying updates" case, we don't install a `SIGTERM` handler at all - we just let the kernel unilaterally kill the process.
I more recently posted about this here: https://internals.rust-lang.org/t/should-rust-programs-unwind-on-sigint/13800/11
See also e.g. nhorman/rng-tools#72
And related to all this, ideally of course we do #1190 - then we shouldn't need to handle SIGTERM at all in the MCO. With that if we're killed (or the machine power cycled/kernel froze) in the middle of an update, we either have the old or new system.
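The "no handler while idle" idea from this comment can be sketched roughly as follows. This is only an illustration, not the MCD's actual code: `runUpdateGuarded` is a hypothetical helper, and it assumes no other part of the process has registered for `SIGTERM`.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// runUpdateGuarded is a hypothetical helper (not part of the MCD).
// It relays SIGTERM to a channel only while update runs and reports
// whether a SIGTERM arrived during that window.
func runUpdateGuarded(update func()) (terminated bool) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	// Stop relaying once the update is done. Assuming nothing else has
	// called Notify for SIGTERM, the default behavior (the process is
	// terminated) applies again while the daemon is idle.
	defer signal.Stop(sigCh)

	update()

	// Non-blocking check: did a SIGTERM show up while we were updating?
	select {
	case <-sigCh:
		return true // caller should exit now that the update is finished
	default:
		return false
	}
}
```

A caller would wrap only the update phase in `runUpdateGuarded` and exit promptly when it returns `true`.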
I did go back and forth on whether to report `SIGTERM` as an error or not for the reasons your blog highlights and ultimately went with returning the error to signal the reason for the death of the process (and thus make it obvious in the logs). In retrospect, I think just logging and moving on is the better path.
FWIW I was arguing for making the MCD block reboots, however, @crawford argued that fully transactional updates would negate the need; he would rather wait for transactional support from `rpm-ostree` before doing any reboot armoring. My view is that until we get the transactional update mechanism, a short-lived inhibitor is better than nothing and could prevent some support cases and bug reports.
> And related to all this, ideally of course we do #1190 - then we shouldn't need to handle SIGTERM at all in the MCO.

I would disagree -- we should, at the very least, throw away the pending transaction (or have `rpm-ostree` do it).
Tangentially related to this...one thing that could help in the MCO design is if we had a "dual nature" as both a pod and a systemd unit. For example, we could represent the "applying updates" phase via e.g. This would also more naturally handle cases where e.g. we're updating a node and while that happens a cluster upgrade is happening and a new daemonset is rolling out. Now with correct (This "proxying a systemd unit" is actually what's happening with rpm-ostreed today)
I had a similar thought, although your idea is a bit more refined. My idea was to have something that would ensure that if a user did a shutdown/reboot in the middle that RHCOS would wait until the update was done. However, the idea was NAK'd during our recent team discussion. Regardless, I do think that RHCOS and MCO could work a little better on coordinating when an update is safe to apply or block an update from starting if the machine is shutting down (such as in the case when a user has SSH'd).
@cgwalters this was to fix four issues that were observed:
So the core problem this seeks to solve is:
Thanks Colin for reviewing. @ben Latest push broke some tests, which need to be fixed first.
This armors the signal handling for the daemon, blocking any shutdown until _after_ an update is complete. The old functions `catchIgnoreSIGTERM` and `cancelSIGTERM` really didn't do much (they used the mutex and then set a bool) but there were no checks in the signal handling. Signed-off-by: Ben Howard <[email protected]>
Arg, I forgot to do the
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, darkmuggle, sinnykumari
@darkmuggle: All pull requests linked via external trackers have merged: Bugzilla bug 1927041 has been moved to the MODIFIED state.
This armors the signal handling for the daemon, blocking any shutdown until after an update is complete.
The old functions `catchIgnoreSIGTERM` and `cancelSIGTERM` really didn't do much (they used the mutex and then set a bool) but there were no checks in the signal handling.

This also increases the time for the MCD to shut down from 5min to 1hr. Since the MCD will shut down immediately when safe to do so, this shouldn't have negative effects except in the case of an already unhappy node.
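One way to picture "blocking any shutdown until after an update is complete" is a small gate that the signal handler consults before exiting. The sketch below is an assumption about the general shape, not the code merged in this PR: the `updateGate` type and its method names are hypothetical.

```go
package main

import "sync"

// updateGate is a hypothetical type (not the MCD's real structure)
// that lets a signal handler check whether an update is in flight.
type updateGate struct {
	mu       sync.Mutex
	updating bool // an update is currently being applied
	pending  bool // a shutdown was requested during the update
}

// begin marks the start of an update.
func (g *updateGate) begin() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.updating = true
}

// requestShutdown is what the signal handler would call. It returns
// true when it is safe to exit immediately. During an update it only
// records the request and returns false.
func (g *updateGate) requestShutdown() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.updating {
		g.pending = true
		return false
	}
	return true
}

// end marks the update as finished and reports whether a shutdown
// was deferred and should proceed now.
func (g *updateGate) end() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.updating = false
	deferred := g.pending
	g.pending = false
	return deferred
}
```

With a gate like this, an idle daemon exits as soon as it is asked to, and a busy one exits right after the update finishes, which is why the longer grace period should only matter on a node that is already unhappy.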