Bug 1885365: daemon: properly handle unit enable/disables #2145
@yuqi-zhang: This pull request references Bugzilla bug 1885365, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Need to fix this up a bit more, but the general idea is up for review.
LGTM generally.
Branch updated from 8ca3f00 to e27e04e.
Tested with the following:
and
Adding the first correctly adds the service. Also tested without this PR: the above would link into multi-user.target.wants instead (incorrect), and the service does not get disabled when you remove the second MC (incorrect).
/skip
@yuqi-zhang: The specified target(s) for
/test e2e-gcp-op
So there's a small issue with this approach: there appears to be roughly a 35% increase in operation time due to the extra writes and calls to systemctl. At the very least, this makes our GCP-OP tests time out. Our options then seem to be:
Today the MCO rewrites files even if they haven't changed; fixing that would be a general improvement, and I think it would also fix that bug.
Branch updated from e27e04e to b22cbf3.
I was under the impression that this was somewhat intentional. Although inefficient, it has some benefits, e.g. allowing us to use the … We could consider that separately. I also don't see how it would fix this issue, as the fundamental problem is a wrong hardcoded symlink plus no restoration of system defaults.
Hand-implementing
Branch updated from 6b526da to 78b0fd3.
LGTM
/retest
Remove the hardcoded symlink into multi-user.target.wants and instead invoke systemctl to directly enable/disable units as needed, so that [Install] WantedBy= sections are properly parsed. Also call systemctl preset on units that are deleted/changed, to "roll back" unit enablement to the system defaults. Remove the Ignition-written preset file to account for this. Signed-off-by: Yu Qi Zhang <[email protected]>
Shelling out to systemctl makes the MCO e2e tests roughly 35% slower. Try to mitigate this by collecting all units to enable/disable and passing them to systemctl in one call each. Don't do this for presets, since we don't want the whole operation to fail because an individual preset command fails. Signed-off-by: Yu Qi Zhang <[email protected]>
Branch updated from e16110a to b9e06c3.
Rebased; will check whether the slowdown still appears in the gcp-op tests.
/retest
Comparing against the runtimes of the most recent gcp-op tests from other PRs, I don't see a significant difference for the tests that passed. Will retest a few more times once the gcp-op issues are fixed.
Tested again with the steps in #2145 (comment). Once we're confident that GCP-OP isn't slowed down by this, I'm ready for this to merge.
/retest (followed by 4 similar /retest comments)
/approve
Reading over the PR and the code, this LGTM.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bgilbert, darkmuggle, yuqi-zhang
One more test to be safe. I see some minor slowdowns, but I think they are unrelated to this PR. Overall it seems to match expectations.
/test ci/prow/e2e-gcp-op
@yuqi-zhang: The specified target(s) for
/test e2e-gcp-op
Times look fine.
/retest
Please review the full test history for this PR and help us cut down flakes.
@yuqi-zhang: All pull requests linked via external trackers have merged: Bugzilla bug 1885365 has been moved to the MODIFIED state.
If unit enable fails, remove broken symlinks in multi-user.target.wants and try again. This fixes a bug where enables would fail on RHEL 7 nodes during 4.6 -> 4.7 cluster upgrades.

Context: before openshift#2145, the MCO hardcoded a symlink from /etc/systemd/system/$UNIT to /etc/systemd/system/multi-user.target.wants/$UNIT, which is not correct for every unit and thus caused broken symlinks. On RHCOS/FCOS, systemd is new enough to remove broken symlinks itself, but on RHEL 7 nodes it does not first attempt to remove broken symlinks, so the enable fails.

As a workaround, this PR attempts to remove broken symlinks when the first enable fails, and then tries again. Successful FCOS/RHCOS upgrades should not hit this path, and failing ones will report full errors. The error checking is perhaps a bit overkill, but the original bug case should only run through this logic once before it is fixed. Future errors are likely actual errors and will be reported as such.

Signed-off-by: Yu Qi Zhang <[email protected]>