Bug 2104619: Remove rollback deployment #3243
@jmarrero: This pull request references Bugzilla bug 1907333, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Let's retitle the commit to be: One thing to debate here too is whether we should always remove the rollback, or whether we should only do so before upgrades. If we always remove the rollback, it kind of obviates the grub password bits, and means it will never be available if administrators need it for some reason. (This is the same debate we need to have in ostreedev/ostree#2670: do we always clean up, or do we only do so when we need to because of disk space issues?)
pkg/daemon/update.go
Outdated
@@ -720,6 +720,20 @@ func (dn *Daemon) updateHypershift(oldConfig, newConfig *mcfgv1.MachineConfig, d
	return nil
}

// removeRollback removes the rpm-ostree rollback deployment. It
// takes up space, and we don't generally expect administrators to
Let's tweak this comment now to link to https://bugzilla.redhat.com/show_bug.cgi?id=2104619
(IIRC when I was talking about "space" there I was mainly thinking of / not /boot, though both are valid)
pkg/daemon/update.go
Outdated
	_, err := runGetOut("rpm-ostree", "cleanup", "-r")
	return err
I think this can now be return runRpmOstree("cleanup", "-r")
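A minimal sketch of what the suggested simplification might look like. The runRpmOstree helper below is a stand-in written for illustration only (the MCO's real helper is not shown in this thread), and the commandRunner type is hypothetical, added so the argument list can be checked without invoking the real binary. Only the command itself, rpm-ostree cleanup -r, comes from the diff above.

```go
package main

import (
	"fmt"
	"os/exec"
)

// commandRunner is a hypothetical seam so the command can be swapped out in tests.
type commandRunner func(name string, args ...string) ([]byte, error)

// execRunner runs the real binary and returns its combined output.
func execRunner(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

// runRpmOstree is an illustrative stand-in for the MCO helper of the same
// name: it runs rpm-ostree with the given args and discards the output.
func runRpmOstree(run commandRunner, args ...string) error {
	if out, err := run("rpm-ostree", args...); err != nil {
		return fmt.Errorf("rpm-ostree %v failed: %w: %s", args, err, out)
	}
	return nil
}

// removeRollback removes the rpm-ostree rollback deployment.
func removeRollback(run commandRunner) error {
	return runRpmOstree(run, "cleanup", "-r")
}

func main() {
	// Dry run: print the command instead of executing it.
	printer := func(name string, args ...string) ([]byte, error) {
		fmt.Println(name, args)
		return nil, nil
	}
	if err := removeRollback(printer); err != nil {
		fmt.Println("error:", err)
	}
}
```

Passing execRunner instead of the printer would run the real command; injecting the runner also makes it cheap to assert on the exact flags being passed.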
@jmarrero: This pull request references Bugzilla bug 2104619, which is invalid:
/bugzilla refresh
@Prashanth684: This pull request references Bugzilla bug 2104619, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
4d6a940 to 9a22d0b
Does the MCO have a way to check for a ConfigMap value, for example? It could be something where we always delete unless "KEEP_ROLLBACK" is set. That way it is more intentional in case an admin is trying to debug something. If we just do it for upgrades (if you mean OCP upgrades), then if someone installs kernel-rt we might get caught again by the lack of space.
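To make the opt-out idea concrete, here is a rough sketch of the decision logic only. It assumes the setting arrives as plain key/value data (as a ConfigMap's data would); how the MCO would actually read it is left open, and shouldRemoveRollback along with its treatment of "false" and empty values are illustrative choices, not anything agreed in this PR.

```go
package main

import "fmt"

// keepRollbackKey is the opt-out key floated in the discussion.
const keepRollbackKey = "KEEP_ROLLBACK"

// shouldRemoveRollback reports whether the rollback deployment should be
// cleaned up, given the key/value data of a (hypothetical) ConfigMap.
// The default is to delete; any non-empty value other than "false" opts out.
func shouldRemoveRollback(data map[string]string) bool {
	v, ok := data[keepRollbackKey]
	return !ok || v == "" || v == "false"
}

func main() {
	// No setting at all: clean up by default.
	fmt.Println(shouldRemoveRollback(nil))
	// An admin explicitly asked to keep the rollback.
	fmt.Println(shouldRemoveRollback(map[string]string{keepRollbackKey: "true"}))
}
```

Defaulting to deletion keeps the space fix in place for everyone, while an explicit key leaves a deliberate escape hatch for debugging.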
Should we restrict this workaround to ppc64le only, or maybe check for X% of free space before cleaning up?
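A free-space gate could be sketched roughly as follows. This is an assumption-laden illustration, not MCO code: it uses the Linux statfs(2) call via Go's syscall package, and the function names, the "/boot" path, and the 20% threshold are all made up for the example.

```go
package main

import (
	"fmt"
	"syscall"
)

// freePercent computes the percentage of blocks available to unprivileged
// users, given the total and available block counts of a filesystem.
func freePercent(total, avail uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(avail) / float64(total) * 100
}

// belowThreshold reports whether the filesystem containing path has less
// than minFreePct percent of its space available.
func belowThreshold(path string, minFreePct float64) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, fmt.Errorf("statfs %s: %w", path, err)
	}
	return freePercent(st.Blocks, st.Bavail) < minFreePct, nil
}

func main() {
	// "/boot" and the 20% threshold are illustrative values only.
	low, err := belowThreshold("/boot", 20)
	if err != nil {
		fmt.Println("could not check free space:", err)
		return
	}
	fmt.Println("should clean up rollback:", low)
}
```

A check like this would only trigger the cleanup on nodes that are actually short on space, at the cost of behavior that differs from node to node.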
9a22d0b to 987e5e9
I don't have strong opinions on this, but I think we might start seeing the same issues on other architectures eventually. If we need to check for a specific condition, I would prefer space.
987e5e9 to ef23c0f
We could also merge this as is and, once ostreedev/ostree#2670 lands, change the MCO/RHCOS to just add the new config in a future version?
I will lean towards keeping it consistent across all arches.
pkg/daemon/update.go
Outdated
		// do not attempt to rollback on non-RHCOS/FCOS machines
		return nil
	}
	return runRpmOstree("cleanup", "r")
You probably already noticed, but if not -- looks like the dash on the r is missing (-r) 😄
good catch John :D
woops, thanks for catching that.
Unfortunately, this slightly weakens our "resiliency on upgrade failure" stance, thus the suggestion to use this only when needed. But I also agree that consistency helps with debugging.
My thoughts were also in line with Colin's point. Will let other reviewers add lgtm. @jmarrero Thanks for working on the fix. As a follow-up, can you also create a Jira card in the MCO board to track removal of this workaround when ostreedev/ostree#2670 is fixed?
/retest-required
/retest
gcp-op test failure:
Since we are doing rollback, the boot index changes from 1 to 0, and hence the mismatch is happening.
Both e2e aws and agnostic-upgrade are failing because of a known infra issue.
The Config Drift Monitor has no interaction with kernel args. This particular case is better covered by the TestKernelArguments test.
Since the aws test will most likely fail, let's test upgrade on gcp
/test e2e-azure-upgrade
/test e2e-gcp-op
Looks like the image import affects gcp and azure too?
Looks like the imagestream issue is fixed now; some of our recent tests are green. Let's give it another try.
@jmarrero: The following test failed, say
Tests are looking good, let's merge it |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jmarrero, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
@jmarrero: All pull requests linked via external trackers have merged: Bugzilla bug 2104619 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.11
This reverts commit 25a7812 to provide a fix to: https://bugzilla.redhat.com/show_bug.cgi?id=2104619
@sinnykumari: new pull request created: #3249 In response to this:
This is a hack to work around the lack of coreos/rpm-ostree@0556152 for RHEL 8.6 and below. A recent PR in the MCO openshift/machine-config-operator#3243 tipped things over the edge and we now see failures a lot more often.
We've cherry-picked this one into 4.11, but don't we also need to cherry-pick it into 4.10?
This is a great question! I have had to dig into this at least twice before and I forget the answer, but clearly we need to add the answer to the docs. OK right, the answer is in the ordering listed here:
Notice that the MCD is rolled out before we block on the pools. So I don't think we need to backport this to 4.10 - but that said, doing so seems like a good idea.