Managing boot images via the MCO #1496
(force-pushed from 1238a99 to f3defdf)
I don't have deep knowledge of the MCO, but I gave this a read and this looks good to me.
Thanks for writing this!
Based on Colin's suggestion, I fleshed out the rollback strategy, added some related questions, plus other minor cleanups.
/cc
### Open Questions

- Should we have a global switch that opts in all `MachineSets` to this mechanism?
I'm wondering if there's a way that this can be achieved without even modifying the MachineSets, what if it were done at admission time based on the labels on the Machine (and future CAPI InfrastructureMachine) being created. That would mean you don't have to worry about modifying resources in place and would allow the cluster admin to enforce policy at some level.
Need to have a think about this 🤔
Updated based on Joel and Jerry's feedback, clarified UPI/IPI, phases and the MCS TLS cert issue. Our main outstanding questions that we need consensus on remain the following:
I've also made a PR for the feature gate required for this work: openshift/api#1646
Fleshed out some sections; namely opt-in, degrade, reverting and "safety" checks. I also changed all mentions of rollback to revert.
@JoelSpeed, @yuqi-zhang, @sinnykumari and I had a meeting about this EP, and I've updated it to reflect that discussion, namely:
Based on discussion in this comment thread regarding stub secret management, I've updated the enhancement. This changes a few things:
I think we should consider those who build automation on top of OCP whose controllers might have issues with being updated. Should we exclude updates from MachineSets if they have an owner reference, for example? CC @2uasimojo for thoughts from the Hive perspective.
Also looking to the future, we expect MachineDeployments to be introduced in CAPI, at that point, the MachineDeployment should be updated, rather than the MachineSet
Apart from the API sections not being up to date, there are a few comments that you've resolved but have not yet pushed the fixes for. Please make sure to address the feedback you've been given and make sure the body of the EP is updated. Otherwise I think I have nothing further to add on this right now!
title: manage-boot-images
authors:
  - "@djoshy"
reviewers:
@2uasimojo I think someone from Hive should be aware of and review this EP
(force-pushed from 76fea9c to 74966a5)
I have done another pass, incorporating all the feedback so far. I've also added the final shape of the opt-in API (thanks @JoelSpeed!). Please let me know if I missed anything or if there are any other concerns.
Updated to reflect the final API shape, minor corrections and updated links. Couple of things I want to point out:
#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.
I think this still needs some review from Hive, and it would also be good to consult someone from updates to make sure the expectations around degrading operators on errors are in line, and that we don't unnecessarily block updates
If any of the above checks fail, the MSBIC will exit out of the sync.
- Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` are the same as the ones in the ConfigMap. Each platform (GCP, AWS, and so on) does this differently, so this part of the implementation will have to be special-cased. The ConfigMap is considered to be the golden set of boot image values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference.
- Next, it will check if the stub secret referenced within the `providerSpec` field of the `MachineSet` is managed, i.e. `worker-user-data-managed` and not `worker-user-data`. If it is unmanaged, the cloned `providerSpec` will be updated to reference the managed stub secret. This step is platform/arch agnostic.
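As a rough illustration of the comparison step above (a hedged sketch assuming the AWS platform; the MachineSet name and AMI ID are placeholders, not values taken from the enhancement):

```yaml
# Illustrative only: on AWS, the MSBIC would compare the AMI referenced in the
# MachineSet's providerSpec against the golden value from the boot image ConfigMap.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a             # placeholder name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          ami:
            id: ami-0aaaaaaaaaaaaaaaa     # stale boot image; does not match the ConfigMap value
          userDataSecret:
            name: worker-user-data        # unmanaged stub (handled by the next step)
# If the AMI does not match, the providerSpec is cloned, ami.id is replaced with
# the up-to-date value from the ConfigMap, and the MachineSet is patched.
```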
Is this conditional based on the unmanaged version being ignition v2? If there's already an ignition v3 version, do we need to do this?
I believe so, yes. The `worker-user-data-managed` secret is actively managed & synced by the MCO already. There are also issues with the TLS cert expirations & user customizations (hopefully doable via API in the future) that could come up by using the unmanaged secret. Leaving the current v3's alone would be inconsistent, and we should always use the managed pointer stub going forward.
EDIT: Realized I didn't answer the first question - the conditional would be whether the machine set in question is using the managed version of the stub Ignition or not.
Does the management already exist though? As part of this EP, surely the only thing that matters is v2 or v3? I'm not sure why the expiring certs or customisations come into this project; it seems like an orthogonal issue.
There is no management of the cert today; that's in our backlog and will be a separate EP most likely.
Ignition doesn't deprecate fields within the spec 3 lifespan, so yes, v3 configs should be usable as they are, but it's nice to be able to check and update without affecting the original installer-generated one if necessary in the future. For users that opt in a subset of machinesets, they most likely have the originals pointing to worker-user-data, so we should likely use worker-user-data-managed in case they need to make changes to that original for other purposes.
And this also helps us keep it consistent within the MCO. Managed machinesets - managed secrets.
Was this conversation ever resolved?
I think our current stance remains that we would like to manage the secret as part of the boot image lifecycle, so by opting in to the feature, the secret becomes managed and the experience is consistent regardless of what the old Ignition was.
In future phases we'd like to introduce the ability to allow users to customize this, maybe through an API, but for now we're going to take over management
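For concreteness, a minimal before/after sketch of the stub secret swap described here (field paths follow the Machine API `providerSpec`; purely illustrative):

```yaml
# Illustrative only: the user-data reference in the cloned providerSpec is
# rewritten from the installer-generated stub to the MCO-managed one.
# Before:
providerSpec:
  value:
    userDataSecret:
      name: worker-user-data            # unmanaged, installer-generated stub
---
# After:
providerSpec:
  value:
    userDataSecret:
      name: worker-user-data-managed    # stub actively managed and synced by the MCO
```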
#### Tracking boot image history

This is just an idea for the moment and is not planned to be included when the feature initially GAs. Based on customer feedback and team capacity, this will be implemented in a later release. Boot image history will be tracked by a new CR called `BootImageHistory`. The MCO will not directly consume from this CR. As a starting point, here is a stub type definition for this:
If we aren't going to be implementing this API yet, can you please add a note that thorough API review will need to be completed if and when the feature gets implemented
Ack, will update!
- Before processing a MachineSet, the MSBIC will check if the following conditions are satisfied:
  - `ManagedBootImages` feature gate is active
  - The cluster and/or the machineset is opted-in to boot image updates.
  - The machineset does not have a valid owner reference. (e.g. Hive, Cluster API and other managed machineset workflows)
Note: hive-managed MachineSets do not use OwnerReference. (And can't: the "owner" is in a different cluster.) From within the spoke cluster, the only way to tell a MachineSet is hive-managed is via labels. And of course, I would not expect MCO to accommodate a "downstream" label like this.
That said, I think hive-managed MachineSets can probably support this enhancement. IIUC the concern in this context would be thrashing:
- Hive lays down the MachineSet;
- MCO updates its boot image in the providerSpec;
- Hive reverts;
- Repeat.
However, under normal circumstances this should not happen:
Hive manages spoke MachineSets via hub MachinePools. Our MachinePool controller generates the MachineSet manifests (by invoking vendored installer code) which include the providerSpec. Once we've generated a MachineSet and created it on the spoke, the only things we will reconcile on it are replicas, labels, and taints*. We have webhooks as well as controller code to block any other edits. If the providerSpec goes out of sync -- which is possible today for other reasons -- we'll log a warning but otherwise ignore the discrepancy.
If the user scales up the MachinePool, this may (depending on AZs) result in additional MachineSets being generated. In this scenario, we would generate them with the original bootImage... but then MCO on the spoke will edit them and they'll settle in just like the "originals" described above.
Am I missing a different reason this feature shouldn't work via hive?
*...unless this back door is enabled. Assuming we do decide to support this enhancement, it would be nice to validate that it is mutually exclusive with this annotation; else we'll end up thrashing as described. However, as currently written, it appears as though the opt-in mechanism is relatively complex -- a LabelSelector in a separate CR -- and I don't love the idea of hive trying to duplicate the logic MCO is using to process that thing. Particularly as it could change across releases. Absent some other opt-in mechanism -- e.g. a field in the MachineSet itself, which we would expose via MachinePool -- we may have to settle for adding a warning note to the comment on this const.
Thanks for the review @2uasimojo and the clarification of the OwnerReference.
The opt-in mechanism is simple if you want to opt in the whole cluster (all machinesets):
```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
spec:
  managedBootImages:
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: All
```
It does get trickier if you want to enroll a subset of machinesets, which is when the label selector would come in. The above YAML snippet should work for most admins who would want these updates cluster wide, and our goal is that it will be a one-time action. Do you envision hive enrolling only a subset of machinesets?
I do have to caveat that the shape of the API will initially be under TechPreview, so it is subject to change. Once it goes GA, we will be committing to keep these options consistent through future releases.
We can definitely add an additional guard to check if the back door is enabled on the MSBIC side and exit with a log message. Question from my end, it looks like this is applied to MachinePools - I guess this isn't something that is visible on the MachineSet itself?
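For reference, a sketch of what the subset (label selector) opt-in could look like under the TechPreview API shape discussed above; the `Partial` selection mode and the example label are illustrative and subject to change:

```yaml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  managedBootImages:
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: Partial
        partial:
          machineResourceSelector:
            matchLabels:
              # example label chosen by the admin; not a well-known key
              update-boot-images: "true"
```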
Do you envision hive enrolling only a subset of machinesets?
Ah, important point to clarify: I do not see hive managing this enrollment -- at least not directly. We like to minimize the number of in-cluster knobs we mirror explicitly on the hub, because it becomes a nightmare to try to keep up with differences across all the spoke OCP versions and infra providers we need to support. (Our management of MachineSets is itself a perfect example of this. Ask me about vsphere some time :eyeroll:) It also makes for a finicky and unintuitive support structure, where users must avoid managing those things directly on the spoke if they're using the hub mechanism... but not if they're not :P
Hive provides a generic mechanism for users to create/sync/patch arbitrary objects from the hub to the spoke. I was expecting they would use that to opt into this feature. And as such, they would have the ability to shape it however they choose, including the most complex version of selector-ness you can think of. This is what I was thinking of when I said:
I don't love the idea of hive trying to duplicate the logic MCO is using to process that thing.
...particularly since the thing, and the logic to process it, is likely to evolve over time across versions:
I do have to caveat that the shape of the API will initially be under TechPreview, so it is subject to change. Once it goes GA, we will be committing to keep these options consistent through future releases.
====
We can definitely add an additional guard to check if the back door is enabled on the MSBIC side and exit with a log message. Question from my end, it looks like this is applied to MachinePools - I guess this isn't something that is visible on the MachineSet itself?
Okay, let's play this out...
You are correct: Today the back door annotation is only on the MachinePool on hub side. But it wouldn't be tough to reflect that setting down into the MachineSets managed by that MachinePool.
As I mentioned earlier, I wouldn't expect MCO to pay attention to a hive.openshift.io/*-namespaced ("downstream") annotation/label... but if we designed an explicit opt-out annotation/label in the MCO namespace, hive could set that thing on such MachineSets.
So theoretically you could have a MachinePool that manages MachineSet A with the "allow editing platform" knob; but MachineConfiguration.spec.managedBootImages is set up such that it would otherwise match MachineSets A, B, and C. In this case it would not twiddle A... and how would that manifest? A status condition somewhere? A log message?
By extension, could that same knob be used to explicitly opt in, where the MachineConfiguration would otherwise not select this MachineSet?
And, importantly, is that too weird/complex of a UX to design in? Because to be clear, we're talking about designing in a fully-supported knob on a MachineSet (or MachineDeployment in CAPI-land?) that overrides the behavior of the MachineConfiguration selector. Feels kind of icky. @JoelSpeed thoughts?
Thanks for the detailed explanation and the link to the syncSet mechanism! (:
So theoretically you could have a MachinePool that manages MachineSet A with the "allow editing platform" knob; but MachineConfiguration.spec.managedBootImages is set up such that it would otherwise match MachineSets A, B, and C. In this case it would not twiddle A... and how would that manifest? A status condition somewhere? A log message?
Our current error state is going to be a Prometheus metric-based alert. So I suppose we could surface this unexpected opt-out annotation/label as an alert somehow... but I'm not sure how popular those kinds of alerts are in the Hive world. We were planning on adding a degrade on the MCO side once we got some mileage on the feature and saw how customers interacted with it. At the moment, we don't think this feature warrants a degrade and blocking updates, so we went with an alert so it shows up on the console.
And, importantly, is that too weird/complex of a UX to design in? Because to be clear, we're talking about designing in a fully-supported knob on a MachineSet (or MachineDeployment in CAPI-land?) that overrides the behavior of the MachineConfiguration selector. Feels kind of icky. @JoelSpeed thoughts?
Agreed... this isn't great. We'd essentially be mirroring our API efforts with a hive-only, annotation-based approach. I don't love it; I'll let Joel weigh in. I'll have a think on this...
Again, I don't think it's appropriate if it's a "hive only annotation". It would need to be a MCO-based, generic solution that hive just happens to be able to use to its advantage.
Re metrics/alerts: hive itself doesn't consume any such things from the spoke; but that's okay -- presumably these would be for the end user, not hive.
It occurred to me that we have this same consideration for the OwnerReference-based exclusion mechanism. What if MachineSet B has an OwnerReference? You would still need to surface that it has been excluded despite otherwise matching the configured selector.
Discussed briefly with David. Just to clarify, @JoelSpeed, are you asking for:
- a general condition that something is wrong with the bootimage management in the MCO
- a specific condition for machinesets that have management already (e.g. hive) that is being opted in
If it's the former, maybe the location that makes the most sense would be in the MachineConfiguration object as a condition for machineset management in general. And we can add that in as a requirement for GA via a follow-up API PR on top of the one we merged just now. WDYT?
Yeah, I think the condition should be alongside the configuration of the bootimages, on the same object, to show any errors there. It doesn't necessarily have to degrade the cluster (yet or ever), but there should be some feedback on the object that the config is OK.
I can call this out in my next pass. The error state on the configuration object could also reflect an unexpected OwnerReference/Label on a machineset as well.
Sorry about straying a bit off topic @2uasimojo! How do you feel about the approach suggested by @JoelSpeed here?
As I mentioned earlier, I wouldn't expect MCO to pay attention to a hive.openshift.io/*-namespaced ("downstream") annotation/label... but if we designed an explicit opt-out annotation/label in the MCO namespace, hive could set that thing on such MachineSets.
The current API has a label selector within it. We could, in theory, either enforce or document that on Hive-managed clusters the matchExpressions section within the boot image update configuration is required to exclude machinesets with a well-known hive label on them. That should be pretty straightforward from what I understand so far and won't need any API changes.
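Roughly, that exclusion could be expressed as in the sketch below; the hive label key shown is hypothetical, since (as noted above) Hive's actual labels live in its own namespace and would need to be agreed on:

```yaml
# Illustrative sketch: opt in all machinesets except those carrying a
# (hypothetical) hive management label.
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  managedBootImages:
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: Partial
        partial:
          machineResourceSelector:
            matchExpressions:
            - key: hive.openshift.io/managed   # hypothetical label key
              operator: DoesNotExist
```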
IIUC you’re suggesting making the user responsible for excluding hive-managed MachineSets if they have the back door enabled? I’m okay with that in theory. Realistically nobody reads docs; but this back door isn’t documented anyway. This should be a small enough surface that this solution is acceptable.
Chatted with @2uasimojo on slack a bit, and we came up with some language to address the Hive concerns. I've plopped it into a subsection under Variation and form factor considerations, but I can move it if this isn't the right place.
Main changes in last pass:
I have resolved conversations which I think have been addressed; please let me know if I've missed anything.
@@ -107,6 +111,7 @@ Any form factor using the MCO and `MachineSets` will be impacted by this proposal
- Standalone OpenShift: Yes, this is the main target form factor.
- microshift: No, as it does [not](https://github.com/openshift/microshift/blob/main/docs/contributor/enabled_apis.md) use `MachineSets`.
- Hypershift: No, Hypershift does not have this issue.
- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSet` manifests (by invoking vendored installer code), which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team will update the comment on the backdoor annotation to indicate that it is mutually exclusive with this feature.
✓ thanks
- Before processing a MachineSet, the MSBIC will check if the following conditions are satisfied:
  - `ManagedBootImages` feature gate is active
  - The cluster and/or the machineset is opted-in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object.
  - The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and that updates to it are likely going to cause thrashing.
How will this scenario be reported back to a user? Their selection should not select owned MachineSets, right?
I called this out in the Error & Alert Mechanism section, but in brief it will be via an error condition on the `MachineConfiguration` object. I can also detail it a little bit here too - I figured since we have a few reasons for erroring now, it might be best to have a section for it.
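For illustration, such an error condition might surface roughly as below; the condition type, reason, and message are hypothetical and would be settled in the follow-up API work mentioned earlier:

```yaml
# Hypothetical sketch of an error condition on the MachineConfiguration object;
# the condition type, reason, and message are illustrative only.
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
status:
  conditions:
  - type: BootImageUpdateDegraded        # hypothetical condition type
    status: "True"
    reason: SkippedOwnedMachineSet       # hypothetical reason
    message: "machineset openshift-machine-api/worker-a matches the boot image selector but has an owner reference; skipping update"
    lastTransitionTime: "2024-01-01T00:00:00Z"
```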
If any of the above checks fail, the MSBIC will exit out of the sync.
- Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` are the same as the ones in the ConfigMap. Each platform (GCP, AWS, and so on) does this differently, so this part of the implementation will have to be special-cased. The ConfigMap is considered to be the golden set of boot image values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference.
- Next, it will check if the stub secret referenced within the `providerSpec` field of the `MachineSet` is managed, i.e. `worker-user-data-managed` and not `worker-user-data`. If it is unmanaged, the cloned `providerSpec` will be updated to reference the managed stub secret. This step is platform/arch agnostic.
Was this conversation ever resolved?
I'll defer to @yuqi-zhang for final tag. I think the markdownlint can be overridden since this EP was written before the latest changes to that.
/lgtm
Thanks for the collaboration everyone! I think this is in good shape, so let's get this in and we can follow up on any additional points as they come up.
/override ci/prow/markdownlint
Not sure if I have that power
@yuqi-zhang: yuqi-zhang unauthorized: /override is restricted to repo administrators, approvers in the top-level OWNERS file, and the following GitHub teams: openshift-release-oversight, openshift-staff-engineers.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jlebon, travier, yuqi-zhang.
Based on #1496 (comment), the enhancement template changed halfway through this, with only minor updates since. Overriding to save having to revise all of the titles that changed indentation.
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint
@djoshy: all tests passed!
This patch describes an enhancement from the MCO team to provide updated boot images via a new sub-controller within the MCO.
Related:
Prior enhancement - #201
GCP PoC - openshift/machine-config-operator#3980
Jira epic link - https://issues.redhat.com/browse/MCO-589