Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-820: Add boot images update opt-in mechanism #1672

Merged
merged 2 commits into from
Feb 28, 2024

Conversation

djoshy
Copy link
Contributor

@djoshy djoshy commented Nov 17, 2023

In addition to the feature gate that will be added by #1646, we'd also like to add a secondary opt-in knob for the user to toggle the boot image update mechanism. We expect that some users would want to keep their boot images static and this will serve as the primary switch when this feature leaves tech preview.

This PR introduces a new struct in types_machineconfiguration.go, ManagedBootImages which encloses an array of MachineManager objects. A MachineManager object contains the resource type of the machine management object that is being opted-in, the API group of that object and a union discriminant object of the type MachineManagerSelector. This object MachineManagerSelector encloses:

  • The union discriminator, Mode, can be set to three values : All, Partial and None.
  • Partial This is a label selector that will be used by objects to opt-in. When the Mode is set to Partial mode, all machinesets in the selector list would be considered enrolled for updates. For all other values of Mode, this selector does not exist.

Here is a YAML snippet of what this config could look like

managedBootImages:
  machineManagers:
  - resource: machinesets
    apiGroup: cluster.x-k8s.io
    selection:
      mode: Partial
      partial:
        matchLabels: {}
  - resource: machinesets
    apiGroup: machine.openshift.io
    selection:
       mode: All

The above YAML would provide a label selector for enrolling Cluster API MachineSets and all Machine API MachineSets. Only one machineManager per unique pair of resource/APIGroup will be accepted. This is to avoid conflicting instructions provided by multiple machineManagers.

We may also add future configuration points to ManagedBootImages. This PR also adds a new degrade condition to the MachineConfigPool CRD. This will be used by the new subcontroller to signal a failure in updating boot images.

More details about this feature can be found in enhancement #1496. Comments welcome!

Copy link
Contributor

openshift-ci bot commented Nov 17, 2023

Hello @djoshy! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 17, 2023
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 804737f to 6efc17d Compare November 22, 2023 21:18
@openshift-ci openshift-ci bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 22, 2023
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 6efc17d to 5952155 Compare November 22, 2023 21:26
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2023
@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 22, 2023
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 5952155 to 6837787 Compare November 22, 2023 21:33
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2023
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 6837787 to f36270a Compare November 27, 2023 13:43
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
@djoshy djoshy force-pushed the manage-boot-image-toggle branch 5 times, most recently from 1688711 to bb4ff6d Compare November 28, 2023 20:28
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from bb4ff6d to 3ee2435 Compare December 7, 2023 17:08
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 7, 2023
@djoshy
Copy link
Contributor Author

djoshy commented Dec 7, 2023

So with this new approach, we still have have three modes:

  • Enabled - this will run the feature on all machinesets
  • Disabled - this will turn off the feature completely
  • CustomConfig - this will run the feature on machinesets that match with the selector enclosed within the ManagedBootImageConfigCustomization struct

We could then add any future configurations as new fields in the ManagedBootImageConfigCustomization object.
In the event that we do add future configurations, Enabled will have to assume certain defaults for them. I think this is ok, as long as we document these assumptions/defaults so users will know what to expect.

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about the shape of the union, I'm wondering if CustomConfig is one level too many. What would go alongside a MachineSetSelector? Whatever it is, it would have to work in tandem with the selector. An exclusions list perhaps?

Either way, I think Custom is too broad. We do have the option of adding more enum values to the discriminator later, so perhaps for now having the selector is a better discriminator.

Or perhaps we go for something like:

managedBootImages:
  managedImages: None | Partial | All
  partial:
    machineSetSelector: ...
   controlPlaneMachineSetSelector: ... <- a potential future addition?

If we take that route, I guess then machineSetSelector doesn't need to be required, but we should include a validation that partial contains at least one non-empty property using the MinItems validation

operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 3ee2435 to 342f64d Compare December 13, 2023 21:18
@djoshy
Copy link
Contributor Author

djoshy commented Dec 13, 2023

I've updated the struct to follow the shape mentioned in the comment here: #1672 (comment)
It does clean up a lot of weird naming issues, thanks @JoelSpeed !

@djoshy djoshy closed this Dec 13, 2023
@djoshy
Copy link
Contributor Author

djoshy commented Feb 12, 2024

/test e2e-gcp

/test e2e-azure

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few formatting things and a question on one of the validations to work out

operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
operator/v1/types_machineconfiguration.go Outdated Show resolved Hide resolved
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from cbbc1a9 to 90d147e Compare February 13, 2024 14:15
@djoshy
Copy link
Contributor Author

djoshy commented Feb 13, 2024

Updated to:

  • render nicely in kubectl explain
  • removed the featureset aware validation rule

@JoelSpeed
Copy link
Contributor

Ok, from an API perspective I'm happy with this now, I would like to take a final pass over the enhancement before we merge as well

@yuqi-zhang I believe you've been reviewing the EP, are you happy with the shape of this API?

@djoshy
Copy link
Contributor Author

djoshy commented Feb 13, 2024

/test e2e-upgrade

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked over the managedbootimages commit again, and the current API lgtm!

@djoshy djoshy force-pushed the manage-boot-image-toggle branch 2 times, most recently from 9f36019 to 576ec0c Compare February 15, 2024 17:02
@djoshy
Copy link
Contributor Author

djoshy commented Feb 15, 2024

Updated based on discussion from the enhancement:

  • removed the MachineConfigPool degrade type since we are switching to an alert
  • enclosed the MachineSet label selector within a struct so we have flexibility to expand in the future

@djoshy djoshy changed the title Add boot images update opt-in & new mcp degrade condition Add boot images update opt-in mechanism Feb 15, 2024
@djoshy djoshy force-pushed the manage-boot-image-toggle branch from 576ec0c to 247d3e0 Compare February 15, 2024 20:32
@djoshy djoshy changed the title Add boot images update opt-in mechanism MCO-820: Add boot images update opt-in mechanism Feb 16, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 16, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 16, 2024

@djoshy: This pull request references MCO-820 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

In addition to the feature gate that will be added by #1646, we'd also like to add a secondary opt-in knob for the user to toggle the boot image update mechanism. We expect that some users would want to keep their boot images static and this will serve as the primary switch when this feature leaves tech preview.

This PR introduces a new struct in types_machineconfiguration.go, ManagedBootImages which encloses an array of MachineManager objects. A MachineManager object contains the resource type of the machine management object that is being opted-in, the API group of that object and a union discriminant object of the type MachineManagerSelector. This object MachineManagerSelector encloses:

  • The union discriminator, Mode, can be set to three values : All, Partial and None.
  • Partial This is a label selector that will be used by objects to opt-in. When the Mode is set to Partial mode, all machinesets in the selector list would be considered enrolled for updates. For all other values of Mode, this selector does not exist.

Here is a YAML snippet of what this config could look like

managedBootImages:
 machineManagers:
 - resource: machinesets
   apiGroup: cluster.x-k8s.io
   selection:
     mode: Partial
     partial:
       matchLabels: {}
 - resource: machinesets
   apiGroup: machine.openshift.io
   selection:
      mode: All

The above YAML would provide a label selector for enrolling Cluster API MachineSets and all Machine API MachineSets. Only one machineManager per unique pair of resource/APIGroup will be accepted. This is to avoid conflicting instructions provided by multiple machineManagers.

We may also add future configuration points to ManagedBootImages. This PR also adds a new degrade condition to the MachineConfigPool CRD. This will be used by the new subcontroller to signal a failure in updating boot images.

More details about this feature can be found in enhancement #1496. Comments welcome!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@JoelSpeed
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2024
Copy link
Contributor

openshift-ci bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, JoelSpeed, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2024
@JoelSpeed
Copy link
Contributor

/override ci/prow/verify-crd-schema

Introduction fo techpreview/custom versions of CRDs has caused some pre-existing failures that cannot be fixed

Copy link
Contributor

openshift-ci bot commented Feb 28, 2024

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema

In response to this:

/override ci/prow/verify-crd-schema

Introduction fo techpreview/custom versions of CRDs has caused some pre-existing failures that cannot be fixed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@djoshy
Copy link
Contributor Author

djoshy commented Feb 28, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Feb 28, 2024

@djoshy: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 217662e into openshift:master Feb 28, 2024
17 checks passed
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-config-api-container-v4.16.0-202402281710.p0.g217662e.assembly.stream.el9 for distgit ose-cluster-config-api.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants