Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: add ignition dual spec 2/3 support #97

Merged
merged 1 commit into from
Jan 28, 2020

Conversation

yuqi-zhang
Copy link
Contributor

Add an enhancement proposal for supporting both ignition spec 2
and 3 for OS provisioning/updating.

Signed-off-by: Yu Qi Zhang [email protected]

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 4, 2019
Acceptance criteria:
- Essentially the same as story 1

** As a user of Openshift, I’d like to install a fresh ignition spec 3 cluster **
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the users just want to install a cluster, imo there is less value in customers asking for spec2 vs spec3

above we mention that the installer can switch to spec3 completely, do we have some idea how we can make sure UPI customers are using the correct boot-images when they get the spec3 ignition configs but maybe try to use a spec3 supporting bootimage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the workflow should look the same for IPI

do we have some idea how we can make sure UPI customers are using the correct boot-images when they get the spec3 ignition configs but maybe try to use a spec3 supporting bootimage

We'll definitely have to communicate in the docs which preview version has spec 3 support. The installer maybe should add capability to detect if the user is still on an old version and reject the install.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installer maybe should add capability to detect if the user is still on an old version and reject the install.

at what stage do you think the installer will be able to do that?

in UPI, the installer only outputs ignition files and rest is user controlled process.

Copy link
Contributor Author

@yuqi-zhang yuqi-zhang Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh true, I guess in this case the best we can do is to alert the user when they use a new version of the installer that this build will output spec 3 ignition configs, which is only supported by RHCOS versions x and later?

This goes back to openshift/installer#1399 which might be a better UPI workflow. Alternatively, for example if we decide to do this in 4.5, all 4.5 images + installer generated configs are spec 3 by default only. In that case a user would only run in to a version mistmatch if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.

Copy link
Contributor Author

@yuqi-zhang yuqi-zhang Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions of course since UPI in general will be harder to coordinate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to openshift/installer#1399 which might be a better UPI workflow. Alternatively, for example if we decide to do this in 4.5, all 4.5 images + installer generated configs are spec 3 by default only. In that case a user would only run in to a version mistmatch if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.

if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.

that means that we don't have N-1 compatibility, which is not we tell our users! the n-1 should be fine as long as you don't depend on a feature that is in N.

This goes back to openshift/installer#1399 which might be a better UPI workflow.

I have specifically pushed back on that idea because, that allows us to demand this 4.4 works with 4.4 bootimage only, when that shouldn't be the case for most users. and also something we need to support in our clusters because 4.1 created clusters have new nodes join in at 4.1 and then pivot to say 4.4.

Open to suggestions of course since UPI in general will be harder to coordinate

+1
as long as we explicitly mention what we are doing for this case, i'm fine.

The translation will happen when the version of MCO with dual support and
translator is first deployed; it will detect the existing config being spec 2,
generate a new renderedconfig based on a translator from spec2 to spec 3, and
do a “dummy reboot” (if necessary) to bring the cluster to the desired config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "dummy reboot" in this scenario? When would the "dummy reboot" be necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the cluster machineconfig is saved as a "desiredConfig" hash, I'm not sure the best way to translate that yet. This just means that after the specs are translated, if we need, we can set a new desiredConfig that specifies a successful translation to spec 3, which the cluster needs to reboot to bring the currentConfig up to.

Acceptance criteria:
- The machineconfig is applied successfully, if the user has defined a correct spec 3 ignition snippet
- The user is properly alerted if they attempt to apply a spec 2 config, and the machineconfig fails to apply
- The user is given necessary docs to remove the undesired spec 2 config and to translate it to a spec 3 config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "the undesired spec 2 config" in this case? The last bullet says any spec 2 config would fail to apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it just means any spec 2 config in the system. Basically steps to oc delete machineconfig/xxx and then apply a new one

Many existing tests will also have to be updated given the spec change.


### Upgrade / Downgrade Strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downgrades from spec 3 to spec 2 are off the table, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We should never have to do that. We do however probably need rollback functionality to recover from bad machineconfig states.

- The user should have received notification that the update will be changing spec version, as well as received necessary documentation on how to recover a failed update
- CI tests are put in place to make sure the existing versions can be updated to the new payload

** As a user of openshift, I’d like to install from a spec 2 bootimage andimmediately update to a spec 3 payload **
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "andimmediately" -> "and immediately"


** How to properly detect which ignition version to serve new machines **

This will likely be somewhat difficult to handle, as the wrong version on a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we teach Ignition to request a version using Accept headers? Looks like it may already do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And with the machine-config server switching on application/vnd.coreos.ignition+json;version=... and an ability to translate spec-2 configs to spec-3 and spec-3 configs to spec-2, we'd be able to run a full 4.4.z (or whatever z stream) supporting both configs, and warning about 4.5 (or whatever) incompatibility if we saw any spec-2 requests come in.

@yuqi-zhang
Copy link
Contributor Author

Updated per discussions with @crawford, PTAL.

The MCS will host both spec 3 and spec 2 configs, with a functionality to
translate spec 3 down to spec 2 configs, and spec 2 up to spec 3 configs.
The MCS will first check which ignition spec version a new node supports,
before serving a config.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there already something in place that we can leverage? or do we need to check with MAO (or something else?) to provide us with something like that


## Proposal

This change is multi-component:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericavonb brought up that we might need to sync/communicate with KNI as well (iirc)

@cgwalters
Copy link
Member

The MCS will first check which ignition spec version a new node supports, before serving a config

See coreos/ignition#889
Which was sketched out initially in openshift/machine-config-operator#492

- [ ] Docs are updated to reflect the new config version

#### Phase 2:
- [ ] RHCOS bootimage is updated to accept ignition spec 3 configs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec v3 only

@yuqi-zhang
Copy link
Contributor Author

Updated to reflect the new phase 1/2 plans. Specifically, our main objective in phase 1 becomes adding the ability to apply spec 3 machineconfigs. Does not involve installer/RHCOS.

Also note the new Managing stub master/worker ignition configs section, which will also be a separate enhancement PR.

- [ ] MCD gaining the ability to process both spec 2 and spec 3 configs
- [ ] Create tests to apply spec 3 configs to spec 2 clusters successfully
- [ ] An alerting mechanism is put in place for outdated and incompatible/non-translatable configs
- [ ] Docs are updated to reflect the new config version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crawford can you ack this plan if it does look good to you? Not sure who else to bring in

Copy link
Contributor

@crawford crawford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignition is always capitalized.

Why aren’t we using spec 3 on new clusters in phase 1?

This enhancement proposal aims to add dual ignition specification version 2/3
(ignition version 0/2) support to Openshift 4.x, which currently only support
ignition version 0 spec 2 for OS provisioning and machine config updates. We
aim to introduce a non-breaking method to switch all new and existing clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t non-breaking, it’s just got a grace period.

The objective of this phase is to allow users to apply machineconfigs with
ignition spec 3 support to new and existing clusters. The breakdown is:

- a tool is created to translate between ignition spec versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MCO makes use directly of ignition typing via vendoring (currently 2.2), so we'd need a method to translate the typing on the system from 2->3, unless you think the MCO should be on 2 indefinitely for some reason

- RHCOS bootimages switches to only accept ignition spec 3 configs
- The Openshift installer is updated to generate spec 3 configs
- Remaining MC* components generate spec 3 only
- MCO enforces that all configs are spec 3 before allowing the CVO to start the update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How’s this going to work? The MCO isn’t going to be able to distinguish between a z-stream update and a y-stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been fleshed out yet. I recall discussion regarding using unupgradable flag for this

requirement for security/compliance purposes in OCP. The existing version on
RHCOS systems (ignition version v0.33) carries a spec version (spec version 2,
henceforth “Spec 2”) that is not compatible with Spec 3. Thus we would like to
update the ignition version on RHCOS/Installer/MCO to make use of the changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we record the rational for this approach vs teaching Ignition/RHCOS how to handle both spec variants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only ignition (the package) is updated to handle both, all MachineConfigs applied after firstboot will not be affected by the change. The MCO does e.g. file writing itself based on machineconfigs, which use ignition typing.

- [ ] A config translator is created to translate from ignition spec 3 to spec 2
- [ ] The MCO is updated to support both ignition spec 2 and 3, with:
- [ ] MCC rendering spec 3 configs to spec 2
- [ ] MCD gaining the ability to process both spec 2 and spec 3 configs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the rendered configs are spec 2, do we need to bother with this in phase 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not, perhaps I phrased this poorly, will revise

- Those that we’re not 100% sure we can directly translate, but we can infer what the user is doing and do a translation
- Those that we’re 100% sure we CANNOT translate directly, and requires user input for us to correctly translate

During phase 1 we should attempt to translate on a best-effort basis. If the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clarify that we will only perform the translation of each config if we are certain that it will be successful. “Best-effort” here refers to the act of visiting all configs, but not the translation for each config.

when the MCO with dual support is first deployed onto the cluster). This will
only be required as part of the MCO.

Note that there exists three types of spec 2 configs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of this document, there are two types: those that we are certain we can translate and everything else. We aren’t going to handle the uncertain and certain-failure cases differently (are we?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

the user that there are untranslatable configs.

During phase two, we should fail updates unless the cluster is fully on spec
3 config. This effectively means that UPI clusters are at risk when upgrading
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are UPI clusters at risk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More likely the user supplied their own ignition config customizations


The bootstrap ignition configs are updated to be spec 3. The stub ignition
configs for master/worker nodes are updated to spec 3, and referece a different
endpoint of the MCS which will serve ignition spec 3 configs. RHCOS images pinned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint should be the same. Ignition already uses an Accept header to specify which spec versions it can process.

The MCO will flat out reject spec 2 configs, and refuse to upgrade clusters
that have spec 2 bits.

The MCO will also throw an alert upon: seeing a spec 2 machineconfig applied to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we want to bother with alerts here. We can just validate the MachineConfig before the CR is committed.

@yuqi-zhang
Copy link
Contributor Author

Added some fixes.

Why aren’t we using spec 3 on new clusters in phase 1?

This will require non-trivial changes to the installer and MCO which the team doesn't think we have enough time for in the 4.4 timeframe

Add an enhancement proposal for supporting both ignition spec 2
and 3 for OS provisioning/updating.

Signed-off-by: Yu Qi Zhang <[email protected]>
@runcom
Copy link
Member

runcom commented Jan 28, 2020

ping @openshift/openshift-architects

it does lgtm, can any of you approve?

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2020
@openshift-merge-robot openshift-merge-robot merged commit b5d54fc into openshift:master Jan 28, 2020
@runcom
Copy link
Member

runcom commented Jan 28, 2020

damn, I thought I couldn't approve but I still think this is gonna be the way we all agreed on - @crawford since you reviewed this, please take a last look at this.


All machineconfigs are switched to spec 3. Spec 2 support will exist in the form
of served Ignition configs to new nodes joining the cluster that have Ignition
v0 in the bootimage. MCS will handle this with 2 endpoints.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MCS will handle this with 2 endpoints.

This should be a single endpoint. Ignition already uses the Accept HTTP header to specify which configs it can accept and the order of preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @yuqi-zhang this is related also to the ignition endpoint contained in that secret created by the installer (and unmanaged)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the proposal. We would still first need an option to modify the secret anyways.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants