enhancement: add ignition dual spec 2/3 support #97

yuqi-zhang · 2019-11-04T18:57:22Z

Add an enhancement proposal for supporting both ignition spec 2
and 3 for OS provisioning/updating.

Signed-off-by: Yu Qi Zhang [email protected]

abhinavdahiya · 2019-11-04T20:11:10Z

enhancements/ignition-spec-dual-support.md

+Acceptance criteria:
+ - Essentially the same as story 1
+
+** As a user of Openshift, I’d like to install a fresh ignition spec 3 cluster **


The users just want to install a cluster; IMO there is less value in customers asking for spec 2 vs spec 3.

Above we mention that the installer can switch to spec 3 completely. Do we have some idea how we can make sure UPI customers are using the correct bootimages when they get the spec 3 Ignition configs but maybe try to use a spec3 supporting bootimage?

I agree, the workflow should look the same for IPI

do we have some idea how we can make sure UPI customers are using the correct boot-images when they get the spec3 ignition configs but maybe try to use a spec3 supporting bootimage

We'll definitely have to communicate in the docs which preview version has spec 3 support. The installer maybe should add capability to detect if the user is still on an old version and reject the install.

The installer maybe should add capability to detect if the user is still on an old version and reject the install.

at what stage do you think the installer will be able to do that?

in UPI, the installer only outputs ignition files and rest is user controlled process.

Oh true, I guess in this case the best we can do is to alert the user, when they use a new version of the installer, that this build will output spec 3 Ignition configs, which are only supported by RHCOS versions x and later?

This goes back to openshift/installer#1399 which might be a better UPI workflow. Alternatively, if for example we decide to do this in 4.5, all 4.5 images and installer-generated configs would be spec 3 only by default. In that case a user would only run into a version mismatch if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.

Open to suggestions of course since UPI in general will be harder to coordinate

if they try to install a 4.5 cluster with 4.4 bootimages, which they shouldn't do anyways.

That means that we don't have N-1 compatibility, which is not what we tell our users! N-1 should be fine as long as you don't depend on a feature that is new in N.

This goes back to openshift/installer#1399 which might be a better UPI workflow.

I have specifically pushed back on that idea because it would let us demand that 4.4 works only with the 4.4 bootimage, when that shouldn't be the case for most users. It's also something we need to support in our own clusters, because clusters created at 4.1 have new nodes join at 4.1 and then pivot to, say, 4.4.

Open to suggestions of course since UPI in general will be harder to coordinate

+1
As long as we explicitly mention what we are doing for this case, I'm fine.

sgreene570 · 2019-11-04T20:25:22Z

enhancements/ignition-spec-dual-support.md

+The translation will happen when the version of MCO with dual support and
+translator is first deployed; it will detect the existing config being spec 2,
+generate a new renderedconfig based on a translator from spec2 to spec 3, and
+do a “dummy reboot” (if necessary) to bring the cluster to the desired config,


What exactly do you mean by "dummy reboot" in this scenario? When would the "dummy reboot" be necessary?

Since the cluster machineconfig is saved as a "desiredConfig" hash, I'm not sure of the best way to translate that yet. This just means that after the specs are translated, if needed, we can set a new desiredConfig representing the successful translation to spec 3, and the nodes then reboot to bring their currentConfig up to it.
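To make the "dummy reboot" concrete, here is a minimal Go sketch under stated assumptions: rendered configs are named by a hash of their content (the hypothetical `renderedName` below uses SHA-256; the MCO's real naming scheme may differ), so a translated spec 3 config gets a new name even when it is semantically equivalent to the spec 2 original. That new name becomes the desiredConfig, and any node whose currentConfig differs has to update and reboot.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// renderedName (hypothetical) derives a content-addressed name for a rendered
// config. Any byte-level change, such as a spec 2 -> spec 3 translation,
// produces a different name.
func renderedName(pool string, rendered []byte) string {
	sum := sha256.Sum256(rendered)
	return fmt.Sprintf("rendered-%s-%x", pool, sum[:16])
}

// needsUpdate reports whether a node must update (and reboot) to move from
// its currentConfig to the desiredConfig.
func needsUpdate(currentConfig, desiredConfig string) bool {
	return currentConfig != desiredConfig
}

func main() {
	spec2 := []byte(`{"ignition":{"version":"2.2.0"}}`)
	// Illustrative stand-in for the translator's spec 3 output.
	spec3 := []byte(`{"ignition":{"version":"3.0.0"}}`)

	current := renderedName("worker", spec2)
	desired := renderedName("worker", spec3)
	fmt.Println("update needed:", needsUpdate(current, desired))
}
```

The reboot is "dummy" in the sense that the node's effective state should not change; only the config it is recorded against does.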

sgreene570 · 2019-11-04T20:37:31Z

enhancements/ignition-spec-dual-support.md

+Acceptance criteria:
+ - The machineconfig is applied successfully, if the user has defined a correct spec 3 ignition snippet
+ - The user is properly alerted if they attempt to apply a spec 2 config, and the machineconfig fails to apply
+ - The user is given necessary docs to remove the undesired spec 2 config and to translate it to a spec 3 config


What is "the undesired spec 2 config" in this case? The last bullet says any spec 2 config would fail to apply.

Sorry, it just means any spec 2 config in the system. Basically, the steps to run oc delete machineconfig/xxx and then apply a new one.

sgreene570 · 2019-11-04T20:47:58Z

enhancements/ignition-spec-dual-support.md

+Many existing tests will also have to be updated given the spec change.
+
+
+### Upgrade / Downgrade Strategy


Downgrades from spec 3 to spec 2 are off the table, right?

Yes. We should never have to do that. We do, however, probably need rollback functionality to recover from bad machineconfig states.

wking · 2019-11-05T22:10:38Z

enhancements/ignition-spec-dual-support.md

+ - The user should have received notification that the update will be changing spec version, as well as received necessary documentation on how to recover a failed update
+ - CI tests are put in place to make sure the existing versions can be updated to the new payload
+
+** As a user of openshift, I’d like to install from a spec 2 bootimage andimmediately update to a spec 3 payload **


nit: "andimmediately" -> "and immediately"

wking · 2019-11-05T22:17:42Z

enhancements/ignition-spec-dual-support.md

+
+** How to properly detect which ignition version to serve new machines **
+
+This will likely be somewhat difficult to handle, as the wrong version on a 


Can we teach Ignition to request a version using Accept headers? Looks like it may already do that?

And with the machine-config server switching on application/vnd.coreos.ignition+json;version=... and an ability to translate spec-2 configs to spec-3 and spec-3 configs to spec-2, we'd be able to run a full 4.4.z (or whatever z stream) supporting both configs, and warning about 4.5 (or whatever) incompatibility if we saw any spec-2 requests come in.
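A minimal Go sketch of the Accept-header idea: the MCS parses each media range, keeps those of type `application/vnd.coreos.ignition+json` whose major version it can serve, and picks the one with the highest q-value (earlier entries win ties, matching the "order of preference" crawford mentions later in the thread). The media type comes from the comment above; the function name `pickSpec`, the `supported` map, and the assumption that Ignition lists several versions with optional q-values are illustrative, not the MCS's actual code or Ignition's exact header.

```go
package main

import (
	"fmt"
	"mime"
	"strconv"
	"strings"
)

const ignitionMediaType = "application/vnd.coreos.ignition+json"

// pickSpec (hypothetical) returns the Ignition spec major version to serve,
// chosen from the client's Accept header among the versions the server
// supports. It returns 0 if no acceptable version is offered.
func pickSpec(accept string, supported map[int]bool) int {
	best, bestQ := 0, 0.0
	for _, part := range strings.Split(accept, ",") {
		mt, params, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err != nil || mt != ignitionMediaType {
			continue // not an Ignition media range (e.g. */*), skip it
		}
		// "version=3.0.0" -> 3
		major, err := strconv.Atoi(strings.SplitN(params["version"], ".", 2)[0])
		if err != nil || !supported[major] {
			continue
		}
		q := 1.0 // default quality when no q parameter is given
		if s, ok := params["q"]; ok {
			if q, err = strconv.ParseFloat(s, 64); err != nil {
				continue
			}
		}
		// Strictly greater: q=0 is never chosen, and on equal q the
		// earlier entry keeps priority.
		if q > bestQ {
			best, bestQ = major, q
		}
	}
	return best
}

func main() {
	both := map[int]bool{2: true, 3: true}
	newNode := ignitionMediaType + ";version=3.0.0, " + ignitionMediaType + ";version=2.2.0;q=0.5"
	fmt.Println(pickSpec(newNode, both))                  // 3
	fmt.Println(pickSpec(newNode, map[int]bool{2: true})) // 2
	oldNode := ignitionMediaType + ";version=2.2.0"
	fmt.Println(pickSpec(oldNode, both)) // 2
}
```

Translation between specs would then happen after selection, so one endpoint can serve both old and new bootimages.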

yuqi-zhang · 2019-11-19T22:08:14Z

Updated per discussions with @crawford, PTAL.

runcom · 2019-11-22T10:25:26Z

enhancements/ignition-spec-dual-support.md

+The MCS will host both spec 3 and spec 2 configs, with a functionality to
+translate spec 3 down to spec 2 configs, and spec 2 up to spec 3 configs.
+The MCS will first check which ignition spec version a new node supports,
+before serving a config.


Is there already something in place that we can leverage? Or do we need to check with the MAO (or something else?) to provide us with something like that?

runcom · 2019-11-22T11:53:48Z

enhancements/ignition-spec-dual-support.md

+
+## Proposal
+
+This change is multi-component:


@ericavonb brought up that we might need to sync/communicate with KNI as well (iirc)

cgwalters · 2019-11-22T14:12:48Z

The MCS will first check which ignition spec version a new node supports, before serving a config

See coreos/ignition#889
Which was sketched out initially in openshift/machine-config-operator#492

runcom · 2019-11-26T10:35:35Z

enhancements/ignition-spec-dual-support.md

+- [ ] Docs are updated to reflect the new config version
+
+#### Phase 2:
+- [ ] RHCOS bootimage is updated to accept ignition spec 3 configs


spec v3 only

yuqi-zhang · 2019-12-06T01:56:46Z

Updated to reflect the new phase 1/2 plans. Specifically, our main objective in phase 1 becomes adding the ability to apply spec 3 machineconfigs. This does not involve the installer or RHCOS.

Also note the new "Managing stub master/worker ignition configs" section, which will also be a separate enhancement PR.

runcom · 2019-12-17T17:51:46Z

enhancements/ignition-spec-dual-support.md

+    - [ ] MCD gaining the ability to process both spec 2 and spec 3 configs
+- [ ] Create tests to apply spec 3 configs to spec 2 clusters successfully
+- [ ] An alerting mechanism is put in place for outdated and incompatible/non-translatable configs
+- [ ] Docs are updated to reflect the new config version


@crawford can you ack this plan if it does look good to you? Not sure who else to bring in

crawford

Ignition is always capitalized.

Why aren’t we using spec 3 on new clusters in phase 1?

crawford · 2019-12-21T13:35:07Z

enhancements/ignition-spec-dual-support.md

+This enhancement proposal aims to add dual ignition specification version 2/3
+(ignition version 0/2) support to Openshift 4.x, which currently only support
+ignition version 0 spec 2 for OS provisioning and machine config updates. We
+aim to introduce a non-breaking method to switch all new and existing clusters


This isn’t non-breaking, it’s just got a grace period.

crawford · 2019-12-21T13:36:39Z

enhancements/ignition-spec-dual-support.md

+The objective of this phase is to allow users to apply machineconfigs with
+ignition spec 3 support to new and existing clusters. The breakdown is:
+
+ - a tool is created to translate between ignition spec versions


Why is this needed?

The MCO directly uses Ignition types via vendoring (currently 2.2), so we'd need a method to translate those types on the system from 2 to 3, unless you think the MCO should stay on 2 indefinitely for some reason.

crawford · 2019-12-21T13:44:14Z

enhancements/ignition-spec-dual-support.md

+ - RHCOS bootimages switches to only accept ignition spec 3 configs
+ - The Openshift installer is updated to generate spec 3 configs
+ - Remaining MC* components generate spec 3 only
+ - MCO enforces that all configs are spec 3 before allowing the CVO to start the update


How’s this going to work? The MCO isn’t going to be able to distinguish between a z-stream update and a y-stream.

This has not been fleshed out yet. I recall discussion about using an "unupgradable" flag for this.

crawford · 2019-12-21T13:51:09Z

enhancements/ignition-spec-dual-support.md

+requirement for security/compliance purposes in OCP. The existing version on
+RHCOS systems (ignition version v0.33) carries a spec version (spec version 2,
+henceforth “Spec 2”) that is not compatible with Spec 3. Thus we would like to
+update the ignition version on RHCOS/Installer/MCO to make use of the changes.


Can we record the rationale for this approach vs teaching Ignition/RHCOS how to handle both spec variants?

If only Ignition (the package) is updated to handle both, none of the MachineConfigs applied after first boot will be affected by the change. The MCO itself does things such as file writing based on MachineConfigs, which use Ignition types.

crawford · 2019-12-21T14:05:11Z

enhancements/ignition-spec-dual-support.md

+- [ ] A config translator is created to translate from ignition spec 3 to spec 2
+- [ ] The MCO is updated to support both ignition spec 2 and 3, with:
+    - [ ] MCC rendering spec 3 configs to spec 2
+    - [ ] MCD gaining the ability to process both spec 2 and spec 3 configs


If the rendered configs are spec 2, do we need to bother with this in phase 1?

I guess not; perhaps I phrased this poorly. I will revise.

crawford · 2019-12-21T18:34:53Z

enhancements/ignition-spec-dual-support.md

+ - Those that we’re not 100% sure we can directly translate, but we can infer what the user is doing and do a translation 
+ - Those that we’re 100% sure we CANNOT translate directly, and requires user input for us to correctly translate
+
+During phase 1 we should attempt to translate on a best-effort basis. If the


We should clarify that we will only perform the translation of each config if we are certain that it will be successful. “Best-effort” here refers to the act of visiting all configs, but not the translation for each config.

crawford · 2019-12-21T18:36:34Z

enhancements/ignition-spec-dual-support.md

+when the MCO with dual support is first deployed onto the cluster). This will
+only be required as part of the MCO.
+
+Note that there exists three types of spec 2 configs:


For the purpose of this document, there are two types: those that we are certain we can translate and everything else. We aren’t going to handle the uncertain and certain-failure cases differently (are we?).
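The two-bucket rule (translate a config only when certain, otherwise leave it whole and report it, while still visiting every config) can be sketched in Go. The types here are hypothetical, heavily reduced stand-ins for the Ignition config types: only the per-file filesystem field is modeled, since spec 3 removes it in favor of absolute paths. Treating only files on the spec 2 "root" filesystem as certain is an illustrative assumption, not the actual translator's rule set.

```go
package main

import "fmt"

// Hypothetical, heavily simplified config types.
type FileV2 struct {
	Filesystem string // spec 2 files name the filesystem they live on
	Path       string
}

type FileV3 struct {
	Path string // spec 3 files use absolute paths only
}

type ConfigV2 struct {
	Name  string
	Files []FileV2
}

type ConfigV3 struct {
	Name  string
	Files []FileV3
}

// translate converts one config only if every part of it maps with
// certainty; otherwise the whole config is left untranslated.
func translate(c ConfigV2) (ConfigV3, error) {
	out := ConfigV3{Name: c.Name}
	for _, f := range c.Files {
		if f.Filesystem != "root" {
			return ConfigV3{}, fmt.Errorf("%s: %s is on filesystem %q, cannot translate with certainty",
				c.Name, f.Path, f.Filesystem)
		}
		out.Files = append(out.Files, FileV3{Path: f.Path})
	}
	return out, nil
}

// translateAll is "best effort" only in that it visits every config; each
// individual translation is all-or-nothing.
func translateAll(cfgs []ConfigV2) (translated []ConfigV3, failed []error) {
	for _, c := range cfgs {
		t, err := translate(c)
		if err != nil {
			failed = append(failed, err)
			continue
		}
		translated = append(translated, t)
	}
	return translated, failed
}

func main() {
	cfgs := []ConfigV2{
		{Name: "example-a", Files: []FileV2{{Filesystem: "root", Path: "/etc/foo.conf"}}},
		{Name: "example-b", Files: []FileV2{{Filesystem: "data", Path: "/bar"}}},
	}
	ok, failed := translateAll(cfgs)
	fmt.Println(len(ok), "translated;", len(failed), "need user action")
	for _, err := range failed {
		fmt.Println(err)
	}
}
```

The failed list is what would be surfaced to the user, who then deletes and re-creates those configs in spec 3 form.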

crawford · 2019-12-21T18:37:32Z

enhancements/ignition-spec-dual-support.md

+the user that there are untranslatable configs.
+
+During phase two, we should fail updates unless the cluster is fully on spec
+3 config. This effectively means that UPI clusters are at risk when upgrading


Why are UPI clusters at risk?

They are more likely to have user-supplied Ignition config customizations.

crawford · 2019-12-21T18:39:28Z

enhancements/ignition-spec-dual-support.md

+
+The bootstrap ignition configs are updated to be spec 3. The stub ignition
+configs for master/worker nodes are updated to spec 3, and referece a different
+endpoint of the MCS which will serve ignition spec 3 configs. RHCOS images pinned


The endpoint should be the same. Ignition already uses an Accept header to specify which spec versions it can process.

crawford · 2019-12-21T18:46:25Z

enhancements/ignition-spec-dual-support.md

+The MCO will flat out reject spec 2 configs, and refuse to upgrade clusters
+that have spec 2 bits.
+
+The MCO will also throw an alert upon: seeing a spec 2 machineconfig applied to


I don’t think we want to bother with alerts here. We can just validate the MachineConfig before the CR is committed.
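A sketch of that validation under stated assumptions: the check inspects only the `ignition.version` field of the raw Ignition config embedded in a MachineConfig and accepts spec 3.x. The function name `validateIgnition` is hypothetical, and wiring it into an admission webhook or CRD validation is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// validateIgnition (hypothetical) rejects a raw Ignition config unless it
// declares spec 3.x, so a spec 2 MachineConfig is refused before the CR is
// committed rather than alerted on afterwards.
func validateIgnition(raw []byte) error {
	var cfg struct {
		Ignition struct {
			Version string `json:"version"`
		} `json:"ignition"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("invalid Ignition config: %w", err)
	}
	// "3.0.0" -> 3; an empty or malformed version fails here.
	major, err := strconv.Atoi(strings.SplitN(cfg.Ignition.Version, ".", 2)[0])
	if err != nil {
		return fmt.Errorf("unparseable Ignition version %q", cfg.Ignition.Version)
	}
	if major != 3 {
		return fmt.Errorf("config declares Ignition spec %s; only spec 3.x is accepted", cfg.Ignition.Version)
	}
	return nil
}

func main() {
	fmt.Println(validateIgnition([]byte(`{"ignition":{"version":"3.0.0"}}`)))
	fmt.Println(validateIgnition([]byte(`{"ignition":{"version":"2.2.0"}}`)))
}
```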

yuqi-zhang · 2020-01-04T00:30:26Z

Added some fixes.

Why aren’t we using spec 3 on new clusters in phase 1?

This would require non-trivial changes to the installer and MCO, which the team doesn't think we have enough time for in the 4.4 timeframe.

runcom · 2020-01-28T12:20:14Z

ping @openshift/openshift-architects

it does lgtm, can any of you approve?

/lgtm

openshift-ci-robot · 2020-01-28T12:20:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom, yuqi-zhang

Needs approval from an approver in each of these files:

~~OWNERS~~ [runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

runcom · 2020-01-28T12:32:12Z

Damn, I thought I couldn't approve, but I still think this is the way we all agreed on. @crawford, since you reviewed this, please take a last look.

crawford · 2020-03-11T18:47:42Z

enhancements/ignition-spec-dual-support.md

+
+All machineconfigs are switched to spec 3. Spec 2 support will exist in the form
+of served Ignition configs to new nodes joining the cluster that have Ignition
+v0 in the bootimage. MCS will handle this with 2 endpoints.


MCS will handle this with 2 endpoints.

This should be a single endpoint. Ignition already uses the Accept HTTP header to specify which configs it can accept and the order of preference.

cc @yuqi-zhang: this is also related to the Ignition endpoint contained in the secret created by the installer (which is unmanaged).

I can update the proposal. We would still first need an option to modify the secret anyway.

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 4, 2019

openshift-ci-robot requested review from jwmatthews and tbielawa November 4, 2019 18:57

abhinavdahiya reviewed Nov 4, 2019

sgreene570 reviewed Nov 4, 2019

wking reviewed Nov 5, 2019

vrutkovs mentioned this pull request Nov 12, 2019

[fcos] Support Ignition3 and FCOS openshift/installer#2548

Merged

yuqi-zhang force-pushed the ignition-spec-3 branch from cf9126c to 3d6193d November 19, 2019 22:07

runcom reviewed Nov 22, 2019

runcom reviewed Nov 26, 2019

yuqi-zhang force-pushed the ignition-spec-3 branch from 3d6193d to 9152ff4 December 6, 2019 01:55

runcom reviewed Dec 17, 2019

crawford reviewed Dec 21, 2019

yuqi-zhang force-pushed the ignition-spec-3 branch from 9152ff4 to 37bd8f5 January 4, 2020 00:29

enhancement: add ignition dual spec 2/3 support

e1e6b6f

Add an enhancement proposal for supporting both ignition spec 2 and 3 for OS provisioning/updating. Signed-off-by: Yu Qi Zhang <[email protected]>

yuqi-zhang force-pushed the ignition-spec-3 branch from 37bd8f5 to e1e6b6f January 4, 2020 00:42

openshift-ci-robot assigned runcom Jan 28, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2020

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2020

openshift-merge-robot merged commit b5d54fc into openshift:master Jan 28, 2020

cgwalters mentioned this pull request Mar 6, 2020

proposal for rebasing on Ignition master openshift/os#402

Closed

crawford reviewed Mar 11, 2020
