consider making config changes truly transactional on RHCOS #1190
Comments
Another way to look at this is, it'd enforce that config changes require a reboot. We could drop the hacks around using journald for which config we're in, because
Also a prerequisite is moving the pull secret from
👍 IIUC, this would prevent new static pod manifests from taking effect until we reboot.
This proposal looks interesting!
I don't see a blocker to doing this on firstboot too.
Looks like I might have misunderstood a little bit about making /etc/ changes transactional. What will happen with this proposal in place during firstboot? Most likely during firstboot we will have config changes (like a new file to be added at /etc/ignition-machine-config-encapsulated.json) and also a new machine-os-content. Today, we create a new deployment for machine-os-content changes, which gets applied after reboot. What will happen to files like /etc/ignition-machine-config-encapsulated.json? Will they be written to the current deployment or to the new deployment?
The motivation for filing this issue was the etcd upgrade bug. However, we're also now seeing this for kubelet config. And see also a related podman config bug. One comment I had on the podman bug was:
That's a good question. Note that today, it's Ignition which writes the initial files, not the MCD. So unless we changed how Ignition works too, the answer would be that the files are written in the current root. And in fact, we need this to happen because we need the pull secret. I think it's OK if we only do this "transactional /etc" for MCD upgrades, because that's where we're actually doing an upgrade, and we have a workload running on the cluster.
Eh, thinking about this more it's not a hard dependency - we can just make changes to
Ah right, makes sense to me now. Thanks for the explanation.
The machine-config operator had a bug where MachineConfig entries led the machine-config daemon (MCD) to lay down a storage.conf that exactly matched the content installed by the containers-common RPM. On update, the RHCOS machine pivots to a new OSTree image (defined in the machine-os-content image referenced from the release image). Seeing storage.conf content that matched the old OSTree image, libostree replaced storage.conf with the version defined in the new OSTree image [1]. Then, when the MCD comes back up post-pivot, it sees the divergent storage.conf content and freaks out with logs like [2]:

  E1210 16:15:51.105286 11181 daemon.go:1350] content mismatch for file /etc/containers/storage.conf:

and the machine-config operator goes Degraded=True with RequiredPoolsFailed "nodes are reporting degraded status on sync" [3].

The narrow machine-config fix was to annotate the storage.conf that the MCD writes, so that libostree doesn't touch the file on pivot [4]. This addresses the storage.conf case, but leaves the MCD vulnerable to other instances of "MCD writes exactly the OSTree contents to $FILE and expects it to remain untouched during an OSTree pivot that bumps the file". I'm not aware of a generic fix at the moment, although [5] might be related. You can guard a cluster against the narrow bug by setting a MachineConfig [6] or higher level object such as a ContainerRuntimeConfig [7] that will cause the MCD to write a storage.conf that diverges (even just by a comment or whitespace) from the OSTree original.
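The failure mode described above can be sketched as a toy model. This is not libostree's implementation; it only assumes the behavior the text describes (a file byte-identical to the old image's default is treated as unmodified and replaced, anything else is kept). The function names, file names, and file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the /etc handling described above; NOT libostree's real code.
# merge_etc_file OLD_DEFAULT CURRENT NEW_DEFAULT
# Prints the path of the file whose content would survive the pivot.
merge_etc_file() {
  local old_default=$1 current=$2 new_default=$3
  if cmp -s "$old_default" "$current"; then
    # Identical to the old image's default: looks unmodified, new default wins.
    printf '%s\n' "$new_default"
  else
    # Diverges from the old default: treated as a local change and kept.
    printf '%s\n' "$current"
  fi
}

# Hypothetical demo with made-up file contents.
demo_merge() {
  local dir
  dir=$(mktemp -d)
  printf 'driver = "overlay"\n' > "$dir/old"
  printf 'driver = "overlay"\n# comment added in the new image\n' > "$dir/new"
  # Case 1: the MCD wrote exactly the old default (the buggy situation).
  cp "$dir/old" "$dir/unannotated"
  # Case 2: the MCD wrote the same content plus a "generated" marker.
  { printf '# This file is generated\n'; cat "$dir/old"; } > "$dir/annotated"
  basename "$(merge_etc_file "$dir/old" "$dir/unannotated" "$dir/new")"
  basename "$(merge_etc_file "$dir/old" "$dir/annotated" "$dir/new")"
  rm -rf "$dir"
}

demo_merge
# prints:
# new
# annotated
```

In the first case the MCD would later find content it did not write, which is the "content mismatch" error; in the second case the marker alone is enough to keep the file stable across the pivot.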
Tracking the narrow fix through the various z streams:

The 4.1 machine-config bug was introduced in d2c44d7 [8], which landed before 4.1.0-rc.0:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-daemon https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-operator https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-server https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    setup-etcd-environment https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
  $ git --no-pager log --oneline --first-parent de9998eb37 | grep d2c44d7
  d2c44d7c Merge pull request openshift#330 from umohnani8/runtime

The 4.1 machine-config fix was [9], landed in 1301934 [10], which is new in 4.1.34:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.34-x86_64 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-daemon https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-operator https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-server https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    setup-etcd-environment https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.31-x86_64 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-daemon https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-operator https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-server https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    setup-etcd-environment https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
  $ git --no-pager log --oneline --first-parent -2 f56d736e74a
  f56d736e (origin/release-4.1) Merge pull request openshift#1147 from openshift-cherrypick-robot/cherry-pick-1114-to-release-4.1
  1301934a Merge pull request openshift#1382 from vrutkovs/4.1-containers-conf-generated

The 4.2 machine-config fix was [2], landed in bd358bb [11], which is new in 4.2.18:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 9366460085b2a24d825380759f554769ec5ab4f9
  $ git --no-pager log --oneline --first-parent -2 9366460085
  93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
  bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2

The 4.3 machine-config fix was [12], landed in 9fd53bd [13], which landed early enough for 4.3.0-rc.0:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
  $ git --no-pager log --oneline --first-parent -8 23a6e6fb37
  23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3
  80c8aed7 Merge pull request openshift#1343 from retroflexer/cherry-pick-backup-restore-kube-static-resources
  269990a3 Merge pull request openshift#1344 from openshift-cherrypick-robot/cherry-pick-1296-to-release-4.3
  fd3ca395 Merge pull request openshift#1338 from runcom/fix-go-mod
  ba304dbb Merge pull request openshift#1333 from openshift-cherrypick-robot/cherry-pick-1278-to-release-4.3
  787f3fa9 Merge pull request openshift#1332 from runcom/reserved-cpus-4.3
  2b85d6ba Merge pull request openshift#1329 from openshift-cherrypick-robot/cherry-pick-1314-to-release-4.3
  9fd53bd5 Merge pull request openshift#1322 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.3

The 4.4 machine-config fix was [3] which has landed before any 4.4 RCs have been cut.

Even in 4.4, the generated note was the first content touch to this template:

  $ git --no-pager log --oneline --follow origin/release-4.4 -- templates/common/_base/files/container-storage.yaml
  46c4e27a (origin/pr/1320) templates/container-storage: Add a "this is generated" note
  47a6321c templates: Move container-storage.yaml into common/
  74ae3b31 (origin/pr/330) Add ContainerRuntime CRD and Controller

(47a6321c was a pure rename). So the MCD has been annotating storage.conf since 4.1.34, 4.2.18, and all 4.3 and later releases.

When has the RPM-installed storage.conf changed? Figuring this part out is a bit awkward, because we need to drill down machine-os-content -> RHCOS -> RPM -> file.
For example, from 4.2.16 -> 4.2.18 [14]:

  $ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64) | jq -r .config.config.Labels.version
  42.81.20200114.0
  $ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64) | jq -r .config.config.Labels.version
  42.81.20200203.1
  $ ./differ.py --first-endpoint art --first-version 42.81.20200114.0 --second-endpoint art --second-version 42.81.20200203.1 | jq -r '.diff | keys | sort[]'
  cri-o
  ignition
  libarchive
  machine-config-daemon
  openshift-clients
  openshift-hyperkube
  sqlite-libs

storage.conf is managed by the containers-common RPM, so no change from 4.2.16 to 4.2.18, and that update will safely pull in the fixed MCD without a surprising pivot change.

Here are our changes to the RPM across the various z streams:

  $ for OCP in 4.1.1 4.1.23 4.1.24 4.1.31-x86_64 4.1.34-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.1/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
  410.8.20190606.0 0.1.32 4.1.1
  410.8.20191030.0 0.1.32 4.1.23
  410.81.20191112.2 0.1.37 4.1.24
  410.81.20200114.0 0.1.37 4.1.31-x86_64
  410.81.20200204.1 0.1.40 4.1.34-x86_64
  $ for OCP in 4.2.0-rc.0 4.2.2 4.2.4 4.2.16-x86_64 4.2.18-x86_64 4.2.19-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.2/${RHCOS}/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
  42.80.20190930.1 0.1.32 4.2.0-rc.0
  42.80.20191022.0 0.1.32 4.2.2
  42.81.20191107.0 0.1.37 4.2.4
  42.81.20200114.0 0.1.37 4.2.16-x86_64
  42.81.20200203.1 0.1.37 4.2.18-x86_64
  42.81.20200210.0 0.1.40 4.2.19-x86_64
  $ for OCP in 4.3.0-rc.0-x86_64 4.3.3-x86_64; do RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"; COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/${RHCOS}/x86_64/commitmeta.json" | jq -r '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .[2]')"; echo "${RHCOS} ${COMMON} ${OCP}"; done
  43.81.202001072253.0 0.1.40 4.3.0-rc.0-x86_64
  43.81.202002170853.0 0.1.40 4.3.3-x86_64

Fetching a source RPM for containers-common, e.g. from [15,16] shows the source packages coming from skopeo.
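To pick the containers-common bumps out of tables like the ones above, a small awk filter over the "RHCOS COMMON OCP" rows can help. This is only a sketch: the function name is hypothetical, and it flags RPM version changes, which is not the same thing as a change to the storage.conf content shipped in the RPM.

```shell
#!/usr/bin/env bash
# common_bumps: read "<RHCOS> <containers-common> <OCP>" rows on stdin and
# print each pair of consecutive rows where the containers-common version
# differs. A version bump is only a hint that storage.conf MAY have changed.
common_bumps() {
  awk '
    NR > 1 && $2 != prev_common {
      printf "%s -> %s: containers-common %s -> %s\n", prev_ocp, $3, prev_common, $2
    }
    { prev_common = $2; prev_ocp = $3 }
  '
}

# Usage with the 4.2 rows from the table above.
common_bumps <<'EOF'
42.80.20190930.1 0.1.32 4.2.0-rc.0
42.80.20191022.0 0.1.32 4.2.2
42.81.20191107.0 0.1.37 4.2.4
42.81.20200114.0 0.1.37 4.2.16-x86_64
42.81.20200203.1 0.1.37 4.2.18-x86_64
42.81.20200210.0 0.1.40 4.2.19-x86_64
EOF
# prints:
# 4.2.2 -> 4.2.4: containers-common 0.1.32 -> 0.1.37
# 4.2.18-x86_64 -> 4.2.19-x86_64: containers-common 0.1.37 -> 0.1.40
```

The reported pairs are neighbours in the sampled list, not necessarily adjacent releases, so the real bump may sit in a release between them.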
Checking [17]:

  $ git --no-pager log --follow --oneline --stat=200 -M50% -- vendor/github.com/containers/storage/storage.conf
  afaa9e7f Bump github.com/containers/storage from 1.15.1 to 1.15.2
   vendor/github.com/containers/storage/storage.conf | 3 ---
   1 file changed, 3 deletions(-)
  39ff039b Image encryption/decryption support in skopeo
   vendor/github.com/containers/storage/storage.conf | 44 +++++++++++++++++++++++++-------------------
   1 file changed, 25 insertions(+), 19 deletions(-)
  05ae513b Bump github.com/containers/buildah from 1.8.4 to 1.11.4
   vendor/github.com/containers/storage/storage.conf | 7 -------
   1 file changed, 7 deletions(-)
  700b3102 update github.com/containers/{image,storage}
   vendor/github.com/containers/storage/storage.conf | 8 ++++++++
   1 file changed, 8 insertions(+)
  033b2902 migrate to go modules
   vendor/github.com/containers/storage/storage.conf | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   1 file changed, 130 insertions(+)
  $ git --no-pager log --follow --oneline --stat=200 -M50% 033b2902^ -- contrib/storage.conf
  fe259105 add storage.conf and manpage in contrib/
   contrib/storage.conf | 28 ++++++++++++++++++++++++++++
   1 file changed, 28 insertions(+)
  $ for HASH in fe259105 033b2902 700b3102 05ae513b 39ff039b afaa9e7f; do git describe --contains "${HASH}"; done
  v0.1.29~3^2
  v0.1.38~14^2~2
  v0.1.39~1
  v0.1.41~25^2
  v0.1.41~21^2
  v0.1.41~12^2

So changes may have been made in 0.1.29 (when the file landed for the first time, likely from wherever we store post-Git patches), and were likely made in 0.1.38, 0.1.39, and 0.1.41.

However, the skopeo and derivative containers-common RPMs may have had patched versions of the file tracked in dist-git [18].
Comparing the dist-git 4.1 tip with the machine-config template:

  $ git -C containers/skopeo remote -v | grep 'dist-git.*fetch'
  dist-git git://pkgs.devel.redhat.com/rpms/skopeo.git (fetch)
  $ git --no-pager -C containers/skopeo log --date=short --format='%ad %h %s' -2 dist-git/rhaos-4.1-rhel-8 -- storage.conf
  2018-07-18 3757b210 add statx to seccomp.json to containers-config add seccomp.json to containers-config
  2017-11-08 284f9024 Force storage.conf to default to overlay
  $ git --no-pager -C containers/skopeo grep '^Version:' 3757b210
  3757b210:skopeo.spec:Version: 0.1.31
  $ diff -U3 <(git -C containers/skopeo cat-file -p 3757b210:storage.conf) <(sed 's/^ //' openshift/machine-config-operator/templates/common/_base/files/container-storage.yaml)
  --- /dev/fd/63 2020-02-20 01:13:48.073704685 -0800
  +++ /dev/fd/62 2020-02-20 01:13:48.073704685 -0800
  @@ -1,3 +1,10 @@
  +filesystem: "root"
  +mode: 0644
  +path: "/etc/containers/storage.conf"
  +contents:
  + inline: |
  +# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
  +#
   # storage.conf is the configuration file for all tools
   # that share the containers/storage libraries
   # See man 5 containers-storage.conf for more information

So the machine-config master (5ed0aee72c) only differs from the old 0.1.31 RPM storage.conf by the "file is generated" marker.

There does not seem to be any 4.2-specific content. Presumably they're using the same rhaos-4.1-rhel-8 RPMs.
4.3 has some changes:

  $ git --no-pager log --date=short --format='%ad %h %s' -2 --stat=80 dist-git/rhaos-4.3-rhel-8 -- storage.conf
  2019-12-09 4a131916 skopeo-0.1.40-2.el8
   storage.conf | 39 +++++++++++++++++++++++++++++----------
   1 file changed, 29 insertions(+), 10 deletions(-)
  2019-10-08 13a4ce10 skopeo-1:0.1.40-0.1.gitf72e39f
   storage.conf | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   1 file changed, 114 insertions(+)

So it looks like we can ignore the dev skopeo repository, focus on the dist-git skopeo repository, and say that before 0.1.40-2.el8 we had a version of storage.conf in the RPMs that matched the unpatched machine-config templates, and with 0.1.40-2.el8 and later the RPMs had different content. Sanity checking via [19,20]:

  $ diff -U3 <(rpm2cpio containers-common-0.1.32-5.git1715c90.el8.x86_64.rpm | cpio -i --to-stdout ./etc/containers/storage.conf 2>/dev/null) <(sed 's/^ //' templates/common/_base/files/container-storage.yaml)
  --- /dev/fd/63 2020-02-20 01:36:23.031918968 -0800
  +++ /dev/fd/62 2020-02-20 01:36:23.031918968 -0800
  @@ -1,3 +1,10 @@
  +filesystem: "root"
  +mode: 0644
  +path: "/etc/containers/storage.conf"
  +contents:
  + inline: |
  +# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
  +#
   # storage.conf is the configuration file for all tools
   # that share the containers/storage libraries
   # See man 5 containers-storage.conf for more information

but I'm not clear on why the product pages are claiming containers-common-0.1.32 for 4.1.34 [19,20]. FIXME

Comparing with our machine-os-content, that means vulnerable transitions are:

* 4.1.* -> 4.1.34, since 4.1.31 -> 4.1.34 takes containers-common from 0.1.37 to 0.1.40, picking up the v0.1.38~14^2~2 and v0.1.39~1 bumps. There may be no safe way to get to 4.1.34.
* 4.1.* -> 4.2... FIXME
* 4.2.16 and earlier -> 4.2.19, since 4.2.18 -> 4.2.19 takes containers-common from 0.1.37 to 0.1.40, picking up the v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.16 and earlier -> 4.2.18 is fine, because there were no RPM-induced storage.conf bumps. 4.2.18 -> 4.2.* is fine, because 4.2.18 has the patched machine-config source.
* 4.2.16 and earlier -> 4.3, since 4.2.18 -> 4.3 takes containers-common from 0.1.37 to 0.1.40, picking up the v0.1.38~14^2~2 and v0.1.39~1 bumps. 4.2.18 -> 4.3 is fine, because 4.2.18 has the patched machine-config source.
* 4.3 -> 4.3 are fine, since they all have the patched machine-config source.

So ideally this pull would block edges from 4.2.16 and earlier into 4.3. But because blocked-edges requires an explicit "to", I've just added the 4.3.0 blocker (other 4.3.z releases either already blocked 4.2.* or only give 4.2.18+ as update sources). I've also dropped 4.2.16 from the *-4.3 channels with a comment about this bug. There shouldn't be much pushback on pulling the edge, because users can still move from 4.2 to 4.3 via 4.2.19 -> 4.3.2.

Also simplify the wording on the GCP bug 1793635, which remains unfixed.

[1]: openshift/machine-config-operator#1320 (comment)
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1782152#c5
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1781708#c0
[4]: https://github.com/openshift/machine-config-operator/pull/1320/files
[5]: openshift/machine-config-operator#1190
[6]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/MachineConfiguration.md
[7]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/ContainerRuntimeConfigDesign.md
[8]: openshift/machine-config-operator#330 (comment)
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1782153
[10]: openshift/machine-config-operator#1382 (comment)
[11]: openshift/machine-config-operator#1323 (comment)
[12]: https://bugzilla.redhat.com/show_bug.cgi?id=1782149
[13]: openshift/machine-config-operator#1322 (comment)
[14]: https://gitlab.cee.redhat.com/coretools/differ
      Internal link, sorry :/ But you can also browse the history at:
      https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2&release=42.81.20200114.0 etc.
[15]: https://access.redhat.com/downloads/content/290/ver=4.2/rhel---8/4.2.0/x86_64/packages
[16]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8841/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[17]: https://github.com/containers/skopeo/
[18]: http://pkgs.devel.redhat.com/cgit/rpms/skopeo/
[19]: https://access.redhat.com/downloads/content/290/ver=4.1/rhel---8/4.1.34/x86_64/packages
[20]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8384/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
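The vulnerable-transition reasoning in the message above boils down to two conditions: the source release still runs the unpatched machine-config operator, and the RPM-shipped storage.conf differs between the source and target machine-os-content. A minimal shell sketch of that rule follows. The function name `edge_is_vulnerable` and its arguments are hypothetical, introduced only for illustration; nothing like it exists in the graph-data tooling, and the digests passed in would have to come from a lookup such as the RPM drill-down described in the message.

```shell
# Hypothetical helper (not part of any OpenShift tool): succeeds (exit 0)
# when an update edge is exposed to the storage.conf pivot bug.
#   $1 = "patched" or "unpatched": state of the source release's machine-config operator
#   $2 = digest of the RPM-shipped storage.conf in the source machine-os-content
#   $3 = digest of the RPM-shipped storage.conf in the target machine-os-content
edge_is_vulnerable() {
  src_mcd="$1"; src_digest="$2"; dst_digest="$3"
  # Exposed only if the MCD wrote the unannotated file AND the pivot changes the RPM default.
  [ "${src_mcd}" = "unpatched" ] && [ "${src_digest}" != "${dst_digest}" ]
}
```

For example, `edge_is_vulnerable unpatched "${OLD}" "${NEW}" && echo block` would flag an edge like 4.2.16 -> 4.3.0, while the same call with equal digests (4.2.16 -> 4.2.18) or with `patched` (4.2.18 -> 4.3) would not.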
The machine-config operator had a bug where MachineConfig entries led the machine-config daemon (MCD) to lay down a storage.conf that exactly matched the content installed by the containers-common RPM. On update, the RHCOS machine pivots to a new OSTree image (defined in the machine-os-content image referenced from the release image). Seeing storage.conf content that matched the old OSTree image, libostree replaced storage.conf with the version defined in the new OSTree image [1]. Then, when the MCD comes back up post-pivot, it sees the divergent storage.conf content and freaks out with logs like [2]:

  E1210 16:15:51.105286 11181 daemon.go:1350] content mismatch for file /etc/containers/storage.conf:

and the machine-config operator goes Degraded=True with RequiredPoolsFailed "nodes are reporting degraded status on sync" [3].

The narrow machine-config fix was to annotate the storage.conf that the MCD writes, so that libostree doesn't touch the file on pivot [4]. This addresses the storage.conf case, but leaves the MCD vulnerable to other instances of "MCD writes exactly the OSTree contents to $FILE and expects it to remain untouched during an OSTree pivot that bumps the file". I'm not aware of a generic fix at the moment, although [5] might be related. You can guard a cluster against the narrow bug by setting a MachineConfig [6] or a higher-level object such as a ContainerRuntimeConfig [7] that will cause the MCD to write a storage.conf that diverges (even just by a comment or whitespace) from the OSTree original.
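The pivot behaviour described above can be sketched for a single file in shell. This is an illustration only, not libostree code: the function name `merged_etc_file` and its three path arguments are hypothetical, and the real /etc merge handles more cases (removed files, permissions, directories). It shows why a byte-identical storage.conf gets replaced while an annotated one survives.

```shell
# Hypothetical helper (not part of libostree): print what one /etc file
# would contain after a pivot, given three on-disk copies.
#   $1 = default shipped by the old OSTree image
#   $2 = default shipped by the new OSTree image
#   $3 = current file in /etc
merged_etc_file() {
  old_default="$1"; new_default="$2"; current="$3"
  if cmp -s "${old_default}" "${current}"; then
    # Byte-identical to the old default, so it looks unmodified:
    # the new image's default replaces it.
    cat "${new_default}"
  else
    # Diverges from the old default (even by one comment line):
    # the local content is kept.
    cat "${current}"
  fi
}
```

With the unpatched MCD, the current file equals the old default and the first branch fires, which is the mismatch the MCD then reports. With the "this file is generated" note, the second branch fires and the MCD's content survives the pivot.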
Tracking the narrow fix through the various z streams:

The 4.1 machine-config bug was introduced in d2c44d7 [8], which landed before 4.1.0-rc.0:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.0-rc.0 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-daemon https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-operator https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    machine-config-server https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
    setup-etcd-environment https://github.com/openshift/machine-config-operator de9998eb37e90b3ee2fcfdbb3eda7ba26870ab6e
  $ git --no-pager log --oneline --first-parent de9998eb37 | grep d2c44d7
  d2c44d7c Merge pull request openshift#330 from umohnani8/runtime

The 4.1 machine-config fix was [9], landed in 1301934 [10], which is new in 4.1.34:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.34-x86_64 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-daemon https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-operator https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    machine-config-server https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
    setup-etcd-environment https://github.com/openshift/machine-config-operator f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.1.31-x86_64 | grep machine-config
    machine-config-controller https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-daemon https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-operator https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    machine-config-server https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
    setup-etcd-environment https://github.com/openshift/machine-config-operator b38afe6e5b79a3e11e881429dc4c7c70e8784e84
  $ git --no-pager log --oneline --first-parent -2 f56d736e74a
  f56d736e (origin/release-4.1) Merge pull request openshift#1147 from openshift-cherrypick-robot/cherry-pick-1114-to-release-4.1
  1301934a Merge pull request openshift#1382 from vrutkovs/4.1-containers-conf-generated

The 4.2 machine-config fix was [2], landed in bd358bb [11], which is new in 4.2.18:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd
  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 9366460085b2a24d825380759f554769ec5ab4f9
  $ git --no-pager log --oneline --first-parent -2 9366460085
  93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
  bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2

The 4.3 machine-config fix was [12], landed in 9fd53bd [13], which landed early enough for 4.3.0-rc.0:

  $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep machine-config
    machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
  $ git --no-pager log --oneline --first-parent -8 23a6e6fb37
  23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3
  80c8aed7 Merge pull request openshift#1343 from retroflexer/cherry-pick-backup-restore-kube-static-resources
  269990a3 Merge pull request openshift#1344 from openshift-cherrypick-robot/cherry-pick-1296-to-release-4.3
  fd3ca395 Merge pull request openshift#1338 from runcom/fix-go-mod
  ba304dbb Merge pull request openshift#1333 from openshift-cherrypick-robot/cherry-pick-1278-to-release-4.3
  787f3fa9 Merge pull request openshift#1332 from runcom/reserved-cpus-4.3
  2b85d6ba Merge pull request openshift#1329 from openshift-cherrypick-robot/cherry-pick-1314-to-release-4.3
  9fd53bd5 Merge pull request openshift#1322 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.3

The 4.4 machine-config fix was [3], which landed before any 4.4 RCs were cut. Even in 4.4, the generated note was the first content touch to this template:

  $ git --no-pager log --oneline --follow origin/release-4.4 -- templates/common/_base/files/container-storage.yaml
  46c4e27a (origin/pr/1320) templates/container-storage: Add a "this is generated" note
  47a6321c templates: Move container-storage.yaml into common/
  74ae3b31 (origin/pr/330) Add ContainerRuntime CRD and Controller

(47a6321c was a pure rename).

So the MCD has been annotating storage.conf since 4.1.34, 4.2.18, and all 4.3 and later releases.

When has the RPM-installed storage.conf changed? Figuring this part out is a bit awkward, because we need to drill down machine-os-content -> RHCOS -> RPM -> file.
For example, from 4.2.16 -> 4.2.18 [14]:

  $ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.16-x86_64) | jq -r .config.config.Labels.version
  42.81.20200114.0
  $ oc image info --output json $(oc adm release info --image-for=machine-os-content quay.io/openshift-release-dev/ocp-release:4.2.18-x86_64) | jq -r .config.config.Labels.version
  42.81.20200203.1
  $ ./differ.py --first-endpoint art --first-version 42.81.20200114.0 --second-endpoint art --second-version 42.81.20200203.1 | jq -r '.diff | keys | sort[]'
  cri-o
  ignition
  libarchive
  machine-config-daemon
  openshift-clients
  openshift-hyperkube
  sqlite-libs

storage.conf is managed by the containers-common RPM, which is not in that list, so there is no change from 4.2.16 to 4.2.18, and that update will safely pull in the fixed MCD without a surprising pivot change. Here are our changes to the RPM across the various z streams:

  $ for OCP in 4.1.1 4.1.16 4.1.17 4.1.23 4.1.24 4.1.28 4.1.29 4.1.31-x86_64 4.1.34-x86_64; do
      RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"
      COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.1/${RHCOS}/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"
      echo "${COMMON} ${RHCOS} ${OCP}"
    done
  ["containers-common","1","0.1.32","4.git1715c90.el8","x86_64"] 410.8.20190606.0 4.1.1
  ["containers-common","1","0.1.32","4.git1715c90.el8","x86_64"] 410.8.20190910.1 4.1.16
  ["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 410.8.20190918.0 4.1.17
  ["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 410.8.20191030.0 4.1.23
  ["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 410.81.20191112.2 4.1.24
  ["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 410.81.20191210.0 4.1.28
  ["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 410.81.20191223.0 4.1.29
  ["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 410.81.20200114.0 4.1.31-x86_64
  ["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 410.81.20200204.1 4.1.34-x86_64
  $ for OCP in 4.2.0-rc.0 4.2.2 4.2.4 4.2.12 4.2.13 4.2.18-x86_64 4.2.19-x86_64 4.2.20-x86_64; do
      RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"
      COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.2/${RHCOS}/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"
      echo "${COMMON} ${RHCOS} ${OCP}"
    done
  ["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 42.80.20190930.1 4.2.0-rc.0
  ["containers-common","1","0.1.32","5.git1715c90.el8","x86_64"] 42.80.20191022.0 4.2.2
  ["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 42.81.20191107.0 4.2.4
  ["containers-common","1","0.1.37","5.module+el8.1.0+4240+893c1ab8","x86_64"] 42.81.20191210.1 4.2.12
  ["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 42.81.20191223.0 4.2.13
  ["containers-common","1","0.1.37","6.module+el8.1.0+4876+e678a192","x86_64"] 42.81.20200203.1 4.2.18-x86_64
  ["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 42.81.20200210.0 4.2.19-x86_64
  ["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 42.81.20200217.0 4.2.20-x86_64
  $ for OCP in 4.3.0-rc.0-x86_64 4.3.0-x86_64 4.3.1-x86_64 4.3.2-x86_64 4.3.3-x86_64; do
      RHCOS="$(oc image info --output json $(oc adm release info --image-for=machine-os-content "quay.io/openshift-release-dev/ocp-release:${OCP}") | jq -r .config.config.Labels.version)"
      COMMON="$(curl -s "https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/${RHCOS}/x86_64/commitmeta.json" | jq -c '.["rpmostree.rpmdb.pkglist"][] | select(.[0] == "containers-common") | .')"
      echo "${COMMON} ${RHCOS} ${OCP}"
    done
  ["containers-common","1","0.1.40","2.el8","x86_64"] 43.81.202001072253.0 4.3.0-rc.0-x86_64
  ["containers-common","1","0.1.40","2.el8","x86_64"] 43.81.202001142154.0 4.3.0-x86_64
  ["containers-common","1","0.1.40","3.rhaos.el8","x86_64"] 43.81.202002032142.0 4.3.1-x86_64
  ["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 43.81.202002110953.0 4.3.2-x86_64
  ["containers-common","1","0.1.40","8.module+el8.1.1+5351+506397b0","x86_64"] 43.81.202002170853.0 4.3.3-x86_64

Fetching a source RPM for containers-common, e.g. from [15,16], shows the source packages coming from skopeo. Checking [17]:

  $ git --no-pager log --follow --oneline --stat=200 -M50% -- vendor/github.com/containers/storage/storage.conf
  afaa9e7f Bump github.com/containers/storage from 1.15.1 to 1.15.2
   vendor/github.com/containers/storage/storage.conf | 3 ---
   1 file changed, 3 deletions(-)
  39ff039b Image encryption/decryption support in skopeo
   vendor/github.com/containers/storage/storage.conf | 44 +++++++++++++++++++++++++-------------------
   1 file changed, 25 insertions(+), 19 deletions(-)
  05ae513b Bump github.com/containers/buildah from 1.8.4 to 1.11.4
   vendor/github.com/containers/storage/storage.conf | 7 -------
   1 file changed, 7 deletions(-)
  700b3102 update github.com/containers/{image,storage}
   vendor/github.com/containers/storage/storage.conf | 8 ++++++++
   1 file changed, 8 insertions(+)
  033b2902 migrate to go modules
   vendor/github.com/containers/storage/storage.conf | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   1 file changed, 130 insertions(+)
  $ git --no-pager log --follow --oneline --stat=200 -M50% 033b2902^ -- contrib/storage.conf
  fe259105 add storage.conf and manpage in contrib/
   contrib/storage.conf | 28 ++++++++++++++++++++++++++++
   1 file changed, 28 insertions(+)
  $ for HASH in fe259105 033b2902 700b3102 05ae513b 39ff039b afaa9e7f; do git describe --contains "${HASH}"; done
  v0.1.29~3^2
  v0.1.38~14^2~2
  v0.1.39~1
  v0.1.41~25^2
  v0.1.41~21^2
  v0.1.41~12^2

So changes may have been made in 0.1.29 (when the file landed for the first time, likely from wherever we store post-Git patches), and were likely made in 0.1.38, 0.1.39, and 0.1.41. However, the skopeo and derivative containers-common RPMs may have had patched versions of the file tracked in dist-git [18]. Comparing the dist-git 4.1 tip with the machine-config template:

  $ git -C containers/skopeo remote -v | grep 'dist-git.*fetch'
  dist-git git://pkgs.devel.redhat.com/rpms/skopeo.git (fetch)
  $ git --no-pager -C containers/skopeo log --date=short --format='%ad %h %s' -2 dist-git/rhaos-4.1-rhel-8 -- storage.conf
  2018-07-18 3757b210 add statx to seccomp.json to containers-config add seccomp.json to containers-config
  2017-11-08 284f9024 Force storage.conf to default to overlay
  $ git --no-pager -C containers/skopeo grep '^Version:' 3757b210
  3757b210:skopeo.spec:Version: 0.1.31
  $ diff -U3 <(git -C containers/skopeo cat-file -p 3757b210:storage.conf) <(sed 's/^ //' openshift/machine-config-operator/templates/common/_base/files/container-storage.yaml)
  --- /dev/fd/63 2020-02-20 01:13:48.073704685 -0800
  +++ /dev/fd/62 2020-02-20 01:13:48.073704685 -0800
  @@ -1,3 +1,10 @@
  +filesystem: "root"
  +mode: 0644
  +path: "/etc/containers/storage.conf"
  +contents:
  + inline: |
  +# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
  +#
   # storage.conf is the configuration file for all tools
   # that share the containers/storage libraries
   # See man 5 containers-storage.conf for more information

So the machine-config master (5ed0aee72c) only differs from the old 0.1.31 RPM storage.conf by the "file is generated" marker.
There does not seem to be any 4.2-specific content. Presumably they're using the same rhaos-4.1-rhel-8 RPMs. 4.3 has some changes:

  $ git --no-pager log --date=short --format='%ad %h %s' -2 --stat=80 dist-git/rhaos-4.3-rhel-8 -- storage.conf
  2019-12-09 4a131916 skopeo-0.1.40-2.el8
   storage.conf | 39 +++++++++++++++++++++++++++++----------
   1 file changed, 29 insertions(+), 10 deletions(-)
  2019-10-08 13a4ce10 skopeo-1:0.1.40-0.1.gitf72e39f
   storage.conf | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   1 file changed, 114 insertions(+)

So it looks like we can ignore the dev skopeo repository, focus on the dist-git skopeo repository, and say that before 0.1.40-2.el8 we had a version of storage.conf in the RPMs that matched the unpatched machine-config templates, and with 0.1.40-2.el8 and later the RPMs had different content.

Can we check the RPMs to confirm? The product pages are claiming containers-common-0.1.32 for 4.1.34 [19,20]. Those product pages are fed from RPM Errata reports, and ART builds those Errata by sweeping RPM repositories in the vicinity of the RHCOS builds. So there's a potential for races like:

1. RPM Errata sweep fires and grabs RPM A v1.
2. New RPM A v2 pushed to the repository.
3. RHCOS build hits repositories and grabs RPM A v2.

The RPMs referenced by releases-rhcos-art.cloud are reliable, but actually tracking down the referenced RPMs to download them is complicated (especially for module builds like containers-common). But here are two RPM-lookup procedures that seem more reliable:

A. From [21]:

   1. On [21], find the matching skopeo package, e.g. skopeo-0.1.40-2.el8. Click through to the Advisory, e.g. [22].
   2. On [22], find the matching skopeo package, expand the CDN RPMs section to see the containers-common RPM link, e.g. [23].
   3. Click through to /etc/containers/storage.conf, e.g. [24].
   4. See the sha256, e.g. a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044.

B. From [25]. This works better for module builds.

   1. Search for the skopeo package from [25], e.g. [26], takes me to [27].
   2. Find the matching package, e.g. skopeo-0.1.37-5.module+el8.1.0+4240+893c1ab8, and click through to [28].
   3. Find the x86_64 containers-common RPM, and click through to info [29]. Continue from step A.3.

Summarizing storage.conf digests for the various RPMs:

* containers-common-1:0.1.32-4.git1715c90.el8.x86_64
  Used for 4.1.1 through 4.1.16.
  ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [30]
* containers-common-1:0.1.32-5.git1715c90.el8.x86_64
  Used for 4.1.7 through 4.1.23, 4.2.0-rc.0 through 4.2.2.
  ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [31]
* containers-common-1:0.1.37-5.module+el8.1.0+4240+893c1ab8.x86_64
  Used for 4.1.24 through 4.1.28, 4.2.4 through 4.2.12.
  ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [32]
* containers-common-1:0.1.37-6.module+el8.1.0+4876+e678a192.x86_64
  Used for 4.1.29 through 4.1.31, 4.2.13 through 4.2.18.
  ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822 [33]
* containers-common-1:0.1.40-2.el8.x86_64.rpm
  Used for 4.3.0-rc.0 through 4.3.0.
  a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [24]
* containers-common-1:0.1.40-3.rhaos.el8.x86_64
  Used for 4.3.1.
  a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [34]
* containers-common-1:0.1.40-8.module+el8.1.1+5351+506397b0.x86_64
  Used for 4.2.19, 4.2.20, and 4.1.34.
  a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044 [35]

So there are only two versions in the RPMs, ee7daca895 used for all 4.1 and 4.2, and a6423cca39 used for all 4.3. That means that the vulnerable transitions are 4.2.16 and earlier going into 4.3.
It also means that there's a potential for future trouble in transitions from 4.1.31 and earlier to a future 4.1 or 4.2 where the RPM-installed content is different, and from 4.2.16 and earlier to a future 4.2 where the RPM-installed content is different, but that we have no such 4.1 or 4.2 changes at the moment.

So ideally this pull would block edges from 4.2.16 and earlier into 4.3. This commit drops 4.2.16 from the *-4.3 channels with a comment about this bug. This also explicitly blocks edges from 4.2 into 4.3.0, because 4.3.0 is the only 4.3 release which recommends 4.2.16 or earlier as an update edge.

  $ for i in $(seq 0 3); do echo -n "$i "; oc adm release info "quay.io/openshift-release-dev/ocp-release:4.3.$i-x86_64" | grep Upgrades; done
  0 Upgrades: 4.2.16, 4.3.0-rc.0, 4.3.0-rc.1, 4.3.0-rc.2, 4.3.0-rc.3
  1 Upgrades: 4.2.18, 4.3.0-rc.0, 4.3.0-rc.3, 4.3.0
  2 Upgrades: 4.2.19, 4.3.0, 4.3.1
  3 Upgrades: 4.2.20, 4.3.0, 4.3.1, 4.3.2

There shouldn't be much pushback on pulling the edge, because users can still move from 4.2 to 4.3 via 4.2.18 -> 4.3.1, both of which are already in fast-4.3.

Also simplify the wording on the GCP bug 1793635, which remains unfixed.
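The digest comparison above can be repeated against any local copy of storage.conf. The following is a minimal sketch and is not part of the original commit: the function name `classify_storage_conf` and its output labels are hypothetical, and the only data it relies on are the two sha256 digests listed in the summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original commit): report which of the two
# RPM-shipped storage.conf digests listed above a given file matches.

# Digest listed for the 0.1.32 and 0.1.37 containers-common RPMs.
OLD_DIGEST=ee7daca89532d5a80da391fc358776ec11eff256c497652c49505acc70b96822
# Digest listed for the 0.1.40 containers-common RPMs.
NEW_DIGEST=a6423cca39d0cde0d6ee82163630d288e8876ab7d39d2678f6d86d804bf61044

classify_storage_conf() {
  local path="$1" digest
  # Bail out with status 2 if the file cannot be read.
  [ -r "${path}" ] || return 2
  # sha256sum prints "<digest>  <path>"; keep only the digest field.
  digest="$(sha256sum "${path}" | cut -d' ' -f1)"
  case "${digest}" in
    "${OLD_DIGEST}") echo "pre-0.1.40" ;;
    "${NEW_DIGEST}") echo "0.1.40" ;;
    *) echo "other" ;;
  esac
}
```

Calling `classify_storage_conf /etc/containers/storage.conf` prints one of the three labels. A file written from the machine-config template would presumably report `other`, since the template adds the "file is generated" header on top of the RPM content.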
[1]: openshift/machine-config-operator#1320 (comment)
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1782152#c5
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1781708#c0
[4]: https://github.com/openshift/machine-config-operator/pull/1320/files
[5]: openshift/machine-config-operator#1190
[6]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/MachineConfiguration.md
[7]: https://github.com/openshift/machine-config-operator/blob/13f0dda734262c3edbd23c007e42b7704125e88f/docs/ContainerRuntimeConfigDesign.md
[8]: openshift/machine-config-operator#330 (comment)
[9]: https://bugzilla.redhat.com/show_bug.cgi?id=1782153
[10]: openshift/machine-config-operator#1382 (comment)
[11]: openshift/machine-config-operator#1323 (comment)
[12]: https://bugzilla.redhat.com/show_bug.cgi?id=1782149
[13]: openshift/machine-config-operator#1322 (comment)
[14]: https://gitlab.cee.redhat.com/coretools/differ
      Internal link, sorry :/ But you can also browse the history at:
      https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.2&release=42.81.20200114.0 etc.
[15]: https://access.redhat.com/downloads/content/290/ver=4.2/rhel---8/4.2.0/x86_64/packages
[16]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8841/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[17]: https://github.com/containers/skopeo/
[18]: http://pkgs.devel.redhat.com/cgit/rpms/skopeo/
[19]: https://access.redhat.com/downloads/content/290/ver=4.1/rhel---8/4.1.34/x86_64/packages
[20]: https://access.redhat.com/downloads/content/rhel---8/x86_64/8384/containers-common/0.1.32-5.git1715c90.el8/x86_64/fd431d51/package
[21]: https://errata.devel.redhat.com/package/show/skopeo
[22]: https://errata.devel.redhat.com/errata/content/46255
[23]: https://brewweb.engineering.redhat.com/brew/rpminfo?rpmID=7604818
[24]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7604818&filename=/etc/containers/storage.conf
[25]: https://brewweb.engineering.redhat.com/brew/search
[26]: https://brewweb.engineering.redhat.com/brew/search?match=glob&type=package&terms=skopeo
[27]: https://brewweb.engineering.redhat.com/brew/packageinfo?packageID=58395
[28]: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=971200
[29]: https://brewweb.engineering.redhat.com/brew/rpminfo?rpmID=7349205
[30]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=6958325&filename=/etc/containers/storage.conf
[31]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7334504&filename=/etc/containers/storage.conf
[32]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7349205&filename=/etc/containers/storage.conf
[33]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7550403&filename=/etc/containers/storage.conf
[34]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7727297&filename=/etc/containers/storage.conf
[35]: https://brewweb.engineering.redhat.com/brew/fileinfo?rpmID=7656074&filename=/etc/containers/storage.conf
Dropping this here for lack of a better place: A much bigger path we could take would be to have the MCO build a derived OSTree commit from the rendered MachineConfig (like as a build process) and serve that to other nodes in the cluster. This means we're not going into each node and changing config files (or things like the But going this path takes us entirely away from supporting traditional RHEL - so we'd also need to do #1592 |
/lifecycle frozen |
Also of note, rpm-ostree is getting closer to stabilizing our "apply-live" code, xref https://lists.fedoraproject.org/archives/list/[email protected]/thread/MQWBKRFCYH2GB3CW5CG722RGQAEPHHAN/ - once that happens we can also support e.g. live-applying a subset of the changes to |
In recent discussion it was realized that a slightly hacky but totally viable way to do this would be for the MCO to take the Ignition config content under We've thought about this some in the context of non-RPM content; some discussion related to that in coreos/rpm-ostree#2326 |
One thing the MCO could do today is: Before initiating any node level changes to The effect of this is that we snapshot the current A downside of this approach is that by snapshotting the current |
Right. Another example is the SRIOV operator, which makes direct per-node config changes on each node. |
Closing this as a dup of #3137 which we'll hopefully do with layering. |
With ostree, each deployment (bootable target) has its own copy of /etc.

Today, the MCD writes into the current /etc. This has weird side effects; for example, it means that we may be affecting running static pods for kubelet. We may rewrite the pull secret.

It also means we don't have rollback at the ostree level.

It'd be fairly easy to change the MCD to (on RHCOS) create a new deployment for pure config changes, and write the new changes to /etc there - leaving the booted system untouched.

This would mean that config changes would be fully transactional and offline, the same way OS updates are.
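To make the idea concrete, here is a toy model of the proposed flow using plain directories. It is only a sketch under stated assumptions: it does not call ostree or rpm-ostree, and the layout (`deploy/<n>/etc` plus a `booted` symlink) and the function names `stage_config` and `reboot_into` are hypothetical stand-ins for a real deployment and a real reboot.

```shell
#!/usr/bin/env bash
# Toy model of "transactional /etc" with plain directories. This does NOT use
# ostree or rpm-ostree; the deploy/<n>/etc layout, the "booted" symlink, and
# both function names are hypothetical.

# stage_config ROOT FILE CONTENT
# Copy the booted deployment's etc into a new deployment and write the change
# there, leaving the booted etc untouched. Prints the new deployment's path.
stage_config() {
  local root="$1" file="$2" content="$3"
  local booted next
  booted="$(readlink "${root}/booted")"      # e.g. deploy/0
  next="deploy/$(( ${booted##*/} + 1 ))"     # e.g. deploy/1
  mkdir -p "${root}/${next}"
  cp -a "${root}/${booted}/etc" "${root}/${next}/etc"
  mkdir -p "$(dirname "${root}/${next}/etc/${file}")"
  printf '%s\n' "${content}" > "${root}/${next}/etc/${file}"
  echo "${next}"
}

# reboot_into ROOT DEPLOYMENT
# Stand-in for a reboot: only at this point does the staged etc become the one
# the running system sees. Pointing back at the old deployment is the rollback.
reboot_into() {
  local root="$1" deployment="$2"
  ln -sfn "${deployment}" "${root}/booted"
}
```

In this model a config change is staged with `stage_config`, nothing a running workload reads is modified until `reboot_into` runs, and rolling back is just `reboot_into` with the previous deployment.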