Upgrade without registry enhancement #1432
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: (none yet). The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Looks good overall. Added some comments.
Force-pushed from 3416e01 to e34b7bb.
@mrunalp Hi! Can you please review? Who else should we tag?
@sdodson ptal
Force-pushed from e34b7bb to 38c3594.
Thanks for your review @mrunalp. I tried to incorporate it, please take another look.
Force-pushed from 38c3594 to fa8ba50.
Force-pushed from 1bee1b0 to e159b94.
Overall it looks very good!
Some comments.
will be required to the CRI-O configuration during the upgrade.

1. The cluster version operator needs to orchestrate the upgrade process.
Also need to mention the need not to add/remove nodes while upgrading.
In the "CVO will need to orchestrate the upgrade" we say this:
To do so it will first disable auto-scaling to ensure that no new nodes
are added to the cluster while this process is in progress
Is that enough?
@LalatenduMohanty We'd appreciate a review (feel free to pull others as well.... but not via always pull policy ;-) ).
There is a cnv test, e.g. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-azure-deploy-cnv/1699895359578312704 - it's only informational today, but at least provides the baseline capability for joint teams to monitor its success.
does so in order to refresh catalog images that are specified with a tag. But it
also does it when it pulls catalog images that are specified with a digest. That
should be changed to use the `IfNotPresent` pull policy for catalog images that
are specified by digest.
Changing the catalogs to use PullIfNotPresent when the image spec is a SHA was something I had started working on a while ago here:
operator-framework/operator-lifecycle-manager#2659
I summarized where it got stuck here:
operator-framework/operator-lifecycle-manager#2659 (comment)
so this would need to be picked back up, but also I wouldn't necessarily consider the catalog pods to be part of upgrading a cluster, in the sense that they do not come from the payload, so it's not clear if they should be covered here?
is the goal to allow the cluster to run w/o a registry entirely, or to get its upgrade payload content w/o a registry? because those are two different things.
tldr: more discussion/thought needed around the expectations for catalog images.
cc @joelanford
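As a concrete illustration of the quoted suggestion, here is a rough audit sketch (not the actual OLM change) that lists catalog pods pulling by digest while still using the `Always` pull policy; the `openshift-marketplace` namespace is the usual default and is an assumption.

```bash
# List catalog pods whose container image is pinned by digest but whose pull
# policy is still Always, i.e. the candidates for switching to IfNotPresent.
oc get pods -n openshift-marketplace -o json \
  | jq -r '.items[]
      | . as $pod
      | .spec.containers[]
      | select((.image | contains("@sha256:")) and .imagePullPolicy == "Always")
      | "\($pod.metadata.name)\t\(.image)"'
```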
dump. If there is not enough space it will be reported as an error condition in
the `ContainerRuntimeConfig` object.

If there is enough space then the machine config operator will start a
enough disk space? enough memory/compute resources?
and what will be done if there is not enough space to run this embedded registry server?
Disk space. If there is not enough disk space the error will be reported in a condition and the upgrade aborted.
hmm, I don't think we're running any registry here tbh. @jhernand, did I miss anything?
We run a temporary registry server to read from the bundle and copy the images to the container storage of the node. Note that this is only needed when using a bundle; it isn't needed when only pre-loading and pinning the images.
Disk space. If there is not enough disk space the error will be reported in a condition and the upgrade aborted.
can you make sure the EP states these things?
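A minimal sketch of the disk-space check discussed above, assuming the bundle is staged under `/var/lib/upgrade`; the paths and file name are illustrative, not the proposed implementation.

```bash
BUNDLE=/var/lib/upgrade/bundle-4.13.7-x86_64.tar     # hypothetical staging path
TARGET=/var/lib/upgrade

need=$(stat -c %s "$BUNDLE")                          # bundle size in bytes
avail=$(df --output=avail -B1 "$TARGET" | tail -n 1)  # free bytes on the target filesystem

if [ "$avail" -le "$need" ]; then
  # Per the discussion, this would surface as an error condition on the
  # ContainerRuntimeConfig object and the upgrade would be aborted.
  echo "not enough disk space: need ${need} bytes, have ${avail}" >&2
  exit 1
fi
```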
object and the upgrade process will be aborted.

If there is enough space the `bundle-extractor` will extract the contents of
the bundle to the `/var/lib/upgrade/4.13.7-x86_64` directory.
and if there is not enough space?
The error will be reported in a condition and the upgrade will be aborted.
This is another case where a multi-node cluster might end up with different results for different nodes.
The error will be reported in a condition and the upgrade will be aborted.
please update the EP to state this
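A small sketch of how the reported failure could be inspected, assuming the error lands as a status condition on the `ContainerRuntimeConfig` object as stated above; the object name and condition fields are hypothetical.

```bash
# Print any conditions on the ContainerRuntimeConfig used for the upgrade;
# "upgrade" is a placeholder object name.
oc get containerruntimeconfig upgrade -o json \
  | jq -r '.status.conditions[]? | "\(.type)\t\(.status)\t\(.message)"'
```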
1. The upgrade infrastructure of the cluster ensures that all the images
required for the upgrade are pinned and pre-loaded in all the nodes of the
cluster.
at what point do these images get unpinned? presumably we don't want to keep them around forever, so is there a plan to remove the N-1 images from the previous upgrade, or something along those lines? (or at least unpin them so GC can remove them as needed)
The images will be pinned til the next upgrade. At that point they will be un-pinned, and the new ones will be pinned.
can you update the EP to state this?
presumably "pinned til the next upgrade" means "until the next upgrade is complete" (so we can be sure nothing is referencing the previous SHAs)
An external registry server is the currently supported solution for upgrades in
disconnected environments. It doesn't require any change, but it is not
feasible in most of the target environments due to the resource limitations
described in the [motivation](#motivation) section of this document.
in particular I assume this is referring to this:
https://cloud.redhat.com/blog/introducing-mirror-registry-for-red-hat-openshift
you might reference it.
Yes, I am referring to that, added a link.
described in the [motivation](#motivation) section of this document.

An internal registry server, running in the cluster itself, is a feasible
alternative for clusters with multiple nodes. The registry server supported by
maybe. the orchestration of this as the registry moves between nodes during reboots, and how it pulls its own image, seem like questions that would need to be answered for a self-hosted registry solution.
In a multi-node scenario that internal registry is a highly available service. That guarantees that it will be able to pull its own image from another node. We tested that with a regular Quay instance, and it works correctly.
It was touched on in the comments, but... this EP appears to address the problem of how the cluster can get new payload images w/o access to a registry. But clusters don't exist to run payload images, so these clusters that are running w/o access to a registry..... where are they getting workload image content from? Are they running OLM operators? Where are they mirroring the olm content to and pulling it from? And we still have some intent to move existing core payload content to be external/OLM provided (not to mention layered products and existing OLM operators that clusters tend to run). So to what extent do we really expect a customer to run clusters with literally no registry available? Or is there some benefit in having the registry but not having to mirror the payload into that registry? I think maybe this EP should be split into two focuses:
More detail makes sense to me. Currently the enhancement has a non-goal entry saying:
I think that there should be some discussion in this enhancement of the RFE's:
Because that is the use-case that distinguishes images into "needs special handling so the cluster can restore the in-cluster registry" and "everything else can be served from the in-cluster registry" buckets. User workload would be in the latter bucket. OCP web-console would be in the latter bucket. The only images needed to launch the local in-cluster registry after a cold reboot of the whole cluster are the registry image, its operator image, etcd/Kube scheduler-stuff? I dunno. But very core things. You don't even need a cluster-version operator image for that restore. Maybe if the image registry goes the library-go static-manifest route you don't even need the etcd/Kube stuff and the only image you need is the registry image?
This patch adds an enhancement that describes an automated mechanism to perform cluster upgrades without requiring an image registry server. Related: https://issues.redhat.com/browse/RFE-4482 Related: https://issues.redhat.com/browse/OTA-1001 Related: https://issues.redhat.com/browse/OTA-997 Signed-off-by: Juan Hernandez <[email protected]>
Force-pushed from fb447d7 to 58ef5d9.
@jhernand: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
This enhancement stems from the desire to upgrade clusters installed using the appliance. In that context images for the release, operators and customer workloads are available in an additional partition and served using a temporary registry server that runs as a systemd service and reads from that partition. Access to any image that isn't in that partition requires access to another registry server, and it isn't the intent of this enhancement to solve that problem.
If there is a registry server that is reliable and fast enough then none of this enhancement is needed. We are trying to address the situation where deploying that registry isn't feasible.
Actually we already tried to separate those two intents. See the graduation criteria section, where we explain that we would like to implement the first intent first, and then the second. The first is as you describe: pre-load and pin images so that there is no need to contact a registry server during the upgrade. The second is almost as you describe, only that we are limiting ourselves to upgrades. Having both in this document seems to be a source of confusion. I will modify this enhancement so that it only covers the first part: pre-loading and pinning. I will create another one for the second intent.
In addition, for the cases where the cluster is completely disconnected or it
isn't possible to use a registry server:

1. An engineer uses the `oc adm upgrade create bundle ...` tool described in
Which team will own `oc adm upgrade create bundle`? The installer team?
I'd say the same team that owns CVO, as it is specific to upgrades.
collaborating with customer technicians.

Eventually, the cluster will need to be upgraded, and then the technicians will
need tools that make the process as simple as possible, ideally requiring no
Will the engineer be a support engineer? Or one of the developer engineers who will build the tool? Will the OpenShift documentation be the main source of knowledge or rather a detailed knowledge specific document living in e.g. the installer/oc repo? What resources/logs/etc. will be requested from a customer to properly debug if anything goes wrong?
I need to re-read this as I've gotten lost between the earliest bits which clearly exclude a registry and later bits which seem to imply an image registry. I feel like we're relying on too many couplings. I think we can break things down into more distinct concerns, something like the following.
- Bundle building, how do we turn a release image, metadata, and any supplemental images into a digested or signed archive
- Bundle loading, how do we validate and extract that archive into container images and cluster resources (ie: configmaps w/ release payload signature, list of pinned images)
- Image distribution, how do we get from container images in one directory on one host to all hosts
- Image pinning, how do we ensure that those images are protected against garbage collection and are unpinned once no longer necessary
- Cluster Updates
I'd very much like for the last step to largely remain unaltered. Perhaps no more than loading a validating webhook which says that when this scheme is in place the clusterversion is not updated unless that image is distributed and pinned on all hosts.
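A minimal sketch of the validating-webhook idea from the last point, assuming a hypothetical `pinning-guard` service that rejects `ClusterVersion` updates until the target release images are distributed and pinned on all hosts; every name here is a placeholder.

```bash
cat <<'EOF' | oc apply -f -
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: upgrade-image-pinning-guard          # hypothetical name
webhooks:
- name: clusterversion.pinning.example.com   # hypothetical webhook name
  rules:
  - apiGroups: ["config.openshift.io"]
    apiVersions: ["v1"]
    operations: ["UPDATE"]
    resources: ["clusterversions"]
  clientConfig:
    service:
      namespace: openshift-machine-config-operator   # assumed home for the controller
      name: pinning-guard                            # hypothetical service
      path: /validate
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
EOF
```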
Provide an automated mechanism to upgrade a cluster without requiring an image
registry server, and without requiring OpenShift knowledge for the technicians
performing the upgrades.

Suggested change: "Provide an automated mechanism to upgrade a cluster without requiring an off cluster
image registry server, and without requiring OpenShift knowledge for the technicians
performing the upgrades."
I don't want the first sentence to preclude an Image Registry on cluster being part of the solution.
Done.
site, in particular I can't bring up an image registry server, neither outside
of the cluster nor inside. I want to plug the USB stick provided by the
Please clarify why starting an image registry inside the cluster is unacceptable; if it's about resource utilization, has the proposed method been confirmed to be substantially better in that regard?
Starting a registry server inside single-node clusters is unacceptable because it introduces a chicken-egg problem if the registry server is a workload managed by the cluster.
For multi-node clusters an image registry server supported by Red Hat inside the cluster is unacceptable (that means full Quay) because it requires a large amount of resources (especially storage, requires ODF) and additional subscriptions. We have confirmed that the proposed mechanism is better in these regards: it doesn't need additional disks, it doesn't require a full Quay instance and doesn't require storage or registry workloads running after the upgrade has completed.
Quay supports a use case very similar to this: https://docs.openshift.com/container-platform/4.13/installing/disconnected_install/installing-mirroring-creating-registry.html
Yes, but that is only supported on an external machine, not inside the cluster. In the context that we are trying to cover we can't ask the customer to bring up an additional machine for that mirror registry.
But the cluster is HA and has protections against workloads becoming non-HA. Would the local storage operator with the registry configured to use file-based storage not work? Even ephemeral storage seems like it could be made to work with an init container that mirrored content from its peer. The image to run the registry itself would need to be pinned.
The local storage operator requires an additional disk, and we can't guarantee that will be available in this context. Single-node clusters in particular often have only one disk.
What you describe is very close to our bundle approach. We use ephemeral storage: a temporary directory created from the bundle contents. We mirror content from peers: copy the bundle from peers. The registry server code is kind of pinned: it is part of CVO (or MCO, that is an implementation detail). In addition it only consumes resources during the upgrade, after that it goes away.
I'm also concerned about the proposed approach relying on low-level tools, including having to build new ones to prepare special software bundles, instead of developing an approach using higher-level tools like a regular registry and simply mirroring the required images into it (even if the mirror source is a USB drive).
Did anyone experiment with using the internal registry?
But the cluster is HA...
The local cluster is not actually highly available, because we have to deliver RFE-4482's:
I have a multi-node cluster that could have a complete power outage, and I want it to recover smoothly from that kind of disruption, despite the lack of access to image registries, local or remote.
For example, powering on the cluster after reaching the user-site. I still think we want to call that out in the use-case section of this enhancement, because it's the only reason I see that makes the in-cluster registry an insufficient solution.
It is not a goal to not require a registry server for other operations. For
example, installing a new workload will still require a registry server.
Why is this a non goal when it's also a goal to include user workload images in the bundle? What purpose does the workload image in the bundle serve if new workloads can be loaded from a remote registry?
Using a registry server to deploy new workloads to the cluster is a choice of the customer. We do not want to interfere with that. What we want is to give customers the option of upgrading the existing workloads (that they probably installed with the appliance, so without an off cluster registry server) without a registry server.
Sure, this should not preclude use of an external registry for new workload, but it seems very arbitrary to say that you can upgrade existing workloads via this method but not install new workloads via this method. How is that different?
Maybe the gap in understanding here is my understanding of scope of the appliance, I'll go look for that but if you have a reference that'd be great.
There are two separate mechanisms proposed here:
- The capability to pin and pre-load images.
- The upgrade bundle.
The first one, pin and pre-load images, can be universally useful and supportable, not just for upgrades but also for new workloads. But the second one is very specific to upgrades; I don't see how we can make it useful and supportable for installing new workloads.
@danielerez @nmagnezi can you point @sdodson to the appliance documentation?
Sure, that's the appliance user-guide: https://github.com/openshift/appliance/blob/master/docs/user-guide.md
### User Stories

#### Prepare an upgrade bundle
The CVO only requires that the signature for the payload be loaded, which presumably this tool would do. The CVO could check the current and proposed release payload to ensure that they're connected in addition to validating those payloads and the metadata within them are authentic. This does not currently happen, as long as the payload signature is validated the CVO will update to any digest based payload.
As an engineer working in a partner's factory I want to be able to include in
the upgrade bundle custom images for workloads specific to the customer.

#### Explicitly allow vetted upgrade bundle
Yeah it's got to be either a digest supplied via pre-established trust like the cluster admin setting the expected digest prior to accepting the contents of the bundle or a signing mechanism with a pre-trusted key. Either way it needs to apply to the entire bundle including the metadata that references any image loaded. That metadata needs to include release payload signatures as well.
does so in order to refresh catalog images that are specified with a tag. But it
also does it when it pulls catalog images that are specified with a digest. That
should be changed to use the `IfNotPresent` pull policy for catalog images that
are specified by digest.
Regarding running a cluster versus just upgrading, if a drive fails what are the expectations upon replacement? Are these baremetal platform clusters where the host netboots and reinstalls or would physical media with RHCOS and the bundle be provided? Is it a challenge for the technician to know which bundle to provide if so? Maybe the drive is preloaded prior to delivery?
not be able to start after the upgrade. To mitigate that risk when a bundle is
used the upgrade mechanism will check if there are pods using the `Always` pull
policy, and will emit a warning if there are any.
What if it's opaque? A customer's binary operator creates the pod specs.
In that case the potential problem will go undetected: it isn't 100% exact, but I think it helps anyhow.
It seems like we would want the warning (or even an error) when building the bundle, rather than when it is installed.
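A sketch of the warning check described in the quoted text: report every pod that still uses the `Always` pull policy and would therefore try to reach a registry; whether this runs at bundle-build time or before applying the bundle is the open question above.

```bash
# Report pods (including init containers) that use the Always pull policy.
oc get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select([.spec.containers[]?, .spec.initContainers[]?]
               | any(.imagePullPolicy == "Always"))
      | "\(.metadata.namespace)/\(.metadata.name)"'
```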
Once the image is pinned and pre-loaded the cluster version operator will
inspect it and find out the references to the payload images. It will then also
pin and pre-load those images, updating the `ContainerRuntimeConfig` object:
Why the hand off to CVO here?
It seems to me that this inspection of the release image is something that belongs into CVO more than MCO. I assume CVO already does something similar today to verify signatures, for example.
I think the CVO should just act on the image not care about how the image got there. Certainly if the CVO is explicitly pulling the image we should fix that.
CVO does not pull anything, because we know that image access is fiddly. We get access to incoming release-image manifests by launching a `version-*` pod with a shared volume, so CRI-O can go fetch the image for us, the `version-*` containers copy the manifests into the shared volume, and the outgoing CVO grabs the manifests from the shared volume to start rolling out the update.
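For reference, the set of payload images a release references can already be listed from the release image today; a pin-and-pre-load implementation would derive the same list. The pullspec below is just an example.

```bash
# Print the pull specs of every image referenced by the release payload.
oc adm release info --pullspecs \
  quay.io/openshift-release-dev/ocp-release:4.13.7-x86_64
```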
It is desirable to have another test that scans the OCP components looking for
use of the `Always` pull policy. This should probably run for each pull request
of each OCP component, and prevent merging if it detects that the offending pull
policy is used. We should consider adding admission in CI for this.
I like the admission webhook suggested earlier, if that were deployed during an upgrade test and prior to OLM operator tests, would that catch all uses?
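A sketch of what the repo-level CI check could look like, assuming the component keeps its manifests under a `manifests/` directory; this is illustrative, not an existing job.

```bash
# Fail the job if any manifest in the repository declares the Always pull policy.
if grep -rn --include='*.yaml' --include='*.yml' \
     'imagePullPolicy:[[:space:]]*Always' manifests/; then
  echo "found manifests using the Always pull policy" >&2
  exit 1
fi
```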
- The cluster is not connected to the Internet, or the bandwidth is very limited.
- It isn't possible to bring up additional machines, even temporarily.
- The resources of the cluster are limited, in particular in terms of CPU and memory.
Why would the registry need to be running all the time if it's only needed during the upgrade?
In addition, for the cases where the cluster is completely disconnected or it
isn't possible to use a registry server:

1. An engineer uses the `oc adm upgrade create bundle ...` tool described in
Could you make sure all of these details are written down in the alternatives section, please?
a fallback alternative. These need to be changed so that they don't do it or so
that they have a fallback mechanism when the registry server isn't available.

For example, in OpenShift 4.1.13 the machine config operator runs the
Is `4.1.13` correct here? If so, is there a more recent version with this problem?
We explored several alternatives for the format of the bundle, and found using a dump
of a registry server to be the best one.
Having all of these alternatives inline disrupts the flow of trying to understand the actual proposal. Could you move them down to the "Alternatives" section?
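For context, a sketch of what "a dump of a registry server" can look like in practice, using `oc adm release mirror` to write the release content to a directory that a temporary registry could later serve; the flags and paths are illustrative and this is not the proposed bundle tool itself.

```bash
oc adm release mirror \
  --from=quay.io/openshift-release-dev/ocp-release:4.13.7-x86_64 \
  --to-dir=/var/lib/upgrade/4.13.7-x86_64
```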
The cluster version operator will be monitoring device events in all the nodes
of the cluster in order to detect when such a USB stick (or any other kind of
media containing an upgrade bundle) has been plugged in. To control that a new
This paragraph should say "The bundle-monitor will be monitoring device events..." instead of the CVO.
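A sketch of the kind of device-event monitoring the `bundle-monitor` would perform, assuming it reacts to block devices whose filesystem label identifies an upgrade bundle; the label is a placeholder.

```bash
# Watch block-device uevents and react when a filesystem with the expected
# label appears; UPGRADE_BUNDLE is a hypothetical label.
udevadm monitor --udev --subsystem-match=block --property \
  | grep --line-buffered 'ID_FS_LABEL=UPGRADE_BUNDLE'
```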
...
]
}
node1:
This is going to produce a lot of data in a large cluster, and a lot of potential collisions as nodes try to write their status to the same resource.
```bash
$ oc adm upgrade create bundle \
  --arch=x86_64 \
  --version=4.13.7 \
```
Limiting this to a version and arch as input is going to make it challenging to create bundles with hot fixes. It might be better to use an explicit image reference here, just like with other admin commands that operate on images.
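A sketch of the suggested alternative: accept an explicit release image reference (by digest) instead of only `--arch`/`--version`, so hotfix builds can be bundled too. The command is the proposed tool, and the `--image` and `--output` flags are hypothetical.

```bash
oc adm upgrade create bundle \
  --image=quay.io/openshift-release-dev/ocp-release@sha256:<digest> \
  --output=/media/usb/upgrade-bundle.tar
```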
It seems to be a big change in the MCO workflow. I don't see any timeframe mentioned in the EP; wondering what time-frame we are looking at for these changes and who is going to implement and maintain them?
1. No OpenShift component should explicitly try to contact the registry server
without a fallback alternative.

1. The machine config operator needs to support image pinning, pre-loading and
Do we perform image pinning only for OCP specific images or does it also include customer workload images?
It depends. But at a high level, the right approach would be:
- Pin OCP images - for cases where the cluster is used to run customer workloads only
- Pin OCP images and partner images - for cases where partners have an appliance (basically OCP + more infra workloads they need to run) - in this case, you'll want to have both OCP and partner images pinned, while the customer workloads aren't.
An example, Dell ships an appliance that includes OCP + powerflex storage. You'd need both OCP images and powerflex related images pinned. This setup is used by customers to run their workloads. They can host these images in a registry (in cluster) and pull those from there. But the images required for OCP + Dell software + the in-cluster registry, should be pinned so that everything is running (for reboot, upgrade, etc.).
ah got it, thanks for the explanation.
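A sketch of the kind of pin list discussed in this thread, expressed with the `PinnedImageSet` object proposed alongside this enhancement (openshift/api#1548); the `apiVersion` and field names are assumptions based on that proposal, and the digests are placeholders. OCP and partner images are pinned, customer workload images are not.

```bash
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1alpha1   # assumed group/version
kind: PinnedImageSet
metadata:
  name: appliance-images                                  # hypothetical name
spec:
  pinnedImages:
  - name: quay.io/openshift-release-dev/ocp-release@sha256:<release-digest>
  - name: registry.example.com/partner/storage-driver@sha256:<partner-digest>
EOF
```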
- The cluster is not connected to the Internet, or the bandwidth is very limited.
- It isn't possible to bring up additional machines, even temporarily.
- The resources of the cluster are limited, in particular in terms of CPU and memory.
- The technicians responsible for performing the upgrade have little or no OpenShift knowledge.
These motivations do not seem right because without the registry the customer cannot run the workload. I think we want to separate the lifecycle events and infrastructure for OCP from the infrastructure required for workloads.
to be started manually later or if it should wait till the user explicitly sets
it to `Immediate`. The default value for that will be `Immediate`. This is
intended for situations where the user wishes to prepare everything for the
upgrade in advance, but wants to perform the upgrade later.
I do not see a reason why the update schedule needs to be part of this enhancement. We are planning to introduce too many changes, and some of the changes are not directly related to the goal. The update schedule is a generic feature which should be a separate enhancement.
install) as well as workload images and buffer.

The changes to pin the images will be done in a
`/etc/crio/crio.conf.d/pin-upgrade.conf` file, something like this:
One thing that could actually simplify things here is to make this in `/run/crio/crio.conf` - i.e. it's an explicitly transient state before the next boot - and the config automatically goes away after the next boot (when we've upgraded).
I think we should flesh out some more "transient MachineConfig" style flows that write to `/run`, don't reboot, and don't have config drift monitoring etc.
Well...this would have a larger blast radius, but I think a tension we're seeing here is we don't have a good place to do stuff like this because IMO it fits squarely in between the CVO and MCO. But...we could create a new component (github.com/openshift-upgrade-orchestrator) or something that would sit below the CVO but own things like rolling out a temporary registry if applicable, etc. (Or, failing that, it's literally a new controller in github.com/openshift/machine-config/operator/pkg/controller/upgrade-infra-orchestrator or something that could have a distinct OWNERS perhaps...)
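A sketch of the CRI-O drop-in being discussed, assuming CRI-O's `pinned_images` option; writing the same content under `/run/crio/crio.conf.d/` instead of `/etc` would make it transient across reboots, as suggested above. The digests are placeholders.

```bash
cat <<'EOF' > /etc/crio/crio.conf.d/pin-upgrade.conf
[crio.image]
pinned_images = [
  "quay.io/openshift-release-dev/ocp-release@sha256:<release-digest>",
  "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<payload-digest>"
]
EOF
# Ask CRI-O to re-read its configuration (assumes pinned_images is reloadable).
systemctl reload crio
```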
This patch adds a new `pinnedImages` field to the `ContainerRuntime` type. Related: openshift/enhancements#1432 Related: https://issues.redhat.com/browse/RFE-4482 Signed-off-by: Juan Hernandez <[email protected]>
This patch adds an enhancement that describes a mechanism to pin and pre-load container images. Related: https://issues.redhat.com/browse/RFE-4482 Related: https://issues.redhat.com/browse/OTA-1001 Related: https://issues.redhat.com/browse/OTA-997 Related: openshift/machine-config-operator#3839 Related: openshift#1432 Signed-off-by: Juan Hernandez <[email protected]>
Dear reviewers, as agreed last week in the architecture meeting I am splitting this enhancement into three smaller enhancements:
The first two are ready for you to take a look: The third one is less important and I will create it later.
This patch adds a new `PinnedImageSet` object that describes how to pre-load and pin container images. Related: openshift/enhancements#1432 Related: https://issues.redhat.com/browse/RFE-4482 Signed-off-by: Juan Hernandez <[email protected]>
OTA-1001: Upgrade without registry enhancement
This patch adds an enhancement that describes an automated mechanism to perform cluster upgrades without requiring an image registry server.
Related: https://issues.redhat.com/browse/RFE-4482
Related: https://issues.redhat.com/browse/OTA-1001
Related: https://issues.redhat.com/browse/OTA-997
Related: openshift/machine-config-operator#3839
Related: openshift/api#1548