Pin and pre-load images
This patch adds an enhancement that describes a mechanism to pin and
pre-load container images.

Related: https://issues.redhat.com/browse/RFE-4482
Related: https://issues.redhat.com/browse/OTA-1001
Related: https://issues.redhat.com/browse/OTA-997
Related: openshift/machine-config-operator#3839
Related: openshift#1432

Signed-off-by: Juan Hernandez <[email protected]>
jhernand committed Sep 21, 2023
1 parent b8555c4 commit 8a07077
Showing 1 changed file with 232 additions and 0 deletions.
232 changes: 232 additions & 0 deletions enhancements/machine-config/pin-and-pre-load-images.md
@@ -0,0 +1,232 @@
---
title: pin-and-pre-load-images
authors:
- "@jhernand"
reviewers:
- "@avishayt"
- "@danielerez"
- "@mrunalp"
- "@nmagnezi"
- "@oourfali"
approvers:
- "@sdodson"
- "@zaneb"
- "@LalatenduMohanty"
api-approvers:
- "@sdodson"
- "@zaneb"
- "@deads2k"
- "@JoelSpeed"
creation-date: 2023-09-21
last-updated: 2023-09-21
tracking-link:
- https://issues.redhat.com/browse/RFE-4482
see-also:
- https://github.com/openshift/enhancements/pull/1432
- https://github.com/openshift/machine-config-operator/pull/3839
replaces: []
superseded-by: []
---

# Pin and pre-load images

## Summary

Provide a mechanism to pin and pre-load container images.

## Motivation

Slow and/or unreliable connections to the image registry servers interfere with
operations that require pulling images. For example, an upgrade may require
pulling more than one hundred images. Failures to pull those images cause
retries that interfere with the upgrade process and may eventually make it
fail. One way to improve that is to pull the images in advance, before they are
actually needed, and ensure that they aren't removed.

### User Stories

#### Pre-load and pin upgrade images

As the administrator of a cluster that has a low bandwidth and/or unreliable
connection to an image registry server I want to pin and pre-load all the
images required for the upgrade in advance, so that when I decide to actually
perform the upgrade there will be no need to contact that slow and/or
unreliable registry server and the upgrade will successfully complete in a
predictable time.

#### Pre-load and pin application images

As the administrator of a cluster that has a low bandwidth and/or unreliable
connection to an image registry server I want to pin and pre-load the images
required by my application in advance, so that when I decide to actually deploy
it there will be no need to contact that slow and/or unreliable registry server
and my application will successfully deploy in a predictable time.

### Goals

Provide a mechanism that cluster administrators can use to pin and pre-load
container images.

### Non-Goals

None.

## Proposal

### Workflow Description

1. The administrator of a cluster uses the `ContainerRuntimeConfig` object to
request that a set of container images are pinned and pre-loaded:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ...
spec:
  containerRuntimeConfig:
    pinnedImages:
    - quay.io/openshift-release-dev/ocp-release@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    ...
```

1. The machine config operator ensures that all the images are pinned and
pulled on all the nodes of the cluster.

### API Extensions

There are no new object kinds introduced by this enhancement, but new fields
will be added to existing `ContainerRuntimeConfig` objects.

The new fields for the `ContainerRuntimeConfig` object are defined in detail in
https://github.com/openshift/machine-config-operator/pull/3839.

### Implementation Details/Notes/Constraints

Starting with version 4.14 of OpenShift, CRI-O will have the capability to pin
certain images (see [this](https://github.com/cri-o/cri-o/pull/6862) pull
request for details). That capability will be used to pin all the images
required for the upgrade, so that they aren't garbage collected by kubelet and
CRI-O.

The changes to pin the images will be done in a `/etc/crio/crio.conf.d/pin.conf`
file, something like this:

```toml
pinned_images=[
  "quay.io/openshift-release-dev/ocp-release@sha256:...",
  "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...",
  "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...",
  ...
]
```
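As a rough illustration of how such a drop-in could be generated from a list of image references, here is a minimal Python sketch. The function name `render_pin_conf` and the `quay.io/example/...` references are hypothetical placeholders, not part of the proposal. The sketch also emits a `[crio.image]` table header, because CRI-O's configuration reference documents `pinned_images` under that table. The snippet above omits the header, so treat it as an assumption about the final file layout.

```python
import json


def render_pin_conf(images):
    """Return the text of a CRI-O drop-in file that pins `images`."""
    # De-duplicate while keeping the order given by the administrator.
    unique = list(dict.fromkeys(images))
    # Assumption: the option is placed under the [crio.image] table.
    lines = ["[crio.image]", "pinned_images = ["]
    for image in unique:
        # json.dumps yields a double-quoted, escaped string, which is also a
        # valid TOML basic string for ordinary image references.
        lines.append(f"  {json.dumps(image)},")
    lines.append("]")
    return "\n".join(lines) + "\n"


# Placeholder references; real ones would carry full sha256 digests.
example = render_pin_conf([
    "quay.io/example/release@sha256:aaaa",
    "quay.io/example/component@sha256:bbbb",
    "quay.io/example/release@sha256:aaaa",
])
print(example)
```

Rendering the file from a de-duplicated list keeps the output stable when the same image is listed twice, which avoids unnecessary CRI-O reloads.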

The images need to be pre-loaded and the CRI-O service needs to be reloaded
when this configuration changes. To support that, a new field will be added to
the `ContainerRuntimeConfig` object:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ...
spec:
  containerRuntimeConfig:
    pinnedImages:
    - quay.io/openshift-release-dev/ocp-release@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    ...
```

When the new `pinnedImages` field is added or changed, the machine config
operator will need to pull those images (with the equivalent of `crictl pull`),
create or update the corresponding `/etc/crio/crio.conf.d/pin.conf` file and ask
CRI-O to reload its configuration (with the equivalent of `systemctl reload
crio.service`).
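The order of operations described above can be sketched as a small planning function. This is only an illustration in Python of the sequence for a single node, not how the machine config operator is implemented. The name `plan_pin_update`, its parameters, and the `write-file` marker are hypothetical, and the image references are placeholders.

```python
PIN_CONF_PATH = "/etc/crio/crio.conf.d/pin.conf"


def plan_pin_update(desired, already_pulled, conf_changed):
    """Return the ordered steps for one node as lists of command words."""
    steps = []
    # 1. Pull every desired image that is not yet in local storage.
    for image in desired:
        if image not in already_pulled:
            steps.append(["crictl", "pull", image])
    # 2. Only touch the drop-in and reload CRI-O when the content changed.
    if conf_changed:
        # "write-file" is a placeholder marker, not a real command.
        steps.append(["write-file", PIN_CONF_PATH])
        steps.append(["systemctl", "reload", "crio.service"])
    return steps


steps = plan_pin_update(
    desired=[
        "quay.io/example/a@sha256:1111",
        "quay.io/example/b@sha256:2222",
    ],
    already_pulled={"quay.io/example/a@sha256:1111"},
    conf_changed=True,
)
for step in steps:
    print(" ".join(step))
```

Pulling before writing the configuration follows the order given in the text, so the reload only happens once the images are already present on the node.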

The machine config operator will then use the gRPC API of CRI-O to run the
equivalent of `crictl pull` for each of the images. When that is completed, the
machine config operator will update the new `status.pinnedImages` field of the
rendered machine config:

```yaml
status:
  pinnedImages:
  - quay.io/openshift-release-dev/ocp-release@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
```
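A client waiting for the pre-load to finish could compare the requested list with the reported one. The following Python sketch assumes that both lists are plain sequences of image references, as in the snippets above. The helper name `pending_pinned_images` and the sample references are hypothetical.

```python
def pending_pinned_images(spec_images, status_images):
    """Return requested images that the status does not report yet."""
    reported = set(status_images)
    # Keep the order of the request so the result is easy to read.
    return [image for image in spec_images if image not in reported]


# Placeholder references; real ones would carry full sha256 digests.
requested = [
    "quay.io/example/a@sha256:1111",
    "quay.io/example/b@sha256:2222",
]
reported = ["quay.io/example/a@sha256:1111"]

pending = pending_pinned_images(requested, reported)
print("pending:", pending)
print("complete:", not pending)
```

An empty result would indicate that every requested image has been reported as pinned and pulled.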

### Risks and Mitigations

None.

### Drawbacks

This approach requires non-trivial changes to the machine config operator.

## Design Details

### Open Questions

None.

### Test Plan

We will add a CI test that verifies that images are correctly pinned and
pre-loaded.
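
One possible shape for the core check of such a test is sketched below in Python. It assumes that the output of `crictl images -o json` on a node is available as a string, and that it contains an `images` array whose entries carry `repoDigests`, `repoTags` and a boolean `pinned` field. That shape is an assumption to verify against the CRI-O and crictl versions under test. The function name `unpinned_images` and the sample data are hypothetical.

```python
import json


def unpinned_images(crictl_images_json, expected):
    """Return expected image references that are missing or not pinned."""
    listing = json.loads(crictl_images_json)
    pinned_refs = set()
    for image in listing.get("images", []):
        # Only images that the runtime reports as pinned count.
        if image.get("pinned"):
            pinned_refs.update(image.get("repoDigests", []))
            pinned_refs.update(image.get("repoTags", []))
    return [ref for ref in expected if ref not in pinned_refs]


# Hand-written sample standing in for real `crictl images -o json` output.
SAMPLE = json.dumps({
    "images": [
        {"id": "1", "repoDigests": ["quay.io/example/a@sha256:1111"],
         "repoTags": [], "pinned": True},
        {"id": "2", "repoDigests": ["quay.io/example/b@sha256:2222"],
         "repoTags": [], "pinned": False},
    ]
})

missing = unpinned_images(SAMPLE, [
    "quay.io/example/a@sha256:1111",
    "quay.io/example/b@sha256:2222",
    "quay.io/example/c@sha256:3333",
])
print(missing)
```

The test would pass for a node when the returned list is empty for every image listed in `pinnedImages`.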

### Graduation Criteria

The feature will ideally be introduced as `Dev Preview` in OpenShift 4.X,
moved to `Tech Preview` in 4.X+1 and declared `GA` in 4.X+2.

#### Dev Preview -> Tech Preview

- Availability of the CI test.

- Obtain positive feedback from at least one customer.

#### Tech Preview -> GA

- User-facing documentation created in
[openshift-docs](https://github.com/openshift/openshift-docs).

#### Removing a deprecated feature

Not applicable, no feature will be removed.

### Upgrade / Downgrade Strategy

Not applicable.

### Version Skew Strategy

Not applicable.

### Operational Aspects of API Extensions

Not applicable: no new object kinds are introduced, and the only API change is
the new field on the existing `ContainerRuntimeConfig` object.

#### Failure Modes

#### Support Procedures

## Implementation History

There is an initial prototype exploring some of the implementation details
described here in this [repository](https://github.com/jhernand/upgrade-tool).

## Alternatives

The alternative to this is to manually pull the images in all the nodes of the
cluster, manually create the `/etc/crio/crio.conf.d/pin.conf` file and manually
reload the CRI-O service.

## Infrastructure Needed

Infrastructure will be needed to run the CI test described in the test plan
above.
