
E2E CI Test for Operator Bundle #28

Merged

Conversation

adambkaplan
Member

@adambkaplan adambkaplan commented Nov 2, 2021

Changes

This change augments the e2e test suite to simulate the operator's deployment using OLM.
The setup consists of the following components:

  • A docker/distribution container registry, running in docker outside of any Kubernetes cluster.
  • A KinD cluster that is configured to resolve image refs which use "localhost" as the image registry domain.
  • OLM installed on the KinD cluster.
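The "localhost" resolution in the second bullet can be sketched as a KinD cluster config following the kind local-registry recipe. The registry container name `kind-registry` and port `5000` are assumptions, not necessarily the values used in this PR's scripts:

```yaml
# Hypothetical KinD config: containerd treats "localhost:5000" as a mirror of
# a registry container named "kind-registry" on the kind docker network.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
    endpoint = ["http://kind-registry:5000"]
```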

Once set up, the operator and its associated OLM bundle are built and pushed to the local container registry.
Next, an OLM catalog is built based on the catalog published in operatorhub.io.
The catalog is what allows OLM to find the Tekton operator that Shipwright depends on, and is likewise pushed to the local container registry.

Building the operator, bundle, and catalog with a fully on-cluster registry is problematic for several reasons:

  • Not all tools can push to the on-cluster registry in this fashion
  • Manifests need to be rewritten to reference the on-cluster DNS name for the registry
  • The catalog source needs to be pullable within the cluster.

The test runs as follows:

  • Create a namespace to run the operator under test
  • Create a CatalogSource using the catalog containing the operator under test.
  • Create an OperatorGroup which allows AllNamespace operators to be installed in the given namespace.
  • Create a Subscription to install the Shipwright operator and its associated Tekton operator.
  • Verify that the shipwright operator deploys successfully.
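The middle three steps above can be sketched as OLM manifests. This is a hedged illustration: the object names, namespace, channel, and catalog image are assumptions, not the values used in this PR:

```yaml
# Hypothetical CatalogSource pointing OLM at the locally built catalog image.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: shipwright-operator-catalog
  namespace: shipwright-operator
spec:
  sourceType: grpc
  image: localhost:5000/shipwright-operator-catalog:latest
---
# An OperatorGroup with an empty spec (no targetNamespaces) permits
# AllNamespaces-mode operators to be installed in this namespace.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: shipwright-operator-group
  namespace: shipwright-operator
---
# Hypothetical Subscription; OLM resolves the operator (and its Tekton
# dependency) from the CatalogSource above.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: shipwright-operator
  namespace: shipwright-operator
spec:
  channel: alpha
  name: shipwright-operator
  source: shipwright-operator-catalog
  sourceNamespace: shipwright-operator
```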

See also:

  • https://kind.sigs.k8s.io/docs/user/local-registry/
  • https://olm.operatorframework.io/docs/tasks/creating-a-catalog/
  • https://olm.operatorframework.io/docs/tasks/make-catalog-available-on-cluster/
  • https://olm.operatorframework.io/docs/tasks/install-operator-with-olm/
  • https://olm.operatorframework.io/docs/advanced-tasks/operator-scoping-with-operatorgroups/
/kind cleanup

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.

Release Notes

Add CI and developer documentation on how to deploy the operator using OLM.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Nov 2, 2021
@openshift-ci openshift-ci bot requested review from mattcui and zhangtbj November 2, 2021 14:48
@adambkaplan adambkaplan force-pushed the bundle-build-ci branch 5 times, most recently from ef0d8a4 to 8caa8ca Compare November 5, 2021 21:05
@adambkaplan adambkaplan changed the title WIP - Verify operator and bundle work in CI testing E2E CI Test for Operator Bundle Nov 8, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 8, 2021
@adambkaplan
Member Author

/hold

Should merge after #30

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2021
@gabemontero
Member

how hard would it be to enable either PSP or the PSP follow-on to turn on enforcement in our kind cluster, so that running as non-root is proven out?

If it is too much to take on with this PR, can you open an issue to track doing that longer term?

@gabemontero gabemontero mentioned this pull request Nov 8, 2021
@adambkaplan
Member Author

how hard would it be to enable either PSP or the PSP follow-on to turn on enforcement in our kind cluster, so that running as non-root is proven out?

@gabemontero are you referring to PodSecurityPolicy (deprecated) or the new PodSecurity admission plugin (which has enforcement mechanisms for the new Pod Security Standards)? For the latter, I would prefer we add it in a follow-up PR, as the plugin is only available in k8s 1.22.

@gabemontero
Member

how hard would it be to enable either PSP or the PSP follow-on to turn on enforcement in our kind cluster, so that running as non-root is proven out?

@gabemontero are you referring to PodSecurityPolicy (deprecated) or the new PodSecurity admission plugin (which has enforcement mechanisms for the new Pod Security Standards)? For the latter, I would prefer we add it in a follow-up PR, as the plugin is only available in k8s 1.22.

Referring to both. You could either

  1. do deprecated PSP now, then switch to the plugin when we go to 1.22
  2. not do PSP now (but we have no upstream validation in the meantime) and wait until 1.22 to do the plugin

If you do 1) now, great. If you want to open a tracking item to do 1) at some point in the future, outside of this PR, ok; or if you open a tracking item to do 2), ok.

As long as one of those 3 possible actions is taken, I'm good

@adambkaplan
Member Author

I don't think we need PodSecurityPolicy to ensure that we don't run pods as root - securityContext.runAsNonRoot enforces that the containers don't run as root at the kubelet level. A previous iteration of the runAsNonRoot PR failed because the rbac proxy ran as root on the KinD cluster. That version didn't fail on OpenShift because it has an admission webhook that sets an arbitrarily high runAsUser value on most pods/containers.

I can file an issue to enable the PodSecurity plugin when we upgrade to 1.22, and furthermore enforce the restricted profile for the operator.
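The two enforcement mechanisms discussed above can be sketched as follows. All names here are hypothetical, and the namespace label form requires the PodSecurity admission plugin available from k8s 1.22:

```yaml
# kubelet-level enforcement: with runAsNonRoot, the kubelet refuses to start
# any container in this pod that would run as UID 0.
apiVersion: v1
kind: Pod
metadata:
  name: example-operator-pod        # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true
  containers:
  - name: operator
    image: localhost:5000/operator:latest   # hypothetical image
---
# Namespace-level enforcement via the PodSecurity admission plugin (k8s 1.22+),
# enforcing the "restricted" Pod Security Standard profile.
apiVersion: v1
kind: Namespace
metadata:
  name: shipwright-operator         # hypothetical name
  labels:
    pod-security.kubernetes.io/enforce: restricted
```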

@adambkaplan
Member Author

Filed #33

@adambkaplan
Member Author

/hold cancel

#30 merged.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 12, 2021
@adambkaplan
Member Author

/approve

Self-approving

@openshift-ci
Contributor

openshift-ci bot commented Nov 17, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2021
Member

@otaviof otaviof left a comment


I think we should use the same tools across all projects in the Shipwright organization, such as the hack scripts we need for KinD, a local container registry, and the like. Currently we duplicate those scripts on each new iteration, with slight differences in each place, making future maintenance less predictable.

I also miss having the Makefile as the "entrypoint" for (almost) all automation we need in a given project. The Makefile acts as the "source of authority" for automation, keeps environment variables under control, and serves as a place to document and invoke our automation inventory. Likewise, we should try to have the same Makefile targets across all projects in the organization, as much as possible.

So, I think we should find a way to avoid duplicating our CI scripts, and make sure the Makefile is the single place to trigger automation in the project. That could be something this PR tackles, or a future improvement.

.github/workflows/ci.yml
@adambkaplan
Member Author

I think we should use the same tools across all projects in the Shipwright organization, such as the hack scripts we need for KinD, a local container registry, and the like. Currently we duplicate those scripts on each new iteration, with slight differences in each place, making future maintenance less predictable.

I agree that this will become a challenge over time. It sounds like we are ready to build some test-infra tooling!

@adambkaplan
Member Author

Filed shipwright-io/community#44 to discuss how to approach common tooling.

Member

@otaviof otaviof left a comment


Some minor comments on scripting, but other than that it looks really good 👍🏼

attempts=1
while [[ ${attempts} -le 10 ]]; do
echo "Checking the status of the operator rollout - attempt ${attempts}"
if ${k8s} rollout status deployment "${namePrefix}operator" -n "${subNamespace}"; then
Member


Here you can use the option --timeout and let Kubernetes client handle the retries for you. We do the same on the CLI, please consider.

With the timeout option in place, we may also simplify the script not having to handle the attempt for-loop.

Member Author


I decided to use a single function to wait on pod status.

hack/run-operator-catalog.sh
Comment on lines 80 to 82
# Pod may not exist, in which case wait 30 seconds and try again
${KUBECTL_BIN} wait --for=condition=Ready pod -l "${label}" -n "${namespace}" --timeout "${timeout}" || \
sleep 30 && ${KUBECTL_BIN} wait --for=condition=Ready pod -l "${label}" -n "${namespace}" --timeout "${timeout}"
Member Author


The 30 second gap before the retry is needed because OLM does a lot of work to deploy the operator, and no pods may exist when the initial wait call is made.
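This wait-then-retry pattern can be factored into a small generic helper. This is a sketch: `retry` and its arguments are hypothetical names, not the functions actually used in hack/run-operator-catalog.sh:

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between failed attempts; return non-zero after ATTEMPTS failures.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${i}/${attempts} failed; retrying in ${delay}s" >&2
    sleep "${delay}"
  done
  return 1
}
```

With kubectl's own `--timeout` flag on `rollout status` or `wait`, most hand-rolled loops disappear; a helper like this is only needed when the resource may not exist yet, as in the OLM case above.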

@adambkaplan
Member Author

bump @otaviof


echo "# Using KinD context..."
${KUBECTL_BIN} config use-context "kind-kind"
cho "# KinD nodes:"
Member


Have we lost the letter e on echo?

Member Author


Yes - and not adding set -e here let it pass through 🤦

Member

@otaviof otaviof left a comment


Thanks, very good additions to this project!

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2021
This change augments the e2e test suite to simulate the
operator's deployment using OLM. The setup consists of the
following components:

- A docker/distribution container registry, running in
  docker outside of any Kubernetes cluster.
- A KinD cluster that is configured to resolve image refs
  which use "localhost" as the image registry domain.
- OLM installed on the KinD cluster.

Once set up, the operator and its associated OLM bundle are
built and pushed to the local container registry. Next,
an OLM catalog is built based on the catalog published in
operatorhub.io. The catalog is what allows OLM to find the
Tekton operator that Shipwright depends on, and is likewise
pushed to the local container registry.

Building the operator, bundle, and catalog with a fully on-cluster
registry is problematic for several reasons:

- Not all tools can push to the on-cluster registry in this fashion
- Manifests need to be rewritten to reference the on-cluster DNS name
for the registry
- The catalog source needs to be pullable within the cluster.

The test runs as follows:

- Create a namespace to run the operator under test
- Create a CatalogSource using the catalog containing the
  operator under test.
- Create an OperatorGroup which allows AllNamespace operators
  to be installed in the given namespace.
- Create a Subscription to install the Shipwright operator
  and its associated Tekton operator.
- Verify that the shipwright operator deploys successfully.

Contributor documentation has also been updated so that
developers can run this process using make commands on their
Kubernetes cluster of choice.

See also:
- https://kind.sigs.k8s.io/docs/user/local-registry/
- https://olm.operatorframework.io/docs/tasks/creating-a-catalog/
- https://olm.operatorframework.io/docs/tasks/make-catalog-available-on-cluster/
- https://olm.operatorframework.io/docs/tasks/install-operator-with-olm/
- https://olm.operatorframework.io/docs/advanced-tasks/operator-scoping-with-operatorgroups/
Use the `operatorhub/catalog_sa` image as the base for the catalog
index. The default operatorhub catalog appears to have a root-owned file
that causes `opm index add` to fail.

See operator-framework/operator-registry#870
o.MatchError will panic if a k8s NotFound error is returned. This is
fixed by checking that a NotFound error is raised separately from the
gomega match.
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 3, 2021

New changes are detected. LGTM label has been removed.

@adambkaplan adambkaplan added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2021
@adambkaplan
Member Author

Re-tagged @otaviof's lgtm (I needed to push a minor fix for an issue that caused tests to break).

@openshift-merge-robot openshift-merge-robot merged commit ea9571c into shipwright-io:main Dec 6, 2021