Update to cert-manager 1.4 #57

Closed
3 of 4 tasks
maelvls opened this issue Jun 18, 2021 · 8 comments · Fixed by #58

Comments

maelvls commented Jun 18, 2021

Still to be done as of 6 July 2021:

  • Deprecate 1.1 and 1.3 in the Marketplace admin UI.
  • Have a review on Add a deprecation warning for 1.1 and 1.3 #60
  • Have a review on Bump to cert-manager 1.4 #58
  • Re-submit again and again until the review passes
    • Attempt 1 (20 June 2021)
    • Refusal 1: I submitted 1.3 as the "default" version instead of 1.4 (my fault)
    • Attempt 2 (27 June 2021)
    • Refusal 2: issue with the transition from GoogleCASIssuer v1alpha1 -> v1beta1
    • Attempt 3 (29 June 2021)
    • Refusal 3 (29 June 2021): the testrunner fails with no clear indication of what is failing
    • Message from James Westby about our struggles with the testrunner (29 June 2021)
    • Google engineering team investigating a bug with the backend (6 July 2021)
    • Refusal 4 (7 July 2021): the info field is still present
    • Attempt 5 (8 July 2021), image not changed.
    • Refusal 5 (13 July 2021)

cert-manager v1.4.0 was released on 15 June 2021, and we want the jetstack-secure-for-cert-manager app on the Google Cloud Marketplace to be updated within a few days of each cert-manager release.

Using the Cutting a new release instructions, we shall update the Google Cloud Marketplace app from 1.3.1 to 1.4.0.

⚠️ New Roles have to be added to schema.yaml. To see what needs to be added to schema.yaml:

# From the cert-manager repo
git diff origin/release-1.3..origin/release-1.4 deploy/charts/cert-manager

Role to be added to both the cainjector and controller service accounts:

  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    resourceNames: ["cert-manager-cainjector-leader-election", "cert-manager-cainjector-leader-election-core"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["create"]

ClusterRole needed:

rules:
  - apiGroups: ["certificates.k8s.io"]
    resources: ["certificatesigningrequests"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["certificates.k8s.io"]
    resources: ["certificatesigningrequests/status"]
    verbs: ["update"]
  - apiGroups: ["certificates.k8s.io"]
    resources: ["signers"]
    resourceNames: ["issuers.cert-manager.io/*", "clusterissuers.cert-manager.io/*"]
    verbs: ["sign"]
  - apiGroups: ["authorization.k8s.io"]
    resources: ["subjectaccessreviews"]
    verbs: ["create"]
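As a complement to reading the `git diff` by hand, here is a minimal sketch of how the RBAC rules of two releases could be compared once they have been parsed into plain dictionaries. The function names, the `<any>` marker, and the sample "old" rule set are assumptions made for illustration; the comparison is literal and does not interpret wildcards such as `*`.

```python
from itertools import product

ANY_NAME = "<any>"  # sketch-only marker: the rule has no resourceNames restriction


def expand(rule):
    """Expand one RBAC rule into (apiGroup, resource, resourceName, verb) tuples."""
    names = rule.get("resourceNames") or [ANY_NAME]
    return set(product(rule.get("apiGroups", []),
                       rule.get("resources", []),
                       names,
                       rule.get("verbs", [])))


def added_permissions(old_rules, new_rules):
    """Return the permissions granted by new_rules that old_rules did not grant."""
    old = set().union(*(expand(r) for r in old_rules))
    new = set().union(*(expand(r) for r in new_rules))
    return sorted(new - old)


# Illustrative input only: not the actual rule sets of any cert-manager release.
old_rules = [
    {"apiGroups": ["coordination.k8s.io"], "resources": ["leases"], "verbs": ["create"]},
]
new_rules = old_rules + [
    {"apiGroups": ["certificates.k8s.io"],
     "resources": ["certificatesigningrequests/status"],
     "verbs": ["update"]},
]

for permission in added_permissions(old_rules, new_rules):
    print(permission)
```

Each printed tuple is a permission that would still need to be declared in schema.yaml.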

Estimation: 1 hour

maelvls commented Jun 21, 2021

Note: I opened GoogleCloudPlatform/marketplace-k8s-app-tools#564 to raise the issue of not being able to create a Role that targets the kube-system namespace.

maelvls commented Jun 21, 2021

I submitted 1.4.0-gcm.0 for review, it should be published by tomorrow.

maelvls commented Jun 24, 2021

The issues I encountered:

  1. I did not pay attention to the updates made to google-cas-issuer, although the change log is very clear. Notably, I failed to properly update from v1alpha1 to v1beta1.

  2. I struggled a lot with the now required leases resource, and I ended up using a ClusterRole with resourceNames instead of a Role, and opened an issue on mpdev: RBAC Role rules for namespaces outside of the app itself GoogleCloudPlatform/marketplace-k8s-app-tools#564.

  3. As usual, the thing that made me waste the most time was the fact that mpdev only shows status codes, not stdout or stderr:

    >>> Running /smoke-test.yaml
     >   0: kubectl smoke test
     PASSED
     >   1: Create test issuer and self signed cert
     PASSED
     >   2: Try to get new cert
     PASSED
     >   3: Try to get cert secret
     PASSED
     >   4: Delete issuer and self signed cert
     PASSED
     >   5: Create a GoogleCASIssuer and a certificate
     FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
     >   6: Delete google CAS issuer and certificate
     FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
     >> Summary: 2 FAILED, 5 PASSED

    No way to know what went wrong. It really feels like "unfinished" tooling 😥
    I also opened an issue about that: The test runner "bashTest" does not show stderr and stdout on failure GoogleCloudPlatform/marketplace-k8s-app-tools#565.

  4. Finally, the upgrade that Google did from v1beta1 to v1 of the CRD in app-crd.yaml broke the application in 1.1 and 1.3 (1.1 and 1.3 are broken, deprecate them #59). More specifically, we had:

    apiVersion: app.k8s.io/v1beta1
    kind: Application
    spec:
      descriptor:
        ...
        info: []

    It should have been:

    apiVersion: app.k8s.io/v1beta1
    kind: Application
    spec:
      descriptor:
        ...
      info: []

    It seems that Google upgraded the Application CRD from v1beta1 to v1 (this is the version of the CRD object, not the version of the Application itself). After this change, the above Application manifest could not be applied anymore. The error looked like this:

      error: error validating "/data/resources.yaml": error validating data:
      ValidationError(Application.spec.descriptor): unknown field "info" in
      io.k8s.app.v1beta1.Application.spec.descriptor; if you choose to ignore
      these errors, turn validation off with --validate=false

    My guess is that before this change, the faulty "info" field was not being validated, and the new v1 CRD version started validating it. I raised this pain point on their issue tracker: The update from v1beta1 to v1 broke our old deployer images GoogleCloudPlatform/marketplace-k8s-app-tools#566
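To catch the misplaced `info` field before submitting, a check along these lines could be run on each manifest. This is a minimal sketch that assumes the manifest has already been parsed into a dictionary; the function name is hypothetical and only the `spec.descriptor.info` case described above is covered.

```python
def misplaced_info_paths(manifest):
    """Return the paths where `info` is nested under spec.descriptor instead of spec."""
    problems = []
    if manifest.get("kind") != "Application":
        return problems
    descriptor = (manifest.get("spec") or {}).get("descriptor") or {}
    if "info" in descriptor:
        problems.append("spec.descriptor.info")
    return problems


# The faulty layout: info nested under descriptor.
broken = {
    "apiVersion": "app.k8s.io/v1beta1",
    "kind": "Application",
    "spec": {"descriptor": {"info": []}},
}
# The intended layout: info directly under spec.
fixed = {
    "apiVersion": "app.k8s.io/v1beta1",
    "kind": "Application",
    "spec": {"descriptor": {}, "info": []},
}

print(misplaced_info_paths(broken))  # ['spec.descriptor.info']
print(misplaced_info_paths(fixed))   # []
```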

maelvls commented Jun 29, 2021

Update 29 June: (internal email)

The API version issue was resolved and noticed that the tester pod is failing at our verification service with the following error in the logs:

I0625 18:03:30.965105       1 main.go:86] >>> Running /smoke-test.yaml
I0625 18:03:30.966237       1 main.go:136]  >   0: kubectl smoke test
I0625 18:03:31.145790       1 main.go:141]  PASSED
I0625 18:03:31.145824       1 main.go:136]  >   1: Create test issuer and self signed cert
I0625 18:03:32.482440       1 main.go:141]  PASSED
I0625 18:03:32.482507       1 main.go:136]  >   2: Try to get new cert
E0625 18:03:32.884330       1 main.go:143]  FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:32.884363       1 main.go:136]  >   3: Try to get cert secret
E0625 18:03:33.130651       1 main.go:143]  FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:33.130706       1 main.go:136]  >   4: Delete issuer and self signed cert
E0625 18:03:33.648541       1 main.go:143]  FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:33.648579       1 main.go:136]  >   5: Create a GoogleCASIssuer and a certificate
E0625 18:03:34.999642       1 main.go:143]  FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
I0625 18:03:34.999676       1 main.go:136]  >   6: Delete google CAS issuer and certificate
E0625 18:03:36.026567       1 main.go:143]  FAILED: Bash test failed > Unexpected exit status code > Should have equaled 0, but was 1
E0625 18:03:36.026692       1 main.go:119]  >> Summary: 5 FAILED, 2 PASSED
I0625 18:03:36.026732       1 main.go:123]  >   0: kubectl smoke test: PASSED
I0625 18:03:36.026778       1 main.go:123]  >   1: Create test issuer and self signed cert: PASSED
E0625 18:03:36.026795       1 main.go:125]  >   2: Try to get new cert: FAILED
E0625 18:03:36.026802       1 main.go:125]  >   3: Try to get cert secret: FAILED
E0625 18:03:36.026807       1 main.go:125]  >   4: Delete issuer and self signed cert: FAILED
E0625 18:03:36.026812       1 main.go:125]  >   5: Create a GoogleCASIssuer and a certificate: FAILED
E0625 18:03:36.026818       1 main.go:125]  >   6: Delete google CAS issuer and certificate: FAILED
E0625 18:03:36.026824       1 main.go:95] >>> SUMMARY: 5 failed
ERROR SMOKE_TEST Tester 'Pod/smoke-test-pod' failed.

Can you make sure your application passes mpdev verify. Instructions: https://github.com/GoogleCloudPlatform/marketplace-k8s-app-tools/blob/master/docs/mpdev-references.md#smoke-test-an-application.

Please ensure that the tester pod completes with a zero exit status and resubmit the draft for a review. Let me know if you have any questions. Thank you.

Regards,
Dinesh

Note that the above-mentioned test cases are defined in smoke-test.yaml.
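Since the test runner output is all we get back, it can help to pull out only the failing test names from such a log. The following is a minimal sketch that assumes the per-test summary lines keep the `>   N: name: PASSED|FAILED` shape shown above; the function name is hypothetical.

```python
import re

# Matches summary lines such as ">   2: Try to get new cert: FAILED".
RESULT_LINE = re.compile(r">\s+(\d+): (.+): (PASSED|FAILED)\s*$")


def failed_tests(log_text):
    """Return (index, name) for every test reported as FAILED in the summary lines."""
    failed = []
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group(3) == "FAILED":
            failed.append((int(match.group(1)), match.group(2)))
    return failed


sample_log = """\
I0625 18:03:36.026778       1 main.go:123]  >   1: Create test issuer and self signed cert: PASSED
E0625 18:03:36.026795       1 main.go:125]  >   2: Try to get new cert: FAILED
E0625 18:03:36.026824       1 main.go:95] >>> SUMMARY: 5 failed
"""

print(failed_tests(sample_log))  # [(2, 'Try to get new cert')]
```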

maelvls commented Jul 6, 2021

Update 6 July: (internal email) our release of 1.4.0-gcm.0 is now waiting on Google. On 1 July 2021, Dinesh mentioned he is in contact with the engineering team.

Apologies for the delay here. I'm following up internally with Eng to see what's going wrong here -- I'll let you know once I get an answer. Thank you.

@maelvls maelvls pinned this issue Jul 13, 2021

maelvls commented Jul 13, 2021

Today (13 July), Dinesh reported that the tests are failing. Dinesh now gives us the sha256 of each failing image:

Apologies for the delay here. Your listing has 3 different versions on the Marketplace -- the following two deployer images are failing due to the info field, which is not present in the CRD:

  • gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:732f49aac58fa25f73a5dd3a7a422f5e0520802b372676d8605a67d3a383480e
  • gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:d5e11520513313f08da87a58d44469aa0a0c4799ee798e4418dda321195bfe22

And the latest deployer image (gcr.io/jetstack-public/jetstack-secure-for-cert-manager/deployer@sha256:4fb179cf2a784dddb48ea86cf9e437c921b790ae060f84e16be373cc3ef108e4) is failing with the following error message:

CustomResourceDefinition.apiextensions.k8s.io "googlecasissuers.cas-issuer.jetstack.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions

Please fix the above errors, validate the versions again using mpdev and resubmit the draft for approval. Thank you.

We now know that the testing infrastructure at Google is running mpdev verify sequentially on all existing versions (1.1, 1.3, 1.4). Previously, I thought the tests were only run for the latest version that we submitted.

I have now rebuilt and resubmitted all three images with the info field fix, and re-created a GitHub release draft for them.

But the fact that they run these three versions sequentially means that the v1alpha1 -> v1beta1 upgrade of the Google CAS issuer CRD breaks things, as reported in the above error (see this email for more details).
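The error above comes from the API server refusing a CRD update when a version listed in `status.storedVersions` is no longer declared in `spec.versions`. Here is a minimal sketch of that check, assuming both the CRD currently in the cluster and the new CRD manifest have been parsed into dictionaries; the function name is hypothetical.

```python
def missing_stored_versions(live_crd, new_crd):
    """Return stored versions of the live CRD that the new CRD no longer declares."""
    stored = live_crd.get("status", {}).get("storedVersions", [])
    declared = {v["name"] for v in new_crd.get("spec", {}).get("versions", [])}
    return [version for version in stored if version not in declared]


# The situation from the error message: an older deployer left v1alpha1 behind,
# and the newer CRD only declares v1beta1.
live = {"status": {"storedVersions": ["v1alpha1"]}}
new = {"spec": {"versions": [{"name": "v1beta1", "served": True, "storage": True}]}}

print(missing_stored_versions(live, new))  # ['v1alpha1']
```

An empty list means the new CRD can replace the old one as far as this particular validation is concerned.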

I'm not sure how to go about that. I'll ask @jakexks now.

maelvls commented Jul 15, 2021

I just tried mpdev verify and found out that it only removes namespaced resources and leaves all the cluster-scoped resources behind (as per set_ownership.py). It seems to be because an ownerReference to a namespaced owner can only be set on resources in the same namespace, not on cluster-scoped resources.
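The ownership rule can be sketched as follows, assuming the owner is a namespaced object (such as the Application) and representing cluster-scoped resources with a namespace of `None`. The function names and the sample resource list are hypothetical and only illustrate which objects would be left behind.

```python
def can_be_owned(owner_namespace, dependent_namespace):
    """Kubernetes ownerReference scope rule; None stands for cluster-scoped."""
    if owner_namespace is None:
        # A cluster-scoped owner may own namespaced or cluster-scoped dependents.
        return True
    # A namespaced owner may only own dependents in its own namespace.
    return dependent_namespace == owner_namespace


def left_behind(owner_namespace, resources):
    """Return the names of resources that cannot be garbage-collected via the owner."""
    return [name for name, namespace in resources
            if not can_be_owned(owner_namespace, namespace)]


# Illustrative resource list: (name, namespace), with None meaning cluster-scoped.
resources = [
    ("deployment/cert-manager", "test-ns"),
    ("customresourcedefinition/googlecasissuers.cas-issuer.jetstack.io", None),
    ("clusterrole/example", None),
]

print(left_behind("test-ns", resources))
```

With these inputs, the CRD and the ClusterRole are reported, which matches the behaviour described above: the CRD from a previous version survives the cleanup.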

I still have no idea how to work around this issue 😞

maelvls commented Jul 17, 2021

1.1, 1.3 and 1.4 were accepted last night!!

@maelvls maelvls closed this as completed Jul 17, 2021
@wallrj wallrj unpinned this issue Aug 31, 2021