Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent infinite CAPIProvider reconciliations #908

Merged

Conversation

anmazzotti
Copy link
Contributor

What this PR does / why we need it:

The CAPIProvider syncing secret is getting a new resourceVersion even on empty patch. This creates an infinite loop where the CAPIProviderReconciler is continuously invoked due to the secret version change.

This PR tries to address this, however further issues escalate after applying it.
For example there is a conflict between the rancher-turtles-controller-manager and the cluster-api-operator when it comes to managing some resources.
For example giving the following bootstrap, there is a conflict in how finalizers are handled:

kind: BootstrapProvider
metadata:
  annotations:
    operator.cluster.x-k8s.io/applied-spec-hash: d637e476cb631c683729765c7f6f3fd8c04b5c314254b114aa9500b51b7ed8f1 --> Emptied/Re-filled on a race condition
  creationTimestamp: "2024-12-04T14:21:54Z" 
  finalizers:
  - foregroundDeletion
  - provider.cluster.x-k8s.io --> Deleted/Re-added on a race condition
  generation: 1
  name: rke2
  namespace: default
  ownerReferences:
  - apiVersion: turtles-capi.cattle.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: CAPIProvider
    name: rke2-bootstrap
    uid: bc2f1c0c-9f32-4bab-9859-e18098e9f82d
  resourceVersion: "246785"
  uid: 58a94292-7858-4449-afa1-82edfdbf701c
spec:
  configSecret:
    name: rke2-bootstrap
    namespace: default
  version: v0.9.0
status:
  conditions:
  - lastTransitionTime: "2024-12-04T14:22:38Z"
    reason: MinimumReplicasAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-12-04T14:22:28Z"
    status: "True"
    type: PreflightCheckPassed
  - lastTransitionTime: "2024-12-04T14:22:27Z"
    status: "True"
    type: ProviderInstalled
  contract: v1beta1
  installedVersion: v0.9.0
  observedGeneration: 1

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #892

Special notes for your reviewer:

Checklist:

  • squashed commits into logical changes
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@anmazzotti anmazzotti requested a review from a team as a code owner December 5, 2024 13:40
@anmazzotti anmazzotti marked this pull request as draft December 5, 2024 13:40
@anmazzotti anmazzotti changed the title Do not create new resourceVersion on empty patches Prevent infinite CAPIProvider reconciliations Dec 6, 2024
@anmazzotti anmazzotti marked this pull request as ready for review December 6, 2024 15:29
@kkaempf kkaempf added the kind/bug Something isn't working label Dec 6, 2024
@kkaempf kkaempf added this to the December release milestone Dec 6, 2024
@anmazzotti anmazzotti force-pushed the fix_infinite_provider_reconcile branch from 818568c to cab64f2 Compare December 9, 2024 17:03
Danil-Grigorev
Danil-Grigorev previously approved these changes Dec 10, 2024
Copy link
Contributor

@Danil-Grigorev Danil-Grigorev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no regression in tests which verify RV stability on the secret resource, and SSA patch is notoriously unpredictable. I can also see one of the flakes in

g.Expect(conditions.Get(capiProvider, turtlesv1.CheckLatestVersionTime).Message).To(Equal(fmt.Sprintf("Updated to latest %s version", CAPIVersion)))
no longer appearing in local runs.

LGTM

Copy link
Member

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@alexander-demicev alexander-demicev merged commit 7f191fa into rancher:main Dec 13, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Unit tests fail with a flaky error
4 participants