Use controller-runtime to consume Machine API #89

Merged

Conversation

Bobbins228
Contributor

@Bobbins228 Bobbins228 commented Jun 22, 2023

Closes: #53

Verification Steps

OSD Cluster - Machine Pools

  • Scale machines using InstaScale
  • Delete the AppWrapper to trigger the scaling down steps
    InstaScale should work as expected

Self Managed/OCP Cluster - MachineSets

  • Follow this guide for creating MachineSet Templates
  • Scale Machines using InstaScale
  • Delete the AppWrapper to trigger the scaling down steps
    InstaScale should work as expected

Note: At the moment, when scaling up and down, errors about updating the AppWrapper appear in the console because of an issue with applying/removing the finalizer.

@asm582
Member

asm582 commented Jul 5, 2023

are we planning to migrate to controller runtime?

@astefanutti
Contributor

> are we planning to migrate to controller runtime?

Yes, @Bobbins228 and I are working on it.

@Bobbins228 changed the title from "Moved unnecessarily initialized variables in the Reconcile loop" to "Use controller-runtime to consume Machine API" Jul 7, 2023
@Bobbins228 Bobbins228 force-pushed the replace-unnecessary-variables branch 2 times, most recently from 8d6d112 to 2a8ebf5 Compare July 24, 2023 13:08
@asm582
Member

asm582 commented Jul 24, 2023

@Bobbins228 Thanks for this PR, did we do a manual testing of this PR? is scale-up and scale-down happening as expected?

@Bobbins228
Contributor Author

> @Bobbins228 Thanks for this PR, did we do a manual testing of this PR? is scale-up and scale-down happening as expected?

I tested this on an OSD cluster and was able to scale up and down successfully, but I did receive these warning logs, which didn't appear before my changes.

1.690196645276774e+09	INFO	KubeAPIWarningLogger	unknown field "spec.resources.GenericItems[0].metadata.creationTimestamp"
1.6901966452768493e+09	INFO	KubeAPIWarningLogger	unknown field "spec.resources.GenericItems[1].metadata.creationTimestamp"
1.6901966452768636e+09	INFO	KubeAPIWarningLogger	unknown field "spec.schedulingSpec.clusterScheduling"
1.6901966452768729e+09	INFO	KubeAPIWarningLogger	unknown field "spec.schedulingSpec.dispatchingWindow"
1.6901966452768831e+09	INFO	KubeAPIWarningLogger	unknown field "status.controllerfirstdispatchtimestamp"

@anishasthana
Member

Is that related to the new CRD stuff?

@Bobbins228
Contributor Author

> Is that related to the new CRD stuff?

I am not sure but I think I recall seeing the same errors in MCAD too.

@astefanutti
Contributor

> Is that related to the new CRD stuff?

No, it's an existing issue that'll be fixed with project-codeflare/multi-cluster-app-dispatcher#456.

controllers/appwrapper_controller.go
if appwrapper.ObjectMeta.DeletionTimestamp.IsZero() {
	if !controllerutil.ContainsFinalizer(&appwrapper, finalizerName) {
		//onAdd replacement
		if appwrapper.Status.State == arbv1.AppWrapperStateEnqueued || appwrapper.Status.State == "" {
Contributor

@astefanutti astefanutti Jul 27, 2023

Here, the scale-up logic is guarded by the finalizer, but the update that adds the finalizer can still fail, which can lead to duplicated scaling. The scaledAppwrapper global variable holds an in-memory state of the scaled AppWrappers, but it is also fragile because it's not persisted, for example if the controller restarts.

I'd suggest decoupling the finalizer logic from the scaling invariant guard, and implementing the latter by relying on the persisted state in the cluster, e.g., by checking for the existence of a MachinePool / MachineSet with the corresponding AppWrapper label(s), which were atomically added when these resources were created.

A corollary would be to have the scaledAppwrapper global variable removed altogether.
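
A minimal sketch of the guard suggested above, under stated assumptions: the MachineSet type is a trimmed-down stand-in for the Machine API type, the label key is hypothetical, and a plain slice stands in for listing resources from the cluster. It only illustrates the idea of deriving "already scaled" from labels on persisted resources rather than from an in-memory variable.

```go
package main

import "fmt"

// MachineSet is a hypothetical, trimmed-down stand-in for the Machine API
// type; only the fields needed for this sketch are modelled.
type MachineSet struct {
	Name   string
	Labels map[string]string
}

// appWrapperLabel is an assumed label key, chosen for illustration only.
const appWrapperLabel = "example.com/appwrapper"

// scaledFor reports whether any MachineSet already carries the label for the
// given AppWrapper, i.e. whether an earlier reconcile already scaled up for it.
func scaledFor(sets []MachineSet, appWrapper string) bool {
	for _, ms := range sets {
		if ms.Labels[appWrapperLabel] == appWrapper {
			return true
		}
	}
	return false
}

// scaleUp adds a labelled MachineSet only when none exists yet, so a retried
// reconcile for the same AppWrapper does not duplicate the scaling.
func scaleUp(sets []MachineSet, appWrapper string) []MachineSet {
	if scaledFor(sets, appWrapper) {
		return sets
	}
	return append(sets, MachineSet{
		Name:   appWrapper + "-machineset",
		Labels: map[string]string{appWrapperLabel: appWrapper},
	})
}

func main() {
	var cluster []MachineSet
	cluster = scaleUp(cluster, "aw-demo")
	// A retried reconcile, e.g. after a failed finalizer update.
	cluster = scaleUp(cluster, "aw-demo")
	fmt.Println(len(cluster))
}
```

Because the label is written together with the resource, the check stays valid across controller restarts, which is what would make the scaledAppwrapper variable unnecessary.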

controllers/machineset.go
@@ -109,11 +114,12 @@ func scaleMachineSet(aw *arbv1.AppWrapper, userRequestedInstanceType string, rep
//TODO: user can delete appwrapper work on triggering scale-down
klog.Infof("waiting for machines to be in state Ready. replicas needed: %v and replicas available: %v", replicas, ms.Status.AvailableReplicas)
time.Sleep(1 * time.Minute)
Contributor

The reconcile loop should not be blocked. Instead, the reconcile request should be re-queued so that the state of the MachineSet is checked again on the next pass.
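
A rough sketch of the re-queue approach, not the code from this PR: Result is a local stand-in that mirrors the RequeueAfter field of controller-runtime's reconcile.Result, and machineSetStatus, checkReplicas and pollInterval are hypothetical names. The one-minute interval is taken from the time.Sleep call in the hunk above.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a local stand-in mirroring the RequeueAfter field of
// controller-runtime's reconcile.Result, so the sketch has no dependencies.
type Result struct {
	RequeueAfter time.Duration
}

// machineSetStatus is a hypothetical subset of the MachineSet status.
type machineSetStatus struct {
	AvailableReplicas int32
}

// pollInterval matches the one-minute sleep it is meant to replace.
const pollInterval = time.Minute

// checkReplicas returns immediately. If the machines are not ready yet, it
// asks to be called again later instead of sleeping inside the reconcile loop.
func checkReplicas(status machineSetStatus, wanted int32) Result {
	if status.AvailableReplicas < wanted {
		return Result{RequeueAfter: pollInterval}
	}
	return Result{} // zero value: nothing left to wait for
}

func main() {
	fmt.Println(checkReplicas(machineSetStatus{AvailableReplicas: 1}, 3).RequeueAfter)
	fmt.Println(checkReplicas(machineSetStatus{AvailableReplicas: 3}, 3).RequeueAfter)
}
```

Returning early keeps the worker free to process other AppWrappers while the machines come up, and a deleted AppWrapper can be handled on the next pass rather than after the sleep finishes.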

@anishasthana
Member

/retest

// Only reason we are calling it here is that the client is not able to make
// calls until it is started, so SetupWithManager is not working.
if !useMachineSets && ocmClusterID == "" {
	getOCMClusterID(r)
Contributor

I'd remove the call here and get the cluster ID lazily, by calling the function where it's needed and gating the retrieval of the ID with sync.Once.
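
For illustration, lazy retrieval gated by sync.Once could look roughly like the following. All names here (getClusterID, fetchClusterID, the lookups counter) are hypothetical, and fetchClusterID is a placeholder for the real query, which this sketch does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	clusterIDOnce sync.Once
	clusterID     string
	lookups       int // counts how often the lookup runs; for illustration only
)

// fetchClusterID is a hypothetical placeholder for the real lookup.
func fetchClusterID() string {
	lookups++
	return "example-cluster-id"
}

// getClusterID resolves the ID on first use and returns the cached value on
// every later call, including calls from concurrent goroutines.
func getClusterID() string {
	clusterIDOnce.Do(func() { clusterID = fetchClusterID() })
	return clusterID
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getClusterID()
		}()
	}
	wg.Wait()
	fmt.Println(getClusterID(), lookups)
}
```

One caveat with this pattern: sync.Once runs the function exactly once even if the lookup fails, so a version that must retry on error would need a different guard.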

Contributor Author

I tried to implement this by calling it within the machinePoolExists() function in SetupWithManager(), but this didn't work because the client can't be used until it is started.

I do notice that we check whether the OCM secret exists before we call machinePoolExists(). As I understand it, we only need that OCM secret if we are scaling machine pools. If so, I can use that as the condition that sets useMachinePools to true and apply your suggestion in the other machinePool functions.

Contributor

So the call from the SetupWithManager method fails because getOCMClusterID uses the controller-runtime cached client to query the ClusterVersion API. This is not recommended, as it'll keep watching the API for no reason; it should be changed to use the plain client-go / openshift client. Then it'll be possible to move the call to getOCMClusterID into SetupWithManager.

@Bobbins228 changed the title from "Use controller-runtime to consume Machine API" to "[WIP] Use controller-runtime to consume Machine API" Sep 15, 2023
@Bobbins228 Bobbins228 force-pushed the replace-unnecessary-variables branch from 0b94968 to e310af4 Compare September 18, 2023 16:22
@Bobbins228 Bobbins228 force-pushed the replace-unnecessary-variables branch from e310af4 to b03cad3 Compare September 21, 2023 14:40
@Bobbins228 changed the title from "[WIP] Use controller-runtime to consume Machine API" to "Use controller-runtime to consume Machine API" Sep 27, 2023
@Bobbins228 Bobbins228 force-pushed the replace-unnecessary-variables branch from 854ca8d to 8743b4e Compare October 19, 2023 09:34
@Bobbins228 Bobbins228 force-pushed the replace-unnecessary-variables branch from 8743b4e to 450afa1 Compare October 24, 2023 11:27
@anishasthana anishasthana dismissed their stale review October 24, 2023 13:29

dismissing my old review

@astefanutti
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Oct 25, 2023
@astefanutti
Contributor

/approve

@openshift-ci

openshift-ci bot commented Oct 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot merged commit e591d18 into project-codeflare:main Oct 25, 2023
1 check passed
Successfully merging this pull request may close these issues.

Stop initializing variables in reconcile loop
7 participants