Use controller-runtime to consume Machine API #89
Conversation
Are we planning to migrate to controller-runtime?

Yes, @Bobbins228 and I are working on it.
Force-pushed from 8d6d112 to 2a8ebf5
@Bobbins228 Thanks for this PR. Did we do manual testing of this PR? Is scale-up and scale-down happening as expected?

I tested this on an OSD cluster and I was able to successfully scale up and down, but I did receive these warning logs which didn't appear before my changes.

Is that related to the new CRD stuff?

I am not sure, but I think I recall seeing the same errors in MCAD too.

No, it's an existing issue that'll be fixed with project-codeflare/multi-cluster-app-dispatcher#456.
controllers/appwrapper_controller.go
Outdated
if appwrapper.ObjectMeta.DeletionTimestamp.IsZero() {
    if !controllerutil.ContainsFinalizer(&appwrapper, finalizerName) {
        // onAdd replacement
        if appwrapper.Status.State == arbv1.AppWrapperStateEnqueued || appwrapper.Status.State == "" {
Here, the logic of scaling up is guarded by the finalizer, but there is still a possibility that the update to add the finalizer fails, which can lead to duplicated scaling. There is the `scaledAppwrapper` global variable that holds an in-memory state of the scaled AppWrappers, but it's also fragile, as it's not persisted, for example in case the controller restarts.
I'd suggest decoupling the finalizer logic from the scaling invariant guard, and implementing the latter by relying on the persisted state in the cluster, e.g., by checking the existence of a MachinePool / MachineSet with the corresponding AppWrapper label(s), added atomically during the creation of these resources. A corollary would be to have the `scaledAppwrapper` global variable removed altogether.
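For illustration, a minimal sketch of the suggested guard, relying on the persisted cluster state instead of the in-memory list. The helper name, the label key `instascale.codeflare.dev/appwrapper`, and the `openshift-machine-api` namespace are assumptions for this example, not the project's actual identifiers:

```go
package controllers

import (
	"context"

	machinev1 "github.com/openshift/api/machine/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// machineSetExistsFor reports whether a MachineSet labelled with the given
// AppWrapper name already exists, i.e. whether scaling has already happened.
// The label would be added atomically when the MachineSet is created.
func machineSetExistsFor(ctx context.Context, c client.Client, awName string) (bool, error) {
	msList := &machinev1.MachineSetList{}
	if err := c.List(ctx, msList,
		client.InNamespace("openshift-machine-api"),
		client.MatchingLabels{"instascale.codeflare.dev/appwrapper": awName}, // assumed label key
	); err != nil {
		return false, err
	}
	return len(msList.Items) > 0, nil
}
```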
controllers/machineset.go
Outdated
@@ -109,11 +114,12 @@ func scaleMachineSet(aw *arbv1.AppWrapper, userRequestedInstanceType string, rep
    //TODO: user can delete appwrapper work on triggering scale-down
    klog.Infof("waiting for machines to be in state Ready. replicas needed: %v and replicas available: %v", replicas, ms.Status.AvailableReplicas)
    time.Sleep(1 * time.Minute)
The reconcile loop should not be blocked. Instead, the reconcile request should be re-queued so that the state of the MachineSet is checked again later.
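A minimal sketch of what that could look like; the helper name and the `int32` replica count are assumptions, and in the real controller the `Result` would be returned from `Reconcile`:

```go
package controllers

import (
	"time"

	machinev1 "github.com/openshift/api/machine/v1beta1"
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
)

// requeueUntilReady illustrates the re-queue pattern: instead of sleeping
// inside the reconcile loop, it returns a Result asking controller-runtime to
// call Reconcile again after a delay, at which point the MachineSet status is
// checked anew.
func requeueUntilReady(ms *machinev1.MachineSet, replicas int32) (ctrl.Result, error) {
	if ms.Status.AvailableReplicas < replicas {
		klog.Infof("waiting for machines to be in state Ready. replicas needed: %v and replicas available: %v",
			replicas, ms.Status.AvailableReplicas)
		// Re-queue the request rather than blocking the worker goroutine.
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}
	return ctrl.Result{}, nil
}
```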
/retest
controllers/appwrapper_controller.go
Outdated
// Only reason we are calling it here is that the client is not able to make
// calls until it is started, so SetupWithManager is not working.
if !useMachineSets && ocmClusterID == "" {
    getOCMClusterID(r)
I'd remove the call here, get the cluster ID lazily by calling the function where it's needed, and gate the retrieval of the ID with `sync.Once`.
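A rough sketch of that gating pattern, with placeholder names; `fetch` stands in for whatever actually looks up the cluster ID, and note that a failed first fetch is cached along with its error:

```go
package controllers

import "sync"

var (
	ocmClusterIDOnce sync.Once
	ocmClusterID     string
	ocmClusterIDErr  error
)

// lazyOCMClusterID fetches the cluster ID exactly once, on first use, no
// matter which caller needs it first.
func lazyOCMClusterID(fetch func() (string, error)) (string, error) {
	ocmClusterIDOnce.Do(func() {
		ocmClusterID, ocmClusterIDErr = fetch()
	})
	return ocmClusterID, ocmClusterIDErr
}
```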
I tried to implement this by calling it within the `machinePoolExists()` function in `SetupWithManager()`, but this didn't work because the client can't be used until it is started. I do notice that we check if the OCM secret exists before we call `machinePoolExists()`. As I understand it, we only need that OCM secret if we are scaling machine pools. If this is the case, I can use that as the condition that sets `useMachinePools` to `true` and apply your suggestion in the other machinePool functions.
So the call from the `SetupWithManager` method fails because `getOCMClusterID` uses the controller-runtime cached client to query the ClusterVersion API. This is not recommended, as it'll keep watching the API for no reason, and should be changed to using the plain client-go / OpenShift client. Then it'll be possible to move the call to `getOCMClusterID` into `SetupWithManager`.
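As a sketch of that direction, the ClusterVersion could be read through the plain OpenShift config clientset rather than the manager's cached client; the function name and the `rest.Config` plumbing are assumptions for illustration:

```go
package controllers

import (
	"context"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// fetchClusterID reads the cluster ID from the ClusterVersion object using the
// OpenShift config clientset, so no informer or watch is created the way it
// would be with the controller-runtime cached client.
func fetchClusterID(cfg *rest.Config) (string, error) {
	cs, err := configclient.NewForConfig(cfg)
	if err != nil {
		return "", err
	}
	// The cluster-scoped ClusterVersion resource is always named "version".
	cv, err := cs.ConfigV1().ClusterVersions().Get(context.Background(), "version", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return string(cv.Spec.ClusterID), nil
}
```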
Force-pushed from 0b94968 to e310af4
Force-pushed from e310af4 to b03cad3
Force-pushed from 854ca8d to 8743b4e
Force-pushed from 8743b4e to 450afa1
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: astefanutti.
Closes: #53
Verification Steps
- OSD Cluster - Machine Pools: InstaScale should work as expected
- Self Managed / OCP Cluster - MachineSets: InstaScale should work as expected

Note: at the moment, when scaling up and down, errors about updating the AppWrapper will appear in the console due to an issue with applying/removing the finalizer.