
add: integration with hypershift #129

Merged

Conversation

Contributor

@VanillaSpoon VanillaSpoon commented Aug 28, 2023

Issue link

fixes #80
closes #175

This is a PR to enable InstaScale integration with HyperShift.

What changes have been made

  • Addition of an OCM manager to support the use of ocmConnections for both OSD and HyperShift (see the sketch below).
  • Addition of NodePool scaling logic.
  • General refactoring.

The PR also removes irrelevant logging and repeated in-loop logging.
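
As a rough illustration of the first bullet, here is a minimal sketch of a shared OCM connection manager, assuming the ocm-sdk-go connection builder. The OCMManager type, its fields, and NewOCMManager are illustrative assumptions for this sketch, not the PR's actual code.

```go
// Illustrative sketch only: a small manager that owns a single OCM
// connection, reused for both OSD (MachinePools) and HyperShift (NodePools).
package scaler

import (
	"fmt"

	ocmsdk "github.com/openshift-online/ocm-sdk-go"
)

// OCMManager holds the shared connection plus the target cluster ID.
// The type name and fields are assumptions for this sketch.
type OCMManager struct {
	conn      *ocmsdk.Connection
	clusterID string
}

// NewOCMManager builds one OCM connection from an offline token.
func NewOCMManager(token, clusterID string) (*OCMManager, error) {
	conn, err := ocmsdk.NewConnectionBuilder().
		Tokens(token).
		Build()
	if err != nil {
		return nil, fmt.Errorf("building OCM connection: %w", err)
	}
	return &OCMManager{conn: conn, clusterID: clusterID}, nil
}

// Close releases the connection when the controller shuts down.
func (m *OCMManager) Close() error {
	return m.conn.Close()
}
```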

Verification steps

  • Create a HyperShift cluster.
  • Deploy the stack.
  • Create an AppWrapper with InstaScale enabled, and ensure the NodePool successfully scales up and down.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Contributor

@Bobbins228 Bobbins228 left a comment


Tested this out on my HyperShift cluster and I have gotten this error when following the InstaScale guided demo

I1108 14:05:43.091423 1 nodepools.go:46] The instanceRequired array: m5.xlarge
I1108 14:05:43.091445 1 nodepools.go:54] Built NodePool with instance type m5.xlarge and name instascaletest-m5-xlarge
E1108 14:05:43.096248 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'dac380af-7a33-4b1a-a6aa-dcb353237a44': NodePool name 'instascaletest-m5-xlarge' is 24 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.096274 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[dac380af-7a33-4b1a-a6aa-dcb353237a44]] 0xc00068d8f0 <nil>}
I1108 14:05:43.170794 1 nodepools.go:46] The instanceRequired array: g4dn.xlarge
I1108 14:05:43.170815 1 nodepools.go:54] Built NodePool with instance type g4dn.xlarge and name instascaletest-g4dn-xlarge
E1108 14:05:43.175629 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'd573d9d0-598b-494a-bc9c-cc7a5ba2c1dd': NodePool name 'instascaletest-g4dn-xlarge' is 26 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.175646 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[d573d9d0-598b-494a-bc9c-cc7a5ba2c1dd]] 0xc00068da40 <nil>}

Despite the odd error about the name being too long, it says it created the NodePool.
I checked the Nodes in the Compute section and no new nodes were created.

@VanillaSpoon
Contributor Author

Tested this out on my HyperShift cluster and I have gotten this error when following the InstaScale guided demo

Despite the odd error about the name being too long, it says it created the NodePool. I checked the Nodes in the Compute section and no new nodes were created.

Hey @Bobbins228, this is something I've raised an issue for: the NodePool name character limit is 15, so after appending the node type the name quickly exceeds the limit. This is also the case for MachinePools, which have a character limit of 30.

I'm hoping to address this soon; however, for now you're right, it should at least stop building the NodePool when the error is returned.

Specifically with NodePools, after the additional information is appended to the name, even a 4-character AppWrapper name can exceed the limit, so this may require a new naming convention.
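
To make the two follow-ups concrete (keeping the generated name under the 15-character cap and stopping when the create call fails), here is a hedged sketch against the ocm-sdk-go clusters management API, reusing the hypothetical OCMManager from the earlier snippet. shortName, createNodePool, the Replicas(1) default, and the truncation rule itself are placeholders for illustration, not the naming convention the project will settle on.

```go
// Hypothetical sketch: truncate the derived NodePool name to OCM's
// 15-character limit and return early when the create request fails,
// instead of falling through to the "Created NodePool" log line.
package scaler

import (
	"fmt"
	"strings"

	cmv1 "github.com/openshift-online/ocm-sdk-go/clustersmgmt/v1"
)

const maxNodePoolNameLen = 15 // limit reported by the CLUSTERS-MGMT-400 error

// shortName normalises a name derived from the AppWrapper and instance type,
// then trims it to the limit; the exact rule here is only a placeholder.
func shortName(base string) string {
	name := strings.ToLower(strings.ReplaceAll(base, ".", "-"))
	if len(name) > maxNodePoolNameLen {
		name = name[:maxNodePoolNameLen]
	}
	return strings.TrimSuffix(name, "-")
}

func (m *OCMManager) createNodePool(name, instanceType string) error {
	np, err := cmv1.NewNodePool().
		ID(shortName(name)).
		AWSNodePool(cmv1.NewAWSNodePool().InstanceType(instanceType)).
		Replicas(1).
		Build()
	if err != nil {
		return fmt.Errorf("building NodePool: %w", err)
	}
	_, err = m.conn.ClustersMgmt().V1().Clusters().
		Cluster(m.clusterID).NodePools().Add().Body(np).Send()
	if err != nil {
		// Stop here rather than logging a successful creation on a 400.
		return fmt.Errorf("creating NodePool %q: %w", shortName(name), err)
	}
	return nil
}
```

Keeping the truncation in one helper like shortName would also mean a later change of naming convention only touches one place.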

Contributor

@Bobbins228 Bobbins228 left a comment


@VanillaSpoon Tested this again with a shorter AppWrapper name (< 4 characters) and I was able to scale node pools up and down with no issues.

/lgtm

We should start a discussion in the InstaScale channel on the best way to go about the naming convention for nodepools 👍

@openshift-ci openshift-ci bot added the lgtm label Nov 8, 2023
Contributor

@Fiona-Waters Fiona-Waters left a comment


Ran through the nodepools e2e test using this PR on a Hypershift Cluster. Functionality works as expected. This is great!! :)
Only comment is that the logs are a bit crowded from the finalizeScalingDownMachines func being called in the reconcile function.

@VanillaSpoon
Contributor Author

Ran through the nodepools e2e test using this PR on a Hypershift Cluster. Functionality works as expected. This is great!! :) Only comment is that the logs are a bit crowded from the finalizeScalingDownMachines func being called in the reconcile function.

Hey @Fiona-Waters,
Thanks for pointing this out. I have replicated the issue here too. You are correct, this is due to the finalizeScalingDownMachines func being called in reconcile. Would it be appropriate to add `hasCompletedScaleDown` or an equivalent as a field in the AppWrapper?

@Fiona-Waters
Contributor


Hey @Fiona-Waters, Thanks for pointing this out. I have replicated the issue here too. You are correct, this is due to the finalizeScalingDownMachines func being called in reconcile. Would it be appropriate to add `hasCompletedScaleDown` or an equivalent as a field in the AppWrapper?

Maybe we could check the status of the node pool and, if it is "deleting", then print the log line. I guess we would have to do this for machine pools and machine sets too. It seems like a bit of work to remove some extra log lines, so of course it's up to you how to proceed. I'm not sure about adding a field to the AppWrapper; would it only update after the node pool has finished deleting, and then not remove any log lines?
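
As a purely illustrative reading of this suggestion, approximating the "deleting" check by asking OCM whether the NodePool can still be fetched (the exact status field isn't shown in this thread) and reusing the hypothetical OCMManager from the earlier sketch:

```go
// Sketch only: emit the scale-down log line only while the NodePool still
// exists in OCM, so repeated reconcile passes after deletion stay quiet.
package scaler

import (
	"net/http"

	"k8s.io/klog/v2"
)

func (m *OCMManager) logIfStillDeleting(nodePoolID string) {
	resp, err := m.conn.ClustersMgmt().V1().Clusters().
		Cluster(m.clusterID).NodePools().NodePool(nodePoolID).Get().Send()
	if resp != nil && resp.Status() == http.StatusNotFound {
		// Already gone: nothing left to report on this reconcile.
		return
	}
	if err != nil {
		klog.Errorf("checking NodePool %s: %v", nodePoolID, err)
		return
	}
	klog.Infof("NodePool %s scale down still in progress", nodePoolID)
}
```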

@Fiona-Waters
Contributor

Re-ran on a hypershift cluster, can confirm that it works as expected. Great work Eoin!
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 11, 2023
@openshift-ci openshift-ci bot removed the lgtm label Dec 13, 2023
@VanillaSpoon
Contributor Author

VanillaSpoon commented Dec 13, 2023

Moving this to WIP temporarily as nodes appear to be re-scaling after deletion in Hypershift.

@VanillaSpoon VanillaSpoon marked this pull request as draft December 13, 2023 16:39
@VanillaSpoon VanillaSpoon marked this pull request as ready for review December 19, 2023 13:02
@VanillaSpoon VanillaSpoon marked this pull request as draft December 21, 2023 13:34
@VanillaSpoon VanillaSpoon marked this pull request as ready for review January 4, 2024 13:43
@openshift-ci openshift-ci bot requested a review from sutaakar January 4, 2024 13:43
Contributor

@sutaakar sutaakar left a comment


/lgtm

Contributor

@Bobbins228 Bobbins228 left a comment


Working as intended 👍
/lgtm


openshift-ci bot commented Jan 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anishasthana

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 2798824 into project-codeflare:main Jan 22, 2024
2 checks passed