
add: integration with hypershift #129

Merged

Conversation

Contributor

@VanillaSpoon VanillaSpoon commented Aug 28, 2023

Issue link

fixes #80
closes #175

This is a PR to enable InstaScale integration with HyperShift.

What changes have been made

  • Addition of an OCM manager to support the use of ocmConnections for both OSD and HyperShift (see the sketch below).
  • Addition of NodePool scaling logic.
  • General refactoring.

The PR also removes irrelevant logging and repeated in-loop logging.
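
As a rough illustration of the first bullet, here is a minimal sketch of a shared OCM connection manager, assuming the ocm-sdk-go connection builder. The OCMManager type, its fields, and NewOCMManager are illustrative assumptions for this sketch, not the PR's actual code.

```go
// Illustrative sketch only: a small manager that owns a single OCM
// connection, reused for both OSD (MachinePools) and HyperShift (NodePools).
package scaler

import (
	"fmt"

	ocmsdk "github.com/openshift-online/ocm-sdk-go"
)

// OCMManager holds the shared connection plus the target cluster ID.
// The type name and fields are assumptions for this sketch.
type OCMManager struct {
	conn      *ocmsdk.Connection
	clusterID string
}

// NewOCMManager builds one OCM connection from an offline token.
func NewOCMManager(token, clusterID string) (*OCMManager, error) {
	conn, err := ocmsdk.NewConnectionBuilder().
		Tokens(token).
		Build()
	if err != nil {
		return nil, fmt.Errorf("building OCM connection: %w", err)
	}
	return &OCMManager{conn: conn, clusterID: clusterID}, nil
}

// Close releases the connection when the controller shuts down.
func (m *OCMManager) Close() error {
	return m.conn.Close()
}
```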

Verification steps

  • Create a HyperShift cluster.
  • Deploy the stack.
  • Create an AppWrapper with InstaScale enabled, and ensure the NodePool successfully scales up and down.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Contributor

@Bobbins228 Bobbins228 left a comment


Tested this out on my HyperShift cluster and I have gotten this error when following the InstaScale guided demo

I1108 14:05:43.091423 1 nodepools.go:46] The instanceRequired array: m5.xlarge
I1108 14:05:43.091445 1 nodepools.go:54] Built NodePool with instance type m5.xlarge and name instascaletest-m5-xlarge
E1108 14:05:43.096248 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'dac380af-7a33-4b1a-a6aa-dcb353237a44': NodePool name 'instascaletest-m5-xlarge' is 24 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.096274 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[dac380af-7a33-4b1a-a6aa-dcb353237a44]] 0xc00068d8f0 <nil>}
I1108 14:05:43.170794 1 nodepools.go:46] The instanceRequired array: g4dn.xlarge
I1108 14:05:43.170815 1 nodepools.go:54] Built NodePool with instance type g4dn.xlarge and name instascaletest-g4dn-xlarge
E1108 14:05:43.175629 1 nodepools.go:57] Error creating NodePool: status is 400, identifier is '400', code is 'CLUSTERS-MGMT-400' and operation identifier is 'd573d9d0-598b-494a-bc9c-cc7a5ba2c1dd': NodePool name 'instascaletest-g4dn-xlarge' is 26 characters long - its length exceeds the maximum length allowed of 15 characters
I1108 14:05:43.175646 1 nodepools.go:59] Created NodePool: &{400 map[Content-Type:[application/json] Date:[Wed, 08 Nov 2023 14:05:43 GMT] Server:[envoy] Vary:[Accept-Encoding] X-Envoy-Upstream-Service-Time:[2] X-Operation-Id:[d573d9d0-598b-494a-bc9c-cc7a5ba2c1dd]] 0xc00068da40 <nil>}

Despite the odd error about the name being too long, it says it created the NodePool.
I checked the Nodes in the Compute section and no new nodes were created.

@VanillaSpoon
Contributor Author

Tested this out on my HyperShift cluster and I have gotten this error when following the InstaScale guided demo

Despite the odd error about the name being too long, it says it created the NodePool. I checked the Nodes in the Compute section and no new nodes were created.

Hey @Bobbins228, this is something I've raised an issue for: the NodePool name character limit is 15, so after appending the node type the name quickly exceeds the limit. This is also the case for MachinePools, which have a character limit of 30.

I'm hoping to address this soon; however, for now you're right, it should at least stop building the NodePool when the error is returned.

Specifically with NodePools, after the additional information is appended to the name, even a 4-character AppWrapper name can exceed the limit, so this may require a new naming convention.
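
To make the two follow-ups concrete (keeping the generated name under the 15-character cap and stopping when the create call fails), here is a hedged sketch against the ocm-sdk-go clusters management API, reusing the hypothetical OCMManager from the earlier snippet. shortName, createNodePool, the Replicas(1) default, and the truncation rule itself are placeholders for illustration, not the naming convention the project will settle on.

```go
// Hypothetical sketch: truncate the derived NodePool name to OCM's
// 15-character limit and return early when the create request fails,
// instead of falling through to the "Created NodePool" log line.
package scaler

import (
	"fmt"
	"strings"

	cmv1 "github.com/openshift-online/ocm-sdk-go/clustersmgmt/v1"
)

const maxNodePoolNameLen = 15 // limit reported by the CLUSTERS-MGMT-400 error

// shortName normalises a name derived from the AppWrapper and instance type,
// then trims it to the limit; the exact rule here is only a placeholder.
func shortName(base string) string {
	name := strings.ToLower(strings.ReplaceAll(base, ".", "-"))
	if len(name) > maxNodePoolNameLen {
		name = name[:maxNodePoolNameLen]
	}
	return strings.TrimSuffix(name, "-")
}

func (m *OCMManager) createNodePool(name, instanceType string) error {
	np, err := cmv1.NewNodePool().
		ID(shortName(name)).
		AWSNodePool(cmv1.NewAWSNodePool().InstanceType(instanceType)).
		Replicas(1).
		Build()
	if err != nil {
		return fmt.Errorf("building NodePool: %w", err)
	}
	_, err = m.conn.ClustersMgmt().V1().Clusters().
		Cluster(m.clusterID).NodePools().Add().Body(np).Send()
	if err != nil {
		// Stop here rather than logging a successful creation on a 400.
		return fmt.Errorf("creating NodePool %q: %w", shortName(name), err)
	}
	return nil
}
```

Keeping the truncation in one helper like shortName would also mean a later change of naming convention only touches one place.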

Contributor

@Bobbins228 Bobbins228 left a comment


@VanillaSpoon Tested this again with a shorter AppWrapper name (< 4 characters) and I was able to scale node pools up and down with no issues.

/lgtm

We should start a discussion in the InstaScale channel on the best way to go about the naming convention for nodepools 👍

@openshift-ci openshift-ci bot added the lgtm label Nov 8, 2023
Contributor

@Fiona-Waters Fiona-Waters left a comment


Ran through the nodepools e2e test using this PR on a Hypershift Cluster. Functionality works as expected. This is great!! :)
Only comment is that the logs are a bit crowded from the finalizeScalingDownMachines func being called in the reconcile function.

@VanillaSpoon
Contributor Author

Ran through the nodepools e2e test using this PR on a Hypershift Cluster. Functionality works as expected. This is great!! :) Only comment is that the logs are a bit crowded from the finalizeScalingDownMachines func being called in the reconcile function.

Hey @Fiona-Waters,
Thanks for pointing this out. I have replicated the issue here too. You are correct, this is due to the finalizeScalingDownMachines func being called in reconcile. Would it be appropriate to add `hasCompletedScaleDown` or an equivalent as a field in the AppWrapper?

@Fiona-Waters
Contributor


Hey @Fiona-Waters, Thanks for pointing this out. I have replicated the issue here too. You are correct, this is due to the finalizeScalingDownMachines func being called in reconcile. Would it be appropriate to add `hasCompletedScaleDown` or an equivalent as a field in the AppWrapper?

Maybe we could check the status of the node pool and, if it is "deleting", then print the log line. I guess we would have to do this for machine pools and machine sets too. It seems like a bit of work to remove some extra log lines, so of course it's up to you how to proceed. I'm not sure about adding a field to the AppWrapper; would it only update after the node pool has finished deleting, and then not remove any log lines?
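
As a purely illustrative reading of this suggestion, approximating the "deleting" check by asking OCM whether the NodePool can still be fetched (the exact status field isn't shown in this thread) and reusing the hypothetical OCMManager from the earlier sketch:

```go
// Sketch only: emit the scale-down log line only while the NodePool still
// exists in OCM, so repeated reconcile passes after deletion stay quiet.
package scaler

import (
	"net/http"

	"k8s.io/klog/v2"
)

func (m *OCMManager) logIfStillDeleting(nodePoolID string) {
	resp, err := m.conn.ClustersMgmt().V1().Clusters().
		Cluster(m.clusterID).NodePools().NodePool(nodePoolID).Get().Send()
	if resp != nil && resp.Status() == http.StatusNotFound {
		// Already gone: nothing left to report on this reconcile.
		return
	}
	if err != nil {
		klog.Errorf("checking NodePool %s: %v", nodePoolID, err)
		return
	}
	klog.Infof("NodePool %s scale down still in progress", nodePoolID)
}
```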

@Fiona-Waters
Contributor

Re-ran on a hypershift cluster, can confirm that it works as expected. Great work Eoin!
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 11, 2023
@openshift-ci openshift-ci bot removed the lgtm label Dec 13, 2023
@VanillaSpoon
Contributor Author

VanillaSpoon commented Dec 13, 2023

Moving this to WIP temporarily as nodes appear to be re-scaling after deletion in Hypershift.

@VanillaSpoon VanillaSpoon marked this pull request as draft December 13, 2023 16:39
@VanillaSpoon VanillaSpoon marked this pull request as ready for review December 19, 2023 13:02
@VanillaSpoon VanillaSpoon marked this pull request as draft December 21, 2023 13:34
@VanillaSpoon VanillaSpoon marked this pull request as ready for review January 4, 2024 13:43
@openshift-ci openshift-ci bot requested a review from sutaakar January 4, 2024 13:43
Contributor

@sutaakar sutaakar left a comment


/lgtm

Contributor

@Bobbins228 Bobbins228 left a comment


Working as intended 👍
/lgtm


openshift-ci bot commented Jan 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anishasthana

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 2798824 into project-codeflare:main Jan 22, 2024
2 checks passed