Allow setting options per worker group #240

Closed
9 tasks done
himanshu-kun opened this issue Jun 28, 2023 · 2 comments
Labels
exp/beginner Issue that requires only basic skills kind/discussion Discussion (enaging others in deciding about multiple options) kind/enhancement Enhancement, improvement, extension priority/2 Priority (lower number equals higher priority) status/closed Issue is closed (either delivered or triaged)

Comments


himanshu-kun commented Jun 28, 2023

/kind discussion
/priority 3

What would you like to be added:

The upstream cluster autoscaler now supports configuring certain options per worker group (node group). Some examples:

  • ScaleDownUtilizationThreshold
  • ScaleDownGpuUtilizationThreshold
  • ScaleDownUnneededTime
  • ScaleDownUnreadyTime
  • MaxNodeProvisionTime

This allows finer-grained control over how the autoscaler behaves for each worker group in a cluster.
kubernetes#3583 -> upstream issue that introduced this
kubernetes#3789 -> PR with the first implementation
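
To make the idea of per-worker-group options concrete, here is a minimal Go sketch of overlaying per-group values on cluster-wide defaults. The type names, the `Resolve` function, and the default values are assumptions made for illustration; they are not the autoscaler's actual definitions.

```go
package main

import (
	"fmt"
	"time"
)

// NodeGroupOptions holds the five settings listed above.
// Illustrative type, not the autoscaler's real struct.
type NodeGroupOptions struct {
	ScaleDownUtilizationThreshold    float64
	ScaleDownGpuUtilizationThreshold float64
	ScaleDownUnneededTime            time.Duration
	ScaleDownUnreadyTime             time.Duration
	MaxNodeProvisionTime             time.Duration
}

// NodeGroupOverrides holds optional per-group values.
// A nil field means "use the cluster-wide default".
type NodeGroupOverrides struct {
	ScaleDownUtilizationThreshold    *float64
	ScaleDownGpuUtilizationThreshold *float64
	ScaleDownUnneededTime            *time.Duration
	ScaleDownUnreadyTime             *time.Duration
	MaxNodeProvisionTime             *time.Duration
}

// Resolve starts from the defaults and applies every override that is set.
func Resolve(defaults NodeGroupOptions, o NodeGroupOverrides) NodeGroupOptions {
	out := defaults
	if o.ScaleDownUtilizationThreshold != nil {
		out.ScaleDownUtilizationThreshold = *o.ScaleDownUtilizationThreshold
	}
	if o.ScaleDownGpuUtilizationThreshold != nil {
		out.ScaleDownGpuUtilizationThreshold = *o.ScaleDownGpuUtilizationThreshold
	}
	if o.ScaleDownUnneededTime != nil {
		out.ScaleDownUnneededTime = *o.ScaleDownUnneededTime
	}
	if o.ScaleDownUnreadyTime != nil {
		out.ScaleDownUnreadyTime = *o.ScaleDownUnreadyTime
	}
	if o.MaxNodeProvisionTime != nil {
		out.MaxNodeProvisionTime = *o.MaxNodeProvisionTime
	}
	return out
}

// exampleResolved overrides only ScaleDownUnneededTime for one group.
// The default values below are made up for the example.
func exampleResolved() NodeGroupOptions {
	defaults := NodeGroupOptions{
		ScaleDownUtilizationThreshold:    0.5,
		ScaleDownGpuUtilizationThreshold: 0.5,
		ScaleDownUnneededTime:            30 * time.Minute,
		ScaleDownUnreadyTime:             20 * time.Minute,
		MaxNodeProvisionTime:             20 * time.Minute,
	}
	unneeded := time.Minute
	return Resolve(defaults, NodeGroupOverrides{ScaleDownUnneededTime: &unneeded})
}

func main() {
	fmt.Printf("%+v\n", exampleResolved())
}
```

Using pointers for the overrides keeps "not set" distinct from a zero value, so a group can override one option without having to restate the others.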

Over time, more and more options are becoming configurable at the worker group level, so Gardener should also let end users configure them.

Issues that this can solve include:
#227 (by also reducing scale-down-unneeded-time for the affected node group)

This will also require changes to the Shoot API; related discussions are already ongoing in gardener/gardener#8142.


Changes/PRs that need to be filed to close this story

Why is this needed:
For better configurability of the Gardener autoscaler.

@himanshu-kun himanshu-kun added the kind/enhancement Enhancement, improvement, extension label Jun 28, 2023
@gardener-robot gardener-robot added kind/discussion Discussion (enaging others in deciding about multiple options) priority/3 Priority (lower number equals higher priority) labels Jun 28, 2023
@himanshu-kun himanshu-kun added the exp/beginner Issue that requires only basic skills label Sep 12, 2023
@himanshu-kun himanshu-kun added priority/2 Priority (lower number equals higher priority) and removed priority/3 Priority (lower number equals higher priority) labels Oct 5, 2023

aaronfern commented Nov 2, 2023

A few options were debated, and we decided to go with an approach similar to the one used by Amazon ASG.

The proposed solution is as follows:

g/g changes:

  • Add spec.provider.workers.clusterAutoscaler to the Shoot API, exposing these five fields so that they can be set per node group.

    Example:

    spec:
      provider:
        workers:
        - name: test-name
          clusterAutoscaler:
            scaleDownUtilizationThreshold: 0.5
            scaleDownGpuUtilizationThreshold: 0.5 
            scaleDownUnneededTime: 1m
            scaleDownUnreadyTime: 1m
            maxNodeProvisionTime: 1m
    

g/extension changes:

  • Update the code that creates and deploys machineDeployments so that it adds these node-group-specific values as annotations on each machineDeployment. If no node-group-specific value is provided, the shoot default is added as the annotation instead.
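
The annotation step described above could be sketched roughly as follows. This is only an illustration: the annotation prefix and key names, the `WorkerOptions` and `ShootDefaults` types, and the example values are all hypothetical and not taken from the Gardener code base.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"time"
)

// annotationPrefix is a hypothetical prefix chosen for this sketch only.
const annotationPrefix = "autoscaler.example/"

// WorkerOptions holds the optional per-worker values; nil means "not set".
type WorkerOptions struct {
	ScaleDownUtilizationThreshold    *float64
	ScaleDownGpuUtilizationThreshold *float64
	ScaleDownUnneededTime            *time.Duration
	ScaleDownUnreadyTime             *time.Duration
	MaxNodeProvisionTime             *time.Duration
}

// ShootDefaults holds the shoot-wide values used as a fallback.
type ShootDefaults struct {
	ScaleDownUtilizationThreshold    float64
	ScaleDownGpuUtilizationThreshold float64
	ScaleDownUnneededTime            time.Duration
	ScaleDownUnreadyTime             time.Duration
	MaxNodeProvisionTime             time.Duration
}

// formatFloat renders the worker value if set, otherwise the shoot default.
func formatFloat(worker *float64, def float64) string {
	if worker != nil {
		def = *worker
	}
	return strconv.FormatFloat(def, 'f', -1, 64)
}

// formatDuration renders the worker value if set, otherwise the shoot default.
func formatDuration(worker *time.Duration, def time.Duration) string {
	if worker != nil {
		def = *worker
	}
	return def.String()
}

// annotationsFor builds the annotation map for one machineDeployment.
func annotationsFor(w WorkerOptions, d ShootDefaults) map[string]string {
	return map[string]string{
		annotationPrefix + "scale-down-utilization-threshold":     formatFloat(w.ScaleDownUtilizationThreshold, d.ScaleDownUtilizationThreshold),
		annotationPrefix + "scale-down-gpu-utilization-threshold": formatFloat(w.ScaleDownGpuUtilizationThreshold, d.ScaleDownGpuUtilizationThreshold),
		annotationPrefix + "scale-down-unneeded-time":             formatDuration(w.ScaleDownUnneededTime, d.ScaleDownUnneededTime),
		annotationPrefix + "scale-down-unready-time":              formatDuration(w.ScaleDownUnreadyTime, d.ScaleDownUnreadyTime),
		annotationPrefix + "max-node-provision-time":              formatDuration(w.MaxNodeProvisionTime, d.MaxNodeProvisionTime),
	}
}

// exampleAnnotations sets only scaleDownUnneededTime on the worker;
// every other annotation falls back to the (made-up) shoot defaults.
func exampleAnnotations() map[string]string {
	unneeded := time.Minute
	worker := WorkerOptions{ScaleDownUnneededTime: &unneeded}
	defaults := ShootDefaults{
		ScaleDownUtilizationThreshold:    0.5,
		ScaleDownGpuUtilizationThreshold: 0.5,
		ScaleDownUnneededTime:            30 * time.Minute,
		ScaleDownUnreadyTime:             20 * time.Minute,
		MaxNodeProvisionTime:             20 * time.Minute,
	}
	return annotationsFor(worker, defaults)
}

func main() {
	ann := exampleAnnotations()
	keys := make([]string, 0, len(ann))
	for k := range ann {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a stable order
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, ann[k])
	}
}
```

Because every annotation is always written, whether from the worker or from the shoot default, the consumer of the machineDeployment never has to handle a missing key.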

autoscaler changes:

The other approaches we discussed were the following:

  • Overload the --nodes flag to carry these node-group-specific values. This would require updating the validation for this flag, and it would put us at a disadvantage should we later move to node-group-auto-discovery.
  • Add a new flag (e.g. --node-group-options) that contains the node-group-specific options. This has similar problems to the option above, and it also changes the core autoscaler, which we should avoid.

/cc @timuthy for comments on the g/g changes

@rishabh-11

/close as all PRs are merged

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label May 21, 2024