EKS 1.27

We use these Gruntwork modules:

For our Dev env, for example, this is a snippet of the Node Group configuration:

We run one ASG per Node Group per AZ. For example, our SPOT ASG/NG config would look like this:

For our env in us-east-2, we would end up with three ASGs/Node Groups, one each for us-east-1a, us-east-1b, and us-east-1c.

In our Cluster Autoscaler config we set:

We do not set any other config on the Gruntwork module today, so the rest of the CA config is the defaults from the upstream Helm chart and/or the Gruntwork module.

We have seen issues in our lower envs that use SPOT, where we used to have m4.xlarge as the first item in the instance types list. We would lose all our SPOT instances, CA would then try to scale up, and the ASG Activity in the AWS Console would show something like 6 instances just sitting in “Pending:WAIT” for HOURS.

We are looking to understand whether our Cluster Autoscaler config is good and whether there are any updates we should be making to it, especially around the scaling strategy. I am not sure we fully understand when to use the different types. There are also lots of other configuration options, and we would like to know whether we should adopt any of them, especially for our production env, where we do NOT use SPOT instances and we set scaling_strategy to “least-waste”, which is the default.
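To make the scaling strategy question more concrete, here is a rough sketch of the idea behind "least-waste", assuming scaling_strategy maps to the upstream Cluster Autoscaler expander setting. Upstream describes least-waste as picking the node group that would leave the least idle CPU after scale-up, with unused memory as the tie-break; other upstream expanders include random, most-pods, and priority. The `NodeGroup` class, the group names, and the capacity numbers below are hypothetical, and the node-count estimate ignores real bin-packing, so treat this as an illustration rather than what CA actually runs.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str          # hypothetical node group name
    cpu_millis: int    # allocatable CPU per node, in millicores
    mem_mib: int       # allocatable memory per node, in MiB


def nodes_needed(group, cpu_req, mem_req):
    """Smallest node count whose total capacity covers the pending requests.

    Simplification: treats pending pods as one divisible blob of CPU/memory.
    """
    by_cpu = -(-cpu_req // group.cpu_millis)  # ceiling division
    by_mem = -(-mem_req // group.mem_mib)
    return max(by_cpu, by_mem, 1)


def wasted_fraction(group, cpu_req, mem_req):
    """Return (idle CPU fraction, idle memory fraction) after scaling up."""
    n = nodes_needed(group, cpu_req, mem_req)
    cpu_waste = 1 - cpu_req / (n * group.cpu_millis)
    mem_waste = 1 - mem_req / (n * group.mem_mib)
    return (cpu_waste, mem_waste)


def pick_least_waste(groups, cpu_req, mem_req):
    """Choose the group with the least idle CPU; memory breaks ties."""
    return min(groups, key=lambda g: wasted_fraction(g, cpu_req, mem_req))


# Hypothetical example: pending pods request 6000m CPU and 8192 MiB in total.
groups = [
    NodeGroup("small", cpu_millis=4000, mem_mib=16384),
    NodeGroup("large", cpu_millis=16000, mem_mib=65536),
]
chosen = pick_least_waste(groups, cpu_req=6000, mem_req=8192)
print(chosen.name)
```

In this made-up case two "small" nodes would leave 25% of CPU idle, while one "large" node would leave 62.5% idle, so the sketch picks "small". With one ASG per AZ and identical instance types, the candidates would score the same, which is one reason the choice of strategy may matter less in that layout than it first appears.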
-
Hi,

Scaling EKS is a challenging task. It is generally best to enable logging and observability so you can investigate any scheduling or scaling hiccups. For Cluster Autoscaler, make sure the verbosity level is set to 4. You can also consider setting up alarms on instances stuck in the Pending state or on pods that failed to be scheduled.

There are multiple places for configuration you can look at:

Worker Pools

Cluster Autoscaler

More granularity (Karpenter)

Karpenter offers more granular control over the scaling process. It can handle interruption events and provide new capacity quickly, and it integrates more closely with AWS. It is, however, a newer project than Cluster Autoscaler.
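As a starting point for the alarm on instances stuck in the Pending state, here is a minimal sketch that scans the output of the Auto Scaling `DescribeAutoScalingGroups` API for instances whose lifecycle state starts with `Pending` (for example `Pending:Wait`). The function name and the sample data are hypothetical. The sketch only reports the current state; an actual alarm would also need to track how long each instance has stayed pending, which this response does not include.

```python
def pending_instances(asg_response):
    """Return (asg_name, instance_id, lifecycle_state) for pending instances.

    `asg_response` is expected to have the shape returned by
    DescribeAutoScalingGroups, e.g. from boto3:
        boto3.client("autoscaling").describe_auto_scaling_groups()
    """
    stuck = []
    for group in asg_response.get("AutoScalingGroups", []):
        for inst in group.get("Instances", []):
            state = inst.get("LifecycleState", "")
            # Covers "Pending", "Pending:Wait", and "Pending:Proceed".
            if state.startswith("Pending"):
                stuck.append(
                    (group["AutoScalingGroupName"], inst["InstanceId"], state)
                )
    return stuck


# Hypothetical sample response with made-up group names and instance IDs.
sample = {
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "spot-a",
            "Instances": [
                {"InstanceId": "i-aaa", "LifecycleState": "Pending:Wait"},
                {"InstanceId": "i-bbb", "LifecycleState": "InService"},
            ],
        },
        {
            "AutoScalingGroupName": "spot-b",
            "Instances": [
                {"InstanceId": "i-ccc", "LifecycleState": "Pending"},
            ],
        },
    ]
}

for asg_name, instance_id, state in pending_instances(sample):
    print(f"{asg_name}: {instance_id} is {state}")
```

Running something like this on a schedule and comparing results between runs would show which instances remain pending across checks, which is the symptom described in the question.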
-
What are some recommended configurations for the EKS/Kubernetes Cluster Autoscaler? Some general recommendations around the following will be helpful:
Tracked in ticket #110830