Merge pull request #113 from dasmeta/DMVP-5592-karpenter-integration
DMVP-5592: fix alb ingress controller policy, karpenter config extend with deepmerge, karpenter storage default size to 100Gi and comments/docs
mrdntgrn authored Dec 9, 2024
2 parents 044d2dc + 757b315 commit 8d57199
Showing 7 changed files with 73 additions and 45 deletions.
18 changes: 11 additions & 7 deletions README.md
@@ -10,11 +10,15 @@ Those include:
- external secrets
- metrics to cloudwatch

## Upgrading module major version:
- from 2.x.x to 3.x.x version needs some manual actions as we upgraded underlying eks module from 18.x.x to 20.x.x,
## Upgrading guide:
- upgrading from <2.19.0 to >=2.19.0 requires some manual actions, as we upgraded the underlying eks module from 18.x.x to 20.x.x;
  here you can find docs on the needed actions/changes and ready scripts which can be used:
docs:
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/UPGRADE-19.0.md
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/UPGRADE-20.0.md
params:
The node group `create_launch_template=false` and `launch_template_name=""` pair of params has been replaced with `use_custom_launch_template=false`
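As a sketch (assuming a node group named `default`; your node group keys and other settings will differ), the replacement looks like:
```terraform
worker_groups = {
  default = {
    # old params, removed in the new version:
    # create_launch_template = false
    # launch_template_name   = ""

    # replacement param:
    use_custom_launch_template = false
  }
}
```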
scripts:
```sh
# commands to move some states, run before applying the `terraform apply` for new version
terraform state mv "module.<eks-module-name>.module.eks-cluster[0].module.eks-cluster.kubernetes_config_map_v1_data.aws_auth[0]" "module.<eks-module-name>.module.eks-cluster[0].module.aws_auth_config_map.kubernetes_config_map_v1_data.aws_auth[0]"
@@ -199,11 +203,11 @@ worker_groups = {
}
```

# karpenter enabled
# NOTES:
# - enabling karpenter automatically disables cluster auto-scaler
# - then enabling karpenter on existing old cluster there is possibility to see cycle-dependency error, to overcome this you need at first to apply main eks module change (`terraform apply --target "module.<eks-module-name>.module.eks-cluster"`) and then rest of cluster-autoloader destroy and karpenter install onse
# - when destroying cluster which have karpenter enabled there is possibility of failure on karpenter resource removal, you need to run destruction one more time to get it complete
## karpenter enabled
### NOTES:
### - enabling karpenter automatically disables the cluster autoscaler
### - when enabling karpenter on an existing old cluster there is a possibility of a cycle-dependency error; to overcome this, first apply the main eks module change (`terraform apply --target "module.<eks-module-name>.module.eks-cluster"`) and then the rest: the cluster-autoscaler destroy and the karpenter install
### - when destroying a cluster which has karpenter enabled there is a possibility of failure on karpenter resource removal; run the destroy one more time to complete it
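The cycle-dependency workaround described in the notes above amounts to two passes (the module name is a placeholder):
```sh
# pass 1: apply only the core eks cluster change
terraform apply --target "module.<eks-module-name>.module.eks-cluster"

# pass 2: the full apply then destroys the cluster-autoscaler and installs karpenter
terraform apply
```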
```terraform
module "eks" {
source = "dasmeta/eks/aws"
18 changes: 11 additions & 7 deletions main.tf
@@ -11,11 +11,15 @@
* - metrics to cloudwatch
*
*
* ## Upgrading module major version:
* - from 2.x.x to 3.x.x version needs some manual actions as we upgraded underlying eks module from 18.x.x to 20.x.x,
* ## Upgrading guide:
* - upgrading from <2.19.0 to >=2.19.0 requires some manual actions, as we upgraded the underlying eks module from 18.x.x to 20.x.x;
*   here you can find docs on the needed actions/changes and ready scripts which can be used:
* docs:
* https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/UPGRADE-19.0.md
* https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/UPGRADE-20.0.md
* params:
* The node group create_launch_template=false and launch_template_name="" pair of params has been replaced with use_custom_launch_template=false
* scripts:
* ```sh
* # commands to move some states, run before applying the `terraform apply` for new version
* terraform state mv "module.<eks-module-name>.module.eks-cluster[0].module.eks-cluster.kubernetes_config_map_v1_data.aws_auth[0]" "module.<eks-module-name>.module.eks-cluster[0].module.aws_auth_config_map.kubernetes_config_map_v1_data.aws_auth[0]"
@@ -202,11 +206,11 @@
* }
* ```
*
* # karpenter enabled
* # NOTES:
* # - enabling karpenter automatically disables cluster auto-scaler
* # - then enabling karpenter on existing old cluster there is possibility to see cycle-dependency error, to overcome this you need at first to apply main eks module change (`terraform apply --target "module.<eks-module-name>.module.eks-cluster"`) and then rest of cluster-autoloader destroy and karpenter install onse
* # - when destroying cluster which have karpenter enabled there is possibility of failure on karpenter resource removal, you need to run destruction one more time to get it complete
* ## karpenter enabled
* ### NOTES:
* ### - enabling karpenter automatically disables the cluster autoscaler
* ### - when enabling karpenter on an existing old cluster there is a possibility of a cycle-dependency error; to overcome this, first apply the main eks module change (`terraform apply --target "module.<eks-module-name>.module.eks-cluster"`) and then the rest: the cluster-autoscaler destroy and the karpenter install
* ### - when destroying a cluster which has karpenter enabled there is a possibility of failure on karpenter resource removal; run the destroy one more time to complete it
* ```terraform
* module "eks" {
* source = "dasmeta/eks/aws"
3 changes: 2 additions & 1 deletion modules/aws-load-balancer-controller/iam-policy.json
@@ -27,7 +27,8 @@
"elasticloadbalancing:DescribeTargetGroupAttributes",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:DescribeTags",
"elasticloadbalancing:AddTags"
"elasticloadbalancing:AddTags",
"elasticloadbalancing:DescribeListenerAttributes"
],
"Resource": "*"
},
3 changes: 2 additions & 1 deletion modules/karpenter/README.md
@@ -57,6 +57,7 @@ module "karpenter" {

| Name | Source | Version |
|------|--------|---------|
| <a name="module_karpenter_custom_default_configs_merged"></a> [karpenter\_custom\_default\_configs\_merged](#module\_karpenter\_custom\_default\_configs\_merged) | cloudposse/config/yaml//modules/deepmerge | 1.0.2 |
| <a name="module_this"></a> [this](#module\_this) | terraform-aws-modules/eks/aws//modules/karpenter | 20.30.1 |

## Resources
@@ -86,7 +87,7 @@ module "karpenter" {
| <a name="input_oidc_provider_arn"></a> [oidc\_provider\_arn](#input\_oidc\_provider\_arn) | EKS oidc provider arn in format 'arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>'. | `string` | n/a | yes |
| <a name="input_resource_chart_version"></a> [resource\_chart\_version](#input\_resource\_chart\_version) | The dasmeta karpenter-resources chart version | `string` | `"0.1.0"` | no |
| <a name="input_resource_configs"></a> [resource\_configs](#input\_resource\_configs) | Configurations to pass and override default ones for karpenter-resources chart. Check the helm chart available configs here: https://github.com/dasmeta/helm/tree/karpenter-resources-0.1.0/charts/karpenter-resources | `any` | `{}` | no |
| <a name="input_resource_configs_defaults"></a> [resource\_configs\_defaults](#input\_resource\_configs\_defaults) | Configurations to pass and override default ones for karpenter-resources chart. Check the helm chart available configs here: https://github.com/dasmeta/helm/tree/karpenter-resources-0.1.0/charts/karpenter-resources | <pre>object({<br> nodeClass = optional(any, {<br> amiFamily = "AL2" # Amazon Linux 2<br> detailedMonitoring = true<br> metadataOptions = {<br> httpEndpoint = "enabled"<br> httpProtocolIPv6 = "disabled"<br> httpPutResponseHopLimit = 2 # This is changed to disable IMDS access from containers not on the host network<br> httpTokens = "required"<br> }<br> })<br> nodeClassRef = optional(any, {<br> group = "karpenter.k8s.aws"<br> kind = "EC2NodeClass"<br> name = "default"<br> }),<br> requirements = optional(any, [<br> {<br> key = "karpenter.k8s.aws/instance-cpu"<br> operator = "Lt"<br> values = ["9"] # <=8 core cpu nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-memory"<br> operator = "Lt"<br> values = ["33000"] # <=32 Gb memory nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-memory"<br> operator = "Gt"<br> values = ["1000"] # >1Gb Gb memory nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-generation"<br> operator = "Gt"<br> values = ["2"] # generation of ec2 instances >2 (like t3a.medium) are more performance and effectiveness<br> },<br> {<br> key = "kubernetes.io/arch"<br> operator = "In"<br> values = ["amd64"] # amd64 linux is main platform arch we will use<br> },<br> {<br> key = "karpenter.sh/capacity-type"<br> operator = "In"<br> values = ["spot", "on-demand"] # both spot and on-demand nodes, it will look at first available spot and if no then on-demand<br> }<br> ])<br> disruption = optional(any, {<br> consolidationPolicy = "WhenEmptyOrUnderutilized"<br> consolidateAfter = "1m"<br> }),<br> limits = optional(any, {<br> cpu = 10<br> })<br> })</pre> | `{}` | no |
| <a name="input_resource_configs_defaults"></a> [resource\_configs\_defaults](#input\_resource\_configs\_defaults) | Configurations to pass and override default ones for karpenter-resources chart. Check the helm chart available configs here: https://github.com/dasmeta/helm/tree/karpenter-resources-0.1.0/charts/karpenter-resources | <pre>object({<br> nodeClass = optional(any, {<br> amiFamily = "AL2" # Amazon Linux 2<br> detailedMonitoring = true<br> metadataOptions = {<br> httpEndpoint = "enabled"<br> httpProtocolIPv6 = "disabled"<br> httpPutResponseHopLimit = 2 # This is changed to disable IMDS access from containers not on the host network<br> httpTokens = "required"<br> }<br> blockDeviceMappings = [<br> {<br> deviceName = "/dev/xvda"<br> ebs = {<br> volumeSize = "100Gi"<br> volumeType = "gp3"<br> encrypted = true<br> }<br> }<br> ]<br> })<br> nodeClassRef = optional(any, {<br> group = "karpenter.k8s.aws"<br> kind = "EC2NodeClass"<br> name = "default"<br> }),<br> requirements = optional(any, [<br> {<br> key = "karpenter.k8s.aws/instance-cpu"<br> operator = "Lt"<br> values = ["9"] # <=8 core cpu nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-memory"<br> operator = "Lt"<br> values = ["33000"] # <=32 Gb memory nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-memory"<br> operator = "Gt"<br> values = ["1000"] # >1Gb Gb memory nodes<br> },<br> {<br> key = "karpenter.k8s.aws/instance-generation"<br> operator = "Gt"<br> values = ["2"] # generation of ec2 instances >2 (like t3a.medium) are more performance and effectiveness<br> },<br> {<br> key = "kubernetes.io/arch"<br> operator = "In"<br> values = ["amd64"] # amd64 linux is main platform arch we will use<br> },<br> {<br> key = "karpenter.sh/capacity-type"<br> operator = "In"<br> values = ["spot", "on-demand"] # both spot and on-demand nodes, it will look at first available spot and if no then on-demand<br> }<br> ])<br> disruption = optional(any, {<br> consolidationPolicy = "WhenEmptyOrUnderutilized"<br> consolidateAfter = "1m"<br> }),<br> limits = optional(any, {<br> cpu = 10<br> })<br> })</pre> | `{}` | no |
| <a name="input_subnet_ids"></a> [subnet\_ids](#input\_subnet\_ids) | VPC subnet ids used for default Ec2NodeClass as subnet selector. | `list(string)` | n/a | yes |
| <a name="input_wait"></a> [wait](#input\_wait) | Whether use helm deploy with --wait flag | `bool` | `true` | no |

7 changes: 4 additions & 3 deletions modules/karpenter/locals.tf
@@ -10,8 +10,9 @@ locals {
amiSelectorTerms = [
{ id = data.aws_instance.ec2_from_eks_node_pool.ami }
]
detailedMonitoring = var.resource_configs_defaults.nodeClass.detailedMonitoring
metadataOptions = var.resource_configs_defaults.nodeClass.metadataOptions
detailedMonitoring = var.resource_configs_defaults.nodeClass.detailedMonitoring
metadataOptions = var.resource_configs_defaults.nodeClass.metadataOptions
blockDeviceMappings = var.resource_configs_defaults.nodeClass.blockDeviceMappings
}

nodePoolDefaultNodeClassRef = var.resource_configs_defaults.nodeClassRef
@@ -28,7 +29,7 @@ locals {
})
})
disruption = merge(var.resource_configs_defaults.disruption, try(value.disruption, {}))
limits = merge(var.resource_configs_defaults.limits, try(value.limit, {}))
limits = merge(var.resource_configs_defaults.limits, try(value.limits, {}))
}
) }
}
59 changes: 33 additions & 26 deletions modules/karpenter/main.tf
@@ -69,32 +69,7 @@ resource "helm_release" "this" {
atomic = var.atomic
wait = var.wait

values = [jsonencode(merge({
serviceAccount = {
name = module.this.service_account
annotations = {
"eks.amazonaws.com/role-arn" = module.this.iam_role_arn
}
}
settings = {
clusterName = var.cluster_name
clusterEndpoint = var.cluster_endpoint
interruptionQueue = module.this.queue_name
featureGates = {
spotToSpotConsolidation = true
}
}
resources = {
requests = {
cpu = "100m"
memory = "256Mi"
}
limits = {
cpu = "100m"
memory = "256Mi"
}
}
}, var.configs))]
values = [jsonencode(module.karpenter_custom_default_configs_merged.merged)]
}

# allows creating karpenter CRD resources such as NodeClasses and NodePools
@@ -120,3 +95,35 @@ resource "helm_release" "karpenter_nodes" {

depends_on = [helm_release.this]
}

module "karpenter_custom_default_configs_merged" {
source = "cloudposse/config/yaml//modules/deepmerge"
version = "1.0.2"

maps = [
{
serviceAccount = {
name = module.this.service_account
annotations = {
"eks.amazonaws.com/role-arn" = module.this.iam_role_arn
}
}
settings = {
clusterName = var.cluster_name
clusterEndpoint = var.cluster_endpoint
interruptionQueue = module.this.queue_name
}
resources = {
requests = {
cpu = "100m"
memory = "256Mi"
}
limits = {
cpu = "100m"
memory = "256Mi"
}
}
},
var.configs
]
}
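Since the deepmerge module merges `var.configs` recursively over the defaults above, a caller only needs to pass the keys it wants to change; a hypothetical override of just the memory limit might look like:
```terraform
module "karpenter" {
  source = "dasmeta/eks/aws//modules/karpenter"

  # ...required inputs (cluster_name, cluster_endpoint, oidc_provider_arn,
  # subnet_ids, ...) omitted for brevity

  configs = {
    resources = {
      limits = {
        memory = "512Mi" # deep-merged; requests and the cpu limit keep their defaults
      }
    }
  }
}
```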
10 changes: 10 additions & 0 deletions modules/karpenter/variables.tf
@@ -101,6 +101,16 @@ variable "resource_configs_defaults" {
httpPutResponseHopLimit = 2 # This is changed to disable IMDS access from containers not on the host network
httpTokens = "required"
}
blockDeviceMappings = [
{
deviceName = "/dev/xvda"
ebs = {
volumeSize = "100Gi"
volumeType = "gp3"
encrypted = true
}
}
]
})
nodeClassRef = optional(any, {
group = "karpenter.k8s.aws"
