Adding new components that run in the garden, seed, or shoot cluster is theoretically quite simple - we just need a Deployment
(or other similar workload resource), the respective container image, and maybe a bit of configuration.
In practice, however, there are a couple of things to keep in mind in order to make the deployment production-ready.
This document provides a checklist for them that you can walk through.
-
Avoid usage of Helm charts (example)
Nowadays, we use Golang components instead of Helm charts for deploying components to a cluster. Please find a typical structure of such components in the provided metrics_server.go file (configuration values are typically managed in a Values structure). There are a few exceptions (e.g., Istio) still using charts; however, the default should be a Golang-based implementation. For the exceptional cases, use Golang's embed package to embed the Helm chart directory (example 1, example 2).
-
Choose the proper deployment way (example 1 (direct application w/ client), example 2 (using ManagedResource), example 3 (mixed scenario))
For historic reasons, resources related to shoot control plane components are applied directly with the client. All other resources (seed or shoot system components) are deployed via gardener-resource-manager's Resource controller (ManagedResources) since it performs health checks out-of-the-box and has a lot of other features (see its documentation for more information). Components that can run as either a seed system component or a shoot control plane component (e.g., VPA or kube-state-metrics) can make use of these utility functions.
-
Use unique ConfigMaps/Secrets (example 1, example 2)
Unique ConfigMaps/Secrets are immutable and have a unique name. This has a couple of benefits, e.g., the kubelet doesn't watch these resources, and it is always clear which resource contains which data since it cannot be changed. As a consequence, unique/immutable ConfigMaps/Secrets are superior to checksum annotations on the pod templates. Stale/unused ConfigMaps/Secrets are garbage-collected by gardener-resource-manager's GarbageCollector. There are utility functions (see examples above) for using unique ConfigMaps/Secrets in Golang components. It is essential to inject the annotations into the workload resource to make the garbage collection work.
Note that some ConfigMaps/Secrets should not be unique (e.g., those containing monitoring or logging configuration). The reason is that the old revision stays in the cluster, even if unused, until the garbage collector acts. During this time, it would be wrongly aggregated into the full configuration.
-
Manage certificates/secrets via secrets manager (example)
You should use the secrets manager for the management of any kind of credentials. This makes sure that credentials rotation works out-of-the-box without you having to think about it. Generally, do not use client certificates (see the Security section).
-
Consider hibernation when calculating replica count (example)
Shoot clusters can be hibernated, meaning that all control plane components in the shoot namespace in the seed cluster are scaled down to zero and all worker nodes are terminated. If your component runs in the seed cluster, you have to consider this case and provide the proper replica count. There is a utility function available (see example).
-
Ensure task dependencies are as precise as possible in shoot flows (example 1, example 2)
Only define the minimum of needed dependency tasks in the shoot reconciliation/deletion flows.
-
Handle shoot system components
Shoot system components deployed by gardener-resource-manager are labelled with resource.gardener.cloud/managed-by: gardener. This makes Gardener add the required label selectors and tolerations so that non-DaemonSet managed Pods will exclusively run on selected nodes (for more information, see System Components Webhook). DaemonSets, on the other hand, should generally tolerate any NoSchedule or NoExecute taints so that they can run on any Node, regardless of user-added taints.
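As a rough illustration of the DaemonSet guidance above, the tolerations could look like the following fragment. This is only a sketch: the component name, namespace, labels, and image are hypothetical placeholders, and in a Golang component the same fields would be set on the DaemonSet object rather than written as a manifest.

```yaml
# Sketch of a DaemonSet for a shoot system component (all names are hypothetical).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-node-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-node-agent
  template:
    metadata:
      labels:
        app: example-node-agent
    spec:
      tolerations:
      # An empty key with operator Exists matches every taint with this effect,
      # including taints added by users.
      - effect: NoSchedule
        operator: Exists
      # Same for NoExecute, so the pod is not evicted from tainted nodes.
      - effect: NoExecute
        operator: Exists
      containers:
      - name: agent
        # Placeholder only; real references are read from the image vector.
        image: registry.example.com/example-node-agent:v1.0.0
```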
-
Do not hard-code container image references (example 1, example 2, example 3)
We define all image references centrally in the imagevector/containers.yaml file. Hence, the image references must not be hard-coded in the pod template spec but read from this so-called image vector instead.
-
Do not use container images from registries that don't support IPv6 (example: image vector, prow configuration)
Registries such as ECR, GHCR (ghcr.io), and MCR (mcr.microsoft.com) don't support pulling images over IPv6.
Check if the upstream image is also maintained in a registry that supports IPv6 natively, such as Artifact Registry or Quay (quay.io). If there is such an image, use the image from the registry with IPv6 support.
If the image is not available in a registry with IPv6 support, then copy the image to the gardener GCR. There is a prow job copying images that are needed in gardener components from a source registry to the gardener GCR under the prefix europe-docker.pkg.dev/gardener-project/releases/3rd/ (see the documentation or gardener/ci-infra#619).
If you want to use a new image from a registry without IPv6 support or upgrade an already used image to a newer tag, please open a PR to the ci-infra repository that modifies the job's list of images to copy: images.yaml.
-
Do not use container images from Docker Hub (example: image vector, prow configuration)
There is a strict rate limit that applies to the Docker Hub registry. As described in the previous item, use another registry (if possible) or copy the image to the gardener GCR.
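To make the image guidance above more concrete, an image vector entry that points at a copied image might look roughly like this. The entry layout (name, sourceRepository, repository, tag) is an assumption about the usual shape of imagevector/containers.yaml, and the component name, source repository, path below the prefix, and tag are hypothetical.

```yaml
# Hypothetical entry; field names assume the usual image vector layout.
images:
- name: example-component
  # Where the source code lives (illustrative only).
  sourceRepository: github.com/example-org/example-component
  # Copy of the upstream image under the gardener prefix mentioned above.
  repository: europe-docker.pkg.dev/gardener-project/releases/3rd/example-org/example-component
  tag: "v1.2.3"
```

The component then looks up the image by its name in the image vector instead of carrying a registry path in its pod template.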
-
Do not use Shoot container images that are not multi-arch
Gardener supports Shoot clusters with both amd64 and arm64 based worker Nodes. amd64 container images cannot run on arm64 worker Nodes and vice-versa.
-
Use a dedicated ServiceAccount and disable auto-mount (example)
Components that need to talk to the API server of their runtime cluster must always use a dedicated ServiceAccount (do not use default), with automountServiceAccountToken set to false. This makes gardener-resource-manager's TokenInvalidator invalidate the static token secret and its ProjectedTokenMount webhook inject a projected token automatically.
-
Use shoot access tokens instead of client certificates (example)
For components that need to talk to a target cluster different from their runtime cluster (e.g., running in the seed cluster but talking to the shoot), gardener-resource-manager's TokenRequestor should be used to manage a so-called "shoot access token".
-
Define RBAC roles with minimal privileges (example)
The component's ServiceAccount (if it exists) should have as few privileges as possible. Consequently, please define proper RBAC roles for it. This might include a combination of ClusterRoles and Roles. Please do not provide elevated privileges out of convenience (e.g., extending an already existing ClusterRole instead of creating a Role when access to only a single namespace is needed).
-
Use NetworkPolicys to restrict network traffic
You should restrict both ingress and egress traffic to/from your component as much as possible to ensure that it only gets access to/from other components if really needed. Gardener provides a few default policies for typical usage scenarios. For more information, see NetworkPolicys In Garden, Seed, Shoot Clusters.
-
Do not run containers in privileged mode (example, example 2)
Avoid running containers with privileged=true. Instead, define the needed Linux capabilities.
-
Do not run containers as root (example)
Avoid running containers as root. Usually, components such as Kubernetes controllers and admission webhook servers don't need root user capabilities to do their jobs.
The problem with running as root starts with how the container is first built. Unless a non-privileged user is configured in the Dockerfile, container build systems by default set up the container with the root user. Add a non-privileged user to your Dockerfile or use a base image with a non-root user (for example, the nonroot images from distroless such as gcr.io/distroless/static-debian12:nonroot).
If the image is an upstream one, then consider configuring a securityContext for the container/Pod with a non-privileged user. For more information, see Configure a Security Context for a Pod or Container.
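For an upstream image, a securityContext along the following lines could enforce a non-privileged user. Treat it as a sketch under assumptions: the Pod name and image are hypothetical, and the UID/GID 65532 is used because it is the nonroot user in the distroless images; another image may need a different ID.

```yaml
# Sketch of a Pod that refuses to start as root (names are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: example-controller
spec:
  securityContext:
    # The kubelet rejects the container if it would run with UID 0.
    runAsNonRoot: true
    runAsUser: 65532
    runAsGroup: 65532
    fsGroup: 65532
  containers:
  - name: controller
    image: registry.example.com/example-controller:v1.0.0  # placeholder
    securityContext:
      # Prevents gaining more privileges than the parent process.
      allowPrivilegeEscalation: false
```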
-
Choose the proper Seccomp profile (example 1, example 2)
For components deployed in the Seed cluster, the Seccomp profile will be defaulted to RuntimeDefault by gardener-resource-manager's SeccompProfile webhook, which works well for the majority of components. However, in some special cases you might need to overwrite it.
The gardener-resource-manager's SeccompProfile webhook is not enabled for a Shoot cluster. For components deployed in the Shoot cluster, it is required [*] to explicitly specify the Seccomp profile.
[*] It is required because if a component deployed in the Shoot cluster does not specify a Seccomp profile and cannot run with the RuntimeDefault Seccomp profile, then enabling the .spec.kubernetes.kubelet.seccompDefault field in the Shoot spec would break the corresponding component.
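For a component deployed in the Shoot cluster, specifying the profile explicitly could look like the fragment below. It is a minimal sketch with a hypothetical Pod name and image; RuntimeDefault is shown because it suits most components, and a component that cannot run with it would set a different type here.

```yaml
# Sketch: explicit Seccomp profile on Pod level (names are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: example-shoot-component
  namespace: kube-system
spec:
  securityContext:
    seccompProfile:
      # Applies the container runtime's default Seccomp profile to all containers.
      type: RuntimeDefault
  containers:
  - name: component
    image: registry.example.com/example-shoot-component:v1.0.0  # placeholder
```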
-
Specify the component type label for high availability (example)
To support high-availability deployments, gardener-resource-manager's HighAvailabilityConfig webhook injects the proper specification, such as the replica count or topology spread constraints. You only need to specify the type label. For more information, see High Availability Of Deployed Components.
-
Define a PodDisruptionBudget (example)
Closely related to high availability but also to stability in general: The definition of a PodDisruptionBudget with maxUnavailable=1 should be provided by default.
-
Choose the right PriorityClass (example)
Each cluster runs many components with different priorities. Gardener provides a set of default PriorityClasses. For more information, see Priority Classes.
-
Consider defining liveness and readiness probes (example)
To ensure smooth rolling update behaviour, consider the definition of liveness and/or readiness probes.
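As an illustration, probes on a Deployment might be defined as follows. This is a sketch only: the component name, image, port, the /healthz and /readyz paths, and all timing values are assumptions that need to be adapted to what the component actually serves.

```yaml
# Sketch of liveness and readiness probes (all names and values are hypothetical).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-controller
  template:
    metadata:
      labels:
        app: example-controller
    spec:
      containers:
      - name: controller
        image: registry.example.com/example-controller:v1.0.0  # placeholder
        ports:
        - name: health
          containerPort: 8081
        # Restarts the container if the process stops responding.
        livenessProbe:
          httpGet:
            path: /healthz
            port: health
          initialDelaySeconds: 15
          periodSeconds: 10
          failureThreshold: 5
        # Keeps the old replica serving during a rolling update until the new one is ready.
        readinessProbe:
          httpGet:
            path: /readyz
            port: health
          initialDelaySeconds: 5
          periodSeconds: 10
```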
-
Mark node-critical components (example)
To ensure user workload pods are only scheduled to Nodes where all node-critical components are ready, these components need to tolerate the node.gardener.cloud/critical-components-not-ready taint (NoSchedule effect). Also, such DaemonSets and the included PodTemplates need to be labelled with node.gardener.cloud/critical-component=true. For more information, see Readiness of Shoot Worker Nodes.
-
Consider making a Service topology-aware (example)
To reduce costs and to improve the network traffic latency in multi-zone Seed clusters, consider making a Service topology-aware, if applicable. In short, when a Service is topology-aware, Kubernetes routes network traffic to the Endpoints (Pods) which are located in the same zone where the traffic originated from. In this way, cross-availability-zone traffic is avoided. See Topology-Aware Traffic Routing.
-
Enable leader election unconditionally for controllers (example 1, example 2, example 3)
Enable leader election unconditionally for controllers, independent of the number of replicas or the high availability configuration. Having leader election enabled even for a single-replica Deployment prevents two Pods from being active at the same time. Otherwise, some corner cases can result in two active Pods, for example a Deployment rolling update, or a kubelet that stops running on a Node and cannot terminate the old replica while kube-controller-manager creates a new replica to match the Deployment's desired replica count.
-
Provide resource requirements (example)
All components should define reasonable (initial) CPU and memory requests and avoid limits (especially CPU limits) unless you know the healthy range for your component (almost impossible with most components today), but no more than the node allocatable remainder (after daemonset pods) of the largest eligible machine type. Scheduling only takes requests into account!
-
Define a VerticalPodAutoscaler (example)
We typically (need to) perform vertical auto-scaling for containers that have a significant usage (>50m/100M) and a significant usage spread over time (>2x) by defining a VerticalPodAutoscaler with updatePolicy.updateMode Auto, containerPolicies[].controlledValues RequestsOnly, reasonable minAllowed configuration and no maxAllowed configuration (will be taken care of in Gardener environments for you/capped at the largest eligible machine type).
-
Define a HorizontalPodAutoscaler if needed (example)
If your component is capable of scaling horizontally, you should consider defining a HorizontalPodAutoscaler.
Note
For more information and concrete configuration hints, please see our best practices guide for pod auto scaling and especially the summary and recommendations sections.
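The autoscaling settings described above can be sketched with two independent manifests. They are illustrations under assumptions, not recommended values: the Deployment and container names, the minAllowed figure, the replica bounds, and the CPU utilization target are all hypothetical, and the two objects deliberately target different workloads.

```yaml
# Sketch 1: VerticalPodAutoscaler with the settings named above (names/values hypothetical).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-controller
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-controller
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
    - containerName: controller
      # Only requests are adjusted; limits are left untouched.
      controlledValues: RequestsOnly
      minAllowed:
        memory: 64Mi
      # No maxAllowed on purpose, as described above.
---
# Sketch 2: HorizontalPodAutoscaler for a different, horizontally scalable workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-server
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```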
-
Provide monitoring scrape config and alerting rules (example 1, example 2)
Components should provide scrape configuration and alerting rules for Prometheus/Alertmanager if appropriate. This should be done inside a dedicated monitoring.go file. Extensions should follow the guidelines described in Extensions Monitoring Integration.
-
Provide logging parsers and filters (example 1, example 2)
Components should provide parsers and filters for fluent-bit, if appropriate. This should be done inside a dedicated logging.go file. Extensions should follow the guidelines described in Fluent-bit log parsers and filters.
-
Set the revisionHistoryLimit to 2 for Deployments (example)
In order to allow easy inspection of two ReplicaSets to quickly find the changes that lead to a rolling update, the revision history limit should be set to 2.
-
Define health checks (example 1)
gardener-operator's and gardenlet's care controllers regularly check the health status of components relevant to the respective cluster (garden/seed/shoot). For shoot control plane components, you need to enhance the lists of components to make sure your component is checked, see example above. For components deployed via ManagedResource, please consult the respective care controller documentation for more information (garden, seed, shoot).
-
Configure automatic restarts in shoot maintenance time window (example 1, example 2)
Gardener offers to restart components during the maintenance time window. For more information, see Restart Control Plane Controllers and Restart Some Core Addons. You can consider adding the needed label to your control plane component to get this automatic restart (probably not needed for most components).
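Opting a control plane component into these restarts might look like the fragment below. It assumes that the label in question is maintenance.gardener.cloud/restart=true and that it is evaluated on the Pods; please verify both against the linked documentation. The component name, namespace, and image are hypothetical.

```yaml
# Sketch: pod template carrying the assumed restart label (names are hypothetical).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controller
  namespace: shoot--example--cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-controller
  template:
    metadata:
      labels:
        app: example-controller
        # Assumed label name; confirm it in the maintenance documentation.
        maintenance.gardener.cloud/restart: "true"
    spec:
      containers:
      - name: controller
        image: registry.example.com/example-controller:v1.0.0  # placeholder
```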