Checklist For Adding New Components

Adding new components that run in the garden, seed, or shoot cluster is theoretically quite simple: we just need a Deployment (or other similar workload resource), the respective container image, and maybe a bit of configuration. In practice, however, there are a couple of things to keep in mind in order to make the deployment production-ready. This document provides a checklist of these things that you can walk through.

General

  1. Avoid usage of Helm charts (example)

    Nowadays, we use Golang components instead of Helm charts for deploying components to a cluster. Please find a typical structure of such components in the provided metrics_server.go file (configuration values are typically managed in a Values structure). There are a few exceptions (e.g., Istio) that still use charts; however, the default should be a Golang-based implementation. For the exceptional cases, use Golang's embed package to embed the Helm chart directory (example 1, example 2).

  2. Choose the proper deployment way (example 1 (direct application w/ client), example 2 (using ManagedResource), example 3 (mixed scenario))

    For historic reasons, resources related to shoot control plane components are applied directly with the client. All other resources (seed or shoot system components) are deployed via gardener-resource-manager's Resource controller (ManagedResources) since it performs health checks out-of-the-box and has a lot of other features (see its documentation for more information). Components that can run as either a seed system component or a shoot control plane component (e.g., VPA or kube-state-metrics) can make use of these utility functions.

  3. Use unique ConfigMaps/Secrets (example 1, example 2)

    Unique ConfigMaps/Secrets are immutable and have a unique name. This has a couple of benefits, e.g., the kubelet doesn't watch these resources, and it is always clear which resource contains which data since it cannot be changed. As a consequence, unique/immutable ConfigMaps/Secrets are superior to checksum annotations on the pod templates. Stale/unused ConfigMaps/Secrets are garbage-collected by gardener-resource-manager's GarbageCollector. There are utility functions (see examples above) for using unique ConfigMaps/Secrets in Golang components. It is essential to inject the annotations into the workload resource to make the garbage collection work.
    Note that some ConfigMaps/Secrets should not be unique (e.g., those containing monitoring or logging configuration). The reason is that the old revision stays in the cluster even if unused until the garbage-collector acts. During this time, they would be wrongly aggregated to the full configuration.

  4. Manage certificates/secrets via secrets manager (example)

    You should use the secrets manager for the management of any kind of credentials. This makes sure that credentials rotation works out-of-the-box without requiring you to think about it. Generally, do not use client certificates (see the Security section).

  5. Consider hibernation when calculating replica count (example)

    Shoot clusters can be hibernated, meaning that all control plane components in the shoot namespace in the seed cluster are scaled down to zero and all worker nodes are terminated. If your component runs in the seed cluster, you have to consider this case and provide the proper replica count. There is a utility function available (see example).

  6. Ensure task dependencies are as precise as possible in shoot flows (example 1, example 2)

    Only define the minimum of needed dependency tasks in the shoot reconciliation/deletion flows.

  7. Handle shoot system components

    Shoot system components deployed by gardener-resource-manager are labelled with resource.gardener.cloud/managed-by: gardener. This makes Gardener add the required label selectors and tolerations so that Pods not managed by a DaemonSet run exclusively on selected nodes (for more information, see System Components Webhook). DaemonSets, on the other hand, should generally tolerate any NoSchedule or NoExecute taints so that they can run on any Node, regardless of user-added taints.
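
Items 3 and 5 above can be illustrated with a minimal, standard-library-only sketch. It is not Gardener's implementation: `uniqueName` and `replicasFor` are hypothetical helpers, and the 8-character SHA-256 suffix is an assumption chosen for illustration. The idea is that any change to the data produces a new resource name, and that a hibernated shoot always gets zero replicas.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// uniqueName derives a resource name from a base name and a checksum of the
// data, so that any change to the data yields a new name.
// Hypothetical helper for illustration; not Gardener's utility function.
func uniqueName(base string, data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for a stable checksum

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so different key/value splits cannot collide.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(data[k]), data[k])
	}
	return base + "-" + hex.EncodeToString(h.Sum(nil))[:8]
}

// replicasFor returns 0 for a hibernated shoot and the desired count otherwise.
// Hypothetical helper for illustration.
func replicasFor(hibernated bool, desired int32) int32 {
	if hibernated {
		return 0
	}
	return desired
}

func main() {
	cfg := map[string]string{"config.yaml": "logLevel: info"}
	fmt.Println(uniqueName("my-component-config", cfg))
	fmt.Println(replicasFor(true, 2), replicasFor(false, 2))
}
```

In a real component, prefer the utility functions linked in the examples above, since they also take care of the annotations that the garbage collector relies on.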

Images

  1. Do not hard-code container image references (example 1, example 2, example 3)

    We define all image references centrally in the imagevector/containers.yaml file. Hence, the image references must not be hard-coded in the pod template spec but read from this so-called image vector instead.

  2. Do not use container images from registries that don't support IPv6 (example: image vector, prow configuration)

    Registries such as ECR, GHCR (ghcr.io), MCR (mcr.microsoft.com) don't support pulling images over IPv6.

    Check whether the upstream image is also maintained in a registry that supports IPv6 natively, such as Artifact Registry or Quay (quay.io). If there is such an image, use the image from the registry with IPv6 support.

    If the image is not available in a registry with IPv6 support, then copy the image to the gardener GCR. There is a prow job copying images that are needed in gardener components from a source registry to the gardener GCR under the prefix europe-docker.pkg.dev/gardener-project/releases/3rd/ (see the documentation or gardener/ci-infra#619).

    If you want to use a new image from a registry without IPv6 support or upgrade an already used image to a newer tag, please open a PR to the ci-infra repository that modifies the job's list of images to copy: images.yaml.

  3. Do not use container images from Docker Hub (example: image vector, prow configuration)

    There is a strict rate-limit that applies to the Docker Hub registry. As described in 2., use another registry (if possible) or copy the image to the gardener GCR.

  4. Do not use Shoot container images that are not multi-arch

    Gardener supports Shoot clusters with both amd64 and arm64 based worker Nodes. amd64 container images cannot run on arm64 worker Nodes and vice-versa.
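
The registry rules in items 2 and 3 could be checked mechanically, for example in a linter or unit test. The following is a minimal sketch under stated assumptions: `registryOf` and `discouraged` are hypothetical helpers, the host detection follows the common Docker convention (the first path component is a registry host only if it contains `.` or `:` or equals `localhost`), and the ECR host patterns are assumptions about how such hostnames typically look.

```go
package main

import (
	"fmt"
	"strings"
)

// registryOf returns the registry host of an image reference. References
// without an explicit host are treated as Docker Hub images.
func registryOf(ref string) string {
	first, _, found := strings.Cut(ref, "/")
	if !found || !(strings.ContainsAny(first, ".:") || first == "localhost") {
		return "docker.io"
	}
	return first
}

// discouraged returns the reason why the registry of ref should be avoided,
// or an empty string if none of the rules applies.
func discouraged(ref string) string {
	host := registryOf(ref)
	switch {
	case host == "docker.io":
		return "Docker Hub applies a strict rate limit"
	case host == "ghcr.io", host == "mcr.microsoft.com":
		return "registry does not support pulling images over IPv6"
	// Assumption: ECR hosts look like public.ecr.aws or
	// <account>.dkr.ecr.<region>.amazonaws.com.
	case host == "public.ecr.aws",
		strings.Contains(host, ".dkr.ecr.") && strings.HasSuffix(host, ".amazonaws.com"):
		return "registry does not support pulling images over IPv6"
	}
	return ""
}

func main() {
	refs := []string{
		"nginx:1.25",
		"ghcr.io/org/app:v1",
		"quay.io/org/app:v1",
		"europe-docker.pkg.dev/gardener-project/releases/3rd/app:v1",
	}
	for _, ref := range refs {
		fmt.Printf("%s -> %q\n", ref, discouraged(ref))
	}
}
```

The image names above are placeholders; only the registry hosts are taken from the text.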

Security

  1. Use a dedicated ServiceAccount and disable auto-mount (example)

    Components that need to talk to the API server of their runtime cluster must always use a dedicated ServiceAccount (do not use default), with automountServiceAccountToken set to false. This makes gardener-resource-manager's TokenInvalidator invalidate the static token secret and its ProjectedTokenMount webhook inject a projected token automatically.

  2. Use shoot access tokens instead of client certificates (example)

    For components that need to talk to a target cluster different from their runtime cluster (e.g., running in seed cluster but talking to shoot) the gardener-resource-manager's TokenRequestor should be used to manage a so-called "shoot access token".

  3. Define RBAC roles with minimal privileges (example)

    The component's ServiceAccount (if it exists) should have as few privileges as possible. Consequently, please define proper RBAC roles for it. This might include a combination of ClusterRoles and Roles. Please do not grant elevated privileges out of convenience (e.g., by extending an existing ClusterRole when a Role would suffice because access to only a single namespace is needed).

  4. Use NetworkPolicys to restrict network traffic

    You should restrict both ingress and egress traffic to/from your component as much as possible to ensure that it only gets access to/from other components if really needed. Gardener provides a few default policies for typical usage scenarios. For more information, see NetworkPolicys In Garden, Seed, Shoot Clusters.

  5. Do not run containers in privileged mode (example, example 2)

    Avoid running containers with privileged=true. Instead, define the needed Linux capabilities.

  6. Do not run containers as root (example)

    Avoid running containers as root. Usually, components such as Kubernetes controllers and admission webhook servers don't need root user capabilities to do their jobs.

    The problem with running as root starts with how the container is first built. Unless a non-privileged user is configured in the Dockerfile, container build systems set up the container with the root user by default. Add a non-privileged user to your Dockerfile or use a base image with a non-root user (for example, the nonroot images from distroless such as gcr.io/distroless/static-debian12:nonroot).

    If the image is an upstream one, then consider configuring a securityContext for the container/Pod with a non-privileged user. For more information, see Configure a Security Context for a Pod or Container.

  7. Choose the proper Seccomp profile (example 1, example 2)

    For components deployed in the Seed cluster, the Seccomp profile will be defaulted to RuntimeDefault by gardener-resource-manager's SeccompProfile webhook which works well for the majority of components. However, in some special cases you might need to overwrite it.

    The gardener-resource-manager's SeccompProfile webhook is not enabled for a Shoot cluster. For components deployed in the Shoot cluster, it is required [*] to explicitly specify the Seccomp profile.

    [*] It is required because if a component deployed in the Shoot cluster does not specify a Seccomp profile and cannot run with the RuntimeDefault Seccomp profile, then enabling the .spec.kubernetes.kubelet.seccompDefault field in the Shoot spec would break the corresponding component.
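
Items 5 to 7 can be condensed into a small review helper. The sketch below is illustrative only: `SecurityContext` is a simplified, hypothetical stand-in for the corresponding Kubernetes container security context (it is not the type from `k8s.io/api`), and `findings` is a hypothetical function that reports which of the three recommendations a given configuration violates.

```go
package main

import "fmt"

// SecurityContext is a simplified stand-in for the Kubernetes container
// security context, reduced to the fields discussed above.
type SecurityContext struct {
	Privileged      *bool
	RunAsNonRoot    *bool
	SeccompProfile  string   // e.g. "RuntimeDefault"; empty means unset
	AddCapabilities []string // Linux capabilities granted instead of privileged mode
}

// boolPtr returns a pointer to b, mirroring how optional booleans are modelled.
func boolPtr(b bool) *bool { return &b }

// findings lists the checklist items that sc does not satisfy.
func findings(sc SecurityContext) []string {
	var out []string
	if sc.Privileged != nil && *sc.Privileged {
		out = append(out, "runs in privileged mode; define the needed capabilities instead")
	}
	if sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
		out = append(out, "runAsNonRoot is not set to true")
	}
	if sc.SeccompProfile == "" {
		out = append(out, "no Seccomp profile specified")
	}
	return out
}

func main() {
	hardened := SecurityContext{
		RunAsNonRoot:    boolPtr(true),
		SeccompProfile:  "RuntimeDefault",
		AddCapabilities: []string{"NET_BIND_SERVICE"},
	}
	risky := SecurityContext{Privileged: boolPtr(true)}

	fmt.Println(len(findings(hardened)), "findings for the hardened context")
	for _, f := range findings(risky) {
		fmt.Println("risky:", f)
	}
}
```

Note that the missing-Seccomp finding matters mostly for Shoot components, since the defaulting webhook only covers the Seed cluster.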

High Availability / Stability

  1. Specify the component type label for high availability (example)

    To support high-availability deployments, gardener-resource-manager's HighAvailabilityConfig webhook injects the proper specification, such as the replica count or topology spread constraints. You only need to specify the type label. For more information, see High Availability Of Deployed Components.

  2. Define a PodDisruptionBudget (example)

    This is closely related to high availability, but also to stability in general: By default, a PodDisruptionBudget with maxUnavailable=1 should be defined.

  3. Choose the right PriorityClass (example)

    Each cluster runs many components with different priorities. Gardener provides a set of default PriorityClasses. For more information, see Priority Classes.

  4. Consider defining liveness and readiness probes (example)

    To ensure smooth rolling update behaviour, consider the definition of liveness and/or readiness probes.

  5. Mark node-critical components (example)

    To ensure user workload pods are only scheduled to Nodes where all node-critical components are ready, these components need to tolerate the node.gardener.cloud/critical-components-not-ready taint (NoSchedule effect). Also, such DaemonSets and the included PodTemplates need to be labelled with node.gardener.cloud/critical-component=true. For more information, see Readiness of Shoot Worker Nodes.

  6. Consider making a Service topology-aware (example)

    To reduce costs and to improve the network traffic latency in multi-zone Seed clusters, consider making a Service topology-aware, if applicable. In short, when a Service is topology-aware, Kubernetes routes network traffic to the Endpoints (Pods) which are located in the same zone where the traffic originated from. In this way, the cross availability zone traffic is avoided. See Topology-Aware Traffic Routing.

  7. Enable leader election unconditionally for controllers (example 1, example 2, example 3)

    Enable leader election unconditionally for controllers, independently of the number of replicas or the high availability configuration. Having leader election enabled even for a single-replica Deployment prevents two Pods from being active at the same time. Otherwise, some corner cases can result in two active Pods, for example, a Deployment rolling update, or a kubelet that stops running on a Node and cannot terminate the old replica while kube-controller-manager creates a new replica to match the Deployment's desired replica count.
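
The two requirements from item 5 (toleration plus label) can be expressed as a simple check. This is a sketch under assumptions: `Toleration` is a simplified, hypothetical stand-in for the Kubernetes type, and only tolerations with operator `Exists` are considered, because the taint's value is not stated above. The matching follows the usual Kubernetes rules that an empty key with `Exists` matches every taint and an empty effect matches every effect.

```go
package main

import "fmt"

const (
	criticalTaintKey = "node.gardener.cloud/critical-components-not-ready"
	criticalLabelKey = "node.gardener.cloud/critical-component"
)

// Toleration is a simplified stand-in for the Kubernetes toleration type.
type Toleration struct {
	Key      string
	Operator string // only "Exists" is handled in this sketch
	Effect   string // empty matches all effects
}

// toleratesCriticalTaint reports whether any toleration covers the
// critical-components-not-ready taint with the NoSchedule effect.
func toleratesCriticalTaint(tolerations []Toleration) bool {
	for _, t := range tolerations {
		if t.Operator != "Exists" {
			continue // value-based matching is out of scope here
		}
		keyOK := t.Key == "" || t.Key == criticalTaintKey
		effectOK := t.Effect == "" || t.Effect == "NoSchedule"
		if keyOK && effectOK {
			return true
		}
	}
	return false
}

// isMarkedNodeCritical checks both the label and the toleration.
func isMarkedNodeCritical(labels map[string]string, tolerations []Toleration) bool {
	return labels[criticalLabelKey] == "true" && toleratesCriticalTaint(tolerations)
}

func main() {
	labels := map[string]string{criticalLabelKey: "true"}
	tolerations := []Toleration{{Key: criticalTaintKey, Operator: "Exists", Effect: "NoSchedule"}}
	fmt.Println(isMarkedNodeCritical(labels, tolerations))
	fmt.Println(isMarkedNodeCritical(labels, nil))
}
```

Such a check would need to be applied to both the DaemonSet and its pod template, since both have to carry the label.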

Scalability

  1. Provide resource requirements (example)

    All components should define reasonable (initial) CPU and memory requests. Avoid limits (especially CPU limits) unless you know the healthy range for your component, which is almost impossible with most components today. In any case, do not exceed the node allocatable remainder (after DaemonSet pods) of the largest eligible machine type. Note that scheduling only takes requests into account!

  2. Define a VerticalPodAutoscaler (example)

    We typically (need to) perform vertical auto-scaling for containers that have a significant usage (>50m/100M) and a significant usage spread over time (>2x) by defining a VerticalPodAutoscaler with updatePolicy.updateMode Auto, containerPolicies[].controlledValues RequestsOnly, reasonable minAllowed configuration and no maxAllowed configuration (will be taken care of in Gardener environments for you/capped at the largest eligible machine type).

  3. Define a HorizontalPodAutoscaler if needed (example)

    If your component is capable of scaling horizontally, you should consider defining a HorizontalPodAutoscaler.

Note

For more information and concrete configuration hints, please see our best practices guide for pod auto scaling and especially the summary and recommendations sections.
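
The rule of thumb from item 2 can be written down as a small decision function. This is a sketch of one possible reading, not an official criterion: it assumes that "significant usage" means a peak above 50m CPU or above 100M memory (with M taken as 10^6 bytes), that the ">2x" spread is the ratio of peak to minimum usage of the same resource, and that either resource qualifying is enough. The names `usage`, `significant`, and `needsVPA` are hypothetical.

```go
package main

import "fmt"

// usage holds the observed minimum and peak usage of one resource over time.
type usage struct{ min, max int64 }

const (
	cpuThresholdMilli = 50                // 50m CPU
	memThresholdBytes = 100 * 1000 * 1000 // 100M memory
	spreadFactor      = 2                 // peak must exceed 2x the minimum
)

// significant reports whether a resource exceeds both the usage threshold
// and the spread factor.
func significant(u usage, threshold int64) bool {
	return u.max > threshold && u.max > spreadFactor*u.min
}

// needsVPA reports whether either CPU or memory usage qualifies.
func needsVPA(cpuMilli, memBytes usage) bool {
	return significant(cpuMilli, cpuThresholdMilli) ||
		significant(memBytes, memThresholdBytes)
}

func main() {
	// Small and steady: neither threshold is exceeded.
	fmt.Println(needsVPA(usage{10, 40}, usage{50_000_000, 60_000_000}))
	// CPU peaks at 120m, four times its minimum.
	fmt.Println(needsVPA(usage{30, 120}, usage{50_000_000, 60_000_000}))
}
```

The first call prints `false` and the second prints `true`. Actual sizing decisions should follow the best practices guide referenced in the note above.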

Observability / Operations Productivity

  1. Provide monitoring scrape config and alerting rules (example 1, example 2)

    Components should provide scrape configuration and alerting rules for Prometheus/Alertmanager if appropriate. This should be done inside a dedicated monitoring.go file. Extensions should follow the guidelines described in Extensions Monitoring Integration.

  2. Provide logging parsers and filters (example 1, example 2)

    Components should provide parsers and filters for fluent-bit, if appropriate. This should be done inside a dedicated logging.go file. Extensions should follow the guidelines described in Fluent-bit log parsers and filters.

  3. Set the revisionHistoryLimit to 2 for Deployments (example)

    In order to allow easy inspection of two ReplicaSets to quickly find the changes that led to a rolling update, the revision history limit should be set to 2.

  4. Define health checks (example 1)

    gardener-operator's and gardenlet's care controllers regularly check the health status of components relevant to the respective cluster (garden/seed/shoot). For shoot control plane components, you need to extend the lists of components to make sure your component is checked (see the example above). For components deployed via ManagedResource, please consult the respective care controller documentation for more information (garden, seed, shoot).

  5. Configure automatic restarts in shoot maintenance time window (example 1, example 2)

    Gardener offers to restart components during the maintenance time window. For more information, see Restart Control Plane Controllers and Restart Some Core Addons. You can consider adding the needed label to your control plane component to get this automatic restart (probably not needed for most components).