Add tolerations for arm64 workloads on GKE
By default, GKE schedules workloads only to x86-based nodes
by placing a taint (kubernetes.io/arch=arm64:NoSchedule)
on all Arm nodes [1]. This taint prevents Calico components
from being scheduled on Arm nodes. This change adds a matching
toleration to the workload specifications when the Kubernetes
provider is GKE, as suggested in [2].

[1] https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#overview
[2] https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#multi-arch-schedule-any-arch
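
To make the taint/toleration interaction described above concrete, here is a minimal, self-contained sketch. It does not use the real `k8s.io/api/core/v1` package: `Taint`, `Toleration`, `tolerates`, and `schedulable` are hypothetical stand-ins written for illustration, and the matching logic is a simplified rendering of the documented Kubernetes rules (empty effect matches any effect, `Exists` ignores the value, `Equal` compares it).

```go
package main

import "fmt"

// Taint and Toleration are simplified, hypothetical stand-ins for the
// corev1 types; only the fields needed for this sketch are included.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether one toleration matches one taint.
func tolerates(tol Toleration, t Taint) bool {
	// An empty effect on the toleration matches every effect.
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	// An empty key on the toleration matches every key.
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	}
	return false
}

// schedulable reports whether every hard taint (NoSchedule or NoExecute)
// is matched by at least one toleration. PreferNoSchedule is a soft
// preference and is ignored here.
func schedulable(taints []Taint, tols []Toleration) bool {
	for _, t := range taints {
		if t.Effect != "NoSchedule" && t.Effect != "NoExecute" {
			continue
		}
		matched := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	// The taint GKE places on Arm nodes, per the commit message.
	armNode := []Taint{{Key: "kubernetes.io/arch", Value: "arm64", Effect: "NoSchedule"}}

	// The toleration this change adds on GKE.
	arm64 := Toleration{Key: "kubernetes.io/arch", Operator: "Equal", Value: "arm64", Effect: "NoSchedule"}

	fmt.Println("without toleration:", schedulable(armNode, nil))
	fmt.Println("with toleration:   ", schedulable(armNode, []Toleration{arm64}))
}
```

In this sketch a pod without the toleration is rejected by the Arm node's taint, while a pod carrying it is accepted; x86 nodes carry no such taint, so the same pod remains schedulable there as well.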
hjiawei committed Oct 16, 2024
1 parent be04bc9 commit 629bb78
Showing 22 changed files with 257 additions and 35 deletions.
@@ -4,7 +4,7 @@ metadata:
annotations:
api-approved.kubernetes.io: https://github.com/kubernetes-sigs/network-policy-api/pull/30
policy.networking.k8s.io/bundle-version: v0.1.1
policy.networking.k8s.io/channel: standard
policy.networking.k8s.io/channel: experimental
creationTimestamp: null
name: adminnetworkpolicies.policy.networking.k8s.io
spec:
@@ -115,6 +115,16 @@ spec:
maxProperties: 1
minProperties: 1
properties:
namedPort:
description: |-
NamedPort selects a port on a pod(s) based on name.
Support: Extended
<network-policy-api:experimental>
type: string
portNumber:
description: |-
Port selects a port on a pod(s) based on number.
@@ -258,6 +268,97 @@ spec:
type: object
type: object
x-kubernetes-map-type: atomic
networks:
description: |-
Networks defines a way to select peers via CIDR blocks.
This is intended for representing entities that live outside the cluster,
which can't be selected by pods, namespaces and nodes peers, but note
that cluster-internal traffic will be checked against the rule as
well. So if you Allow or Deny traffic to `"0.0.0.0/0"`, that will allow
or deny all IPv4 pod-to-pod traffic as well. If you don't want that,
add a rule that Passes all pod traffic before the Networks rule.
Each item in Networks should be provided in the CIDR format and should be
IPv4 or IPv6, for example "10.0.0.0/8" or "fd00::/8".
Networks can have upto 25 CIDRs specified.
Support: Extended
<network-policy-api:experimental>
items:
description: |-
CIDR is an IP address range in CIDR notation (for example, "10.0.0.0/8" or "fd00::/8").
This string must be validated by implementations using net.ParseCIDR
TODO: Introduce CEL CIDR validation regex isCIDR() in Kube 1.31 when it is available.
maxLength: 43
type: string
x-kubernetes-validations:
- message: CIDR must be either an IPv4 or IPv6 address.
IPv4 address embedded in IPv6 addresses are not
supported
rule: self.contains(':') != self.contains('.')
maxItems: 25
minItems: 1
type: array
x-kubernetes-list-type: set
nodes:
description: |-
Nodes defines a way to select a set of nodes in
the cluster. This field follows standard label selector
semantics; if present but empty, it selects all Nodes.
Support: Extended
<network-policy-api:experimental>
properties:
matchExpressions:
description: matchExpressions is a list of label selector
requirements. The requirements are ANDed.
items:
description: |-
A label selector requirement is a selector that contains values, a key, and an operator that
relates the key and values.
properties:
key:
description: key is the label key that the selector
applies to.
type: string
operator:
description: |-
operator represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists and DoesNotExist.
type: string
values:
description: |-
values is an array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. This array is replaced during a strategic
merge patch.
items:
type: string
type: array
required:
- key
- operator
type: object
type: array
matchLabels:
additionalProperties:
type: string
description: |-
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
map is equivalent to an element of matchExpressions, whose key field is "key", the
operator is "In", and the values array contains only "value". The requirements are ANDed.
type: object
type: object
x-kubernetes-map-type: atomic
pods:
description: |-
Pods defines a way to select a set of pods in
@@ -373,6 +474,11 @@ spec:
- action
- to
type: object
x-kubernetes-validations:
- message: networks/nodes peer cannot be set with namedPorts since
there are no namedPorts for networks/nodes
rule: '!(self.to.exists(peer, has(peer.networks) || has(peer.nodes))
&& has(self.ports) && self.ports.exists(port, has(port.namedPort)))'
maxItems: 100
type: array
ingress:
@@ -617,6 +723,16 @@ spec:
maxProperties: 1
minProperties: 1
properties:
namedPort:
description: |-
NamedPort selects a port on a pod(s) based on name.
Support: Extended
<network-policy-api:experimental>
type: string
portNumber:
description: |-
Port selects a port on a pod(s) based on number.
6 changes: 5 additions & 1 deletion pkg/render/apiserver.go
@@ -1316,7 +1316,11 @@ func (c *apiServerComponent) tolerations() []corev1.Toleration {
if c.hostNetwork() {
return rmeta.TolerateAll
}
return append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...)
tolerations := append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...)
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}
return tolerations
}
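
The same "base tolerations, plus the GKE Arm64 toleration when the provider is GKE" pattern recurs in each render file touched by this commit. As a rough illustration only, it could be sketched as one helper. Everything below is hypothetical: `Toleration` is a simplified stand-in for `corev1.Toleration`, `buildTolerations` is not a function in this repository, and the control-plane toleration shown is an illustrative value rather than the actual contents of `rmeta.TolerateControlPlane`.

```go
package main

import "fmt"

// Toleration is a simplified, hypothetical stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerateGKEArm64NoSchedule mirrors the field values of the toleration
// added in pkg/render/common/meta/meta.go.
var tolerateGKEArm64NoSchedule = Toleration{
	Key:      "kubernetes.io/arch",
	Operator: "Equal",
	Value:    "arm64",
	Effect:   "NoSchedule",
}

// buildTolerations (hypothetical helper) concatenates the given groups into
// a freshly allocated slice and appends the Arm64 toleration when isGKE is
// true. The input slices are never written to.
func buildTolerations(isGKE bool, groups ...[]Toleration) []Toleration {
	out := []Toleration{}
	for _, g := range groups {
		out = append(out, g...)
	}
	if isGKE {
		out = append(out, tolerateGKEArm64NoSchedule)
	}
	return out
}

func main() {
	// Illustrative inputs; the real values come from the Installation
	// spec and the rmeta package.
	userTolerations := []Toleration{{Key: "example.com/dedicated", Operator: "Exists"}}
	controlPlane := []Toleration{{Key: "node-role.kubernetes.io/control-plane", Effect: "NoSchedule"}}

	for _, isGKE := range []bool{false, true} {
		tols := buildTolerations(isGKE, userTolerations, controlPlane)
		fmt.Printf("isGKE=%v -> %d tolerations\n", isGKE, len(tols))
	}
}
```

Copying into a fresh slice is a deliberate choice in this sketch: Go's built-in `append` may write into spare capacity of its first argument, so building the result from an empty slice keeps the caller's slices untouched.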

// networkPolicy returns a NP to allow traffic to the API server. This prevents it from
9 changes: 9 additions & 0 deletions pkg/render/common/meta/meta.go
@@ -60,6 +60,15 @@ var (

TolerateCriticalAddonsAndControlPlane = append(TolerateControlPlane, TolerateCriticalAddonsOnly)

// TolerateGKEArm64NoSchedule allows pods to be scheduled on GKE Arm64 nodes.
// See https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#multi-arch-schedule-any-arch
TolerateGKEArm64NoSchedule = corev1.Toleration{
Key: "kubernetes.io/arch",
Operator: corev1.TolerationOpEqual,
Value: "arm64",
Effect: corev1.TaintEffectNoSchedule,
}

// TolerateAll returns tolerations to tolerate all taints. When used, it is not necessary
// to include the user's custom tolerations because we already tolerate everything.
TolerateAll = []corev1.Toleration{
7 changes: 6 additions & 1 deletion pkg/render/compliance.go
@@ -629,6 +629,11 @@ func (c *complianceComponent) complianceReporterPodTemplate() *corev1.PodTemplat
initContainers = append(initContainers, c.cfg.ReporterKeyPair.InitContainer(c.cfg.Namespace))
}

tolerations := append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...)
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}

podtemplate := &corev1.PodTemplate{
TypeMeta: metav1.TypeMeta{Kind: "PodTemplate", APIVersion: "v1"},
ObjectMeta: metav1.ObjectMeta{
@@ -648,7 +653,7 @@ func (c *complianceComponent) complianceReporterPodTemplate() *corev1.PodTemplat
},
Spec: corev1.PodSpec{
ServiceAccountName: ComplianceReporterServiceAccount,
Tolerations: append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...),
Tolerations: tolerations,
NodeSelector: c.cfg.Installation.ControlPlaneNodeSelector,
ImagePullSecrets: secret.GetReferenceList(c.cfg.PullSecrets),
InitContainers: initContainers,
7 changes: 6 additions & 1 deletion pkg/render/dex.go
@@ -226,6 +226,11 @@ func (c *dexComponent) deployment() client.Object {
mounts = append(mounts, c.cfg.TLSKeyPair.VolumeMount(c.SupportedOSType()))
mounts = append(mounts, c.cfg.TrustedBundle.VolumeMounts(c.SupportedOSType())...)

tolerations := append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...)
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}

d := &appsv1.Deployment{
TypeMeta: metav1.TypeMeta{Kind: "Deployment", APIVersion: "apps/v1"},
ObjectMeta: metav1.ObjectMeta{
@@ -246,7 +251,7 @@ func (c *dexComponent) deployment() client.Object {
Spec: corev1.PodSpec{
NodeSelector: c.cfg.Installation.ControlPlaneNodeSelector,
ServiceAccountName: DexObjectName,
Tolerations: append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateControlPlane...),
Tolerations: tolerations,
ImagePullSecrets: secret.GetReferenceList(c.cfg.PullSecrets),
InitContainers: initContainers,
Containers: []corev1.Container{
5 changes: 5 additions & 0 deletions pkg/render/egressgateway/egressgateway.go
@@ -153,6 +153,10 @@ func (c *component) deploymentPodTemplate() *corev1.PodTemplateSpec {
for _, x := range c.config.PullSecrets {
ps = append(ps, corev1.LocalObjectReference{Name: x.Name})
}
tolerations := []corev1.Toleration{}
if c.config.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}
return &corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Annotations: c.egwBuildAnnotations(),
@@ -162,6 +166,7 @@ func (c *component) deploymentPodTemplate() *corev1.PodTemplateSpec {
InitContainers: []corev1.Container{*c.egwInitContainer()},
Containers: []corev1.Container{*c.egwContainer()},
ServiceAccountName: c.config.EgressGW.Name,
Tolerations: tolerations,
Volumes: []corev1.Volume{*c.egwVolume()},
},
}
7 changes: 6 additions & 1 deletion pkg/render/fluentd.go
@@ -1015,6 +1015,11 @@ func (c *fluentdComponent) eksLogForwarderDeployment() *appsv1.Deployment {

var eksLogForwarderReplicas int32 = 1

tolerations := c.cfg.Installation.ControlPlaneTolerations
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}

d := &appsv1.Deployment{
TypeMeta: metav1.TypeMeta{Kind: "Deployment", APIVersion: "apps/v1"},
ObjectMeta: metav1.ObjectMeta{
@@ -1044,7 +1049,7 @@ func (c *fluentdComponent) eksLogForwarderDeployment() *appsv1.Deployment {
Annotations: annots,
},
Spec: corev1.PodSpec{
Tolerations: c.cfg.Installation.ControlPlaneTolerations,
Tolerations: tolerations,
ServiceAccountName: EKSLogForwarderName,
ImagePullSecrets: secret.GetReferenceList(c.cfg.PullSecrets),
InitContainers: []corev1.Container{{
7 changes: 6 additions & 1 deletion pkg/render/guardian.go
@@ -256,6 +256,11 @@ func (c *GuardianComponent) clusterRoleBinding() *rbacv1.ClusterRoleBinding {
func (c *GuardianComponent) deployment() *appsv1.Deployment {
var replicas int32 = 1

tolerations := append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateCriticalAddonsAndControlPlane...)
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}

d := &appsv1.Deployment{
TypeMeta: metav1.TypeMeta{Kind: "Deployment", APIVersion: "apps/v1"},
ObjectMeta: metav1.ObjectMeta{
@@ -276,7 +281,7 @@ func (c *GuardianComponent) deployment() *appsv1.Deployment {
Spec: corev1.PodSpec{
NodeSelector: c.cfg.Installation.ControlPlaneNodeSelector,
ServiceAccountName: GuardianServiceAccountName,
Tolerations: append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateCriticalAddonsAndControlPlane...),
Tolerations: tolerations,
ImagePullSecrets: secret.GetReferenceList(c.cfg.PullSecrets),
Containers: c.container(),
Volumes: c.volumes(),
7 changes: 6 additions & 1 deletion pkg/render/intrusion_detection.go
@@ -601,14 +601,19 @@ func (c *intrusionDetectionComponent) deploymentPodTemplate() *corev1.PodTemplat
containers = append(containers, c.webhooksControllerContainer())
}

tolerations := c.cfg.Installation.ControlPlaneTolerations
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}

return &corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Name: IntrusionDetectionName,
Namespace: c.cfg.Namespace,
Annotations: c.intrusionDetectionAnnotations(),
},
Spec: corev1.PodSpec{
Tolerations: c.cfg.Installation.ControlPlaneTolerations,
Tolerations: tolerations,
NodeSelector: c.cfg.Installation.ControlPlaneNodeSelector,
ServiceAccountName: IntrusionDetectionName,
ImagePullSecrets: ps,
6 changes: 5 additions & 1 deletion pkg/render/kubecontrollers/kube-controllers.go
@@ -626,9 +626,13 @@ func (c *kubeControllersComponent) controllersDeployment() *appsv1.Deployment {
if c.cfg.MetricsServerTLS != nil && c.cfg.MetricsServerTLS.UseCertificateManagement() {
initContainers = append(initContainers, c.cfg.MetricsServerTLS.InitContainer(c.cfg.Namespace))
}
tolerations := append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateCriticalAddonsAndControlPlane...)
if c.cfg.Installation.KubernetesProvider.IsGKE() {
tolerations = append(tolerations, rmeta.TolerateGKEArm64NoSchedule)
}
podSpec := corev1.PodSpec{
NodeSelector: c.cfg.Installation.ControlPlaneNodeSelector,
Tolerations: append(c.cfg.Installation.ControlPlaneTolerations, rmeta.TolerateCriticalAddonsAndControlPlane...),
Tolerations: tolerations,
ImagePullSecrets: c.cfg.Installation.ImagePullSecrets,
ServiceAccountName: c.kubeControllerServiceAccountName,
InitContainers: initContainers,