Fix checks
studenym-hpe committed Jan 2, 2025
1 parent fcaec33 commit 9fdb1bb
Showing 3 changed files with 69 additions and 55 deletions.
4 changes: 4 additions & 0 deletions .spelling
@@ -87,6 +87,8 @@ victoria
5.15.itb
64-port
7060CX2-32S
8.x
8.x.x
802.1Q
802.1p
802.1s
@@ -95,6 +97,8 @@ victoria
802.3cd/bs
8325-23C
8325-48Y8C
9.x
9.x.x
ACLs
AOS-CX
API
52 changes: 31 additions & 21 deletions troubleshooting/error_rolling_back_service_chart_with_etcd.md
@@ -1,7 +1,8 @@
# Error Rolling Back Service Chart With etcd

If rolling back a service with bitnami etcd, the `helm rollback` could fail when going from an etcd chart version 9.x.x to an etcd chart version 8.x.x. This is because the bitnami 9.x etcd cluster statefulset and pods have the `app.kubernetes.io/component=etcd` label and the bitnami 8.x etcd cluster statefulset and pods do not, causing the statefulset to complain on rollback.

If rolling back a service with Bitnami etcd, the `helm rollback` could fail when going from an etcd chart version 9.x to an etcd chart version 8.x.
This is because the Bitnami 9.x etcd cluster StatefulSet and Pods have the `app.kubernetes.io/component=etcd` label and the
Bitnami 8.x etcd cluster StatefulSet and Pods do not, causing the StatefulSet to complain on rollback.

## Prerequisites

@@ -14,15 +15,16 @@ Error: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet
You'll also see the error in the `helm history` output.

Example output:

```text
# helm history -n services cray-hms-hmnfd
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 deployed cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 deployed cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
```

If `helm.sh/chart` is `etcd-9.x.x` or greater and the statefulset and pods have the `app.kubernetes.io/component=etcd` label, you can get past the error by removing the offending label from the etcd cluster pods and statefulset.
If `helm.sh/chart` is `etcd-9.x.x` or greater and the StatefulSet and Pods have the `app.kubernetes.io/component=etcd` label, you can get past the error by removing the offending label from the etcd cluster Pods and StatefulSet.

1. (`ncn-m001`) Check which etcd helm chart version is currently running using `kubectl describe pod`.

@@ -31,32 +33,37 @@ If `helm.sh/chart` is `etcd-9.x.x` or greater and the statefulset and pods have
```

Example output:

```text
ncn-m001:~ # kubectl describe pod -n services cray-hmnfd-bitnami-etcd-0 | grep "helm.sh/chart"
helm.sh/chart=etcd-9.5.6
ncn-m001:~ #
```

1. (`ncn-m001`) Check that the statefulset and the pods have the `app.kubernetes.io/component` label.
1. (`ncn-m001`) Check that the StatefulSet and the Pods have the `app.kubernetes.io/component` label.

(`ncn-m001`) Check if the StatefulSet has the label with this command:

(`ncn-m001`) Check if the statefulset has the label with this command:
```bash
kubectl get statefulset -n services -l app.kubernetes.io/component=etcd | grep hmnfd
```

Example output:

```text
ncn-m001:~ # kubectl get statefulset -n services -l app.kubernetes.io/component=etcd | grep hmnfd
cray-hmnfd-bitnami-etcd 3/3 26h
ncn-m001:~ #
```

(`ncn-m001`) Check if the pods have the label with this command:
(`ncn-m001`) Check if the Pods have the label with this command:

```bash
kubectl get pods -n services -l app.kubernetes.io/component=etcd | grep hmnfd
```

Example output:

```text
ncn-m001:~ # kubectl get pods -n services -l app.kubernetes.io/component=etcd | grep hmnfd
cray-hmnfd-bitnami-etcd-0 2/2 Running 0 33m
@@ -65,21 +72,22 @@ If `helm.sh/chart` is `etcd-9.x.x` or greater and the statefulset and pods have
ncn-m001:~ #
```
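The chart version check from the first step can be scripted. The following is a minimal sketch, assuming the `helm.sh/chart` value always has the form `etcd-<major>.<minor>.<patch>` as in the example output; the function names `etcd_chart_major` and `needs_label_removal` are hypothetical and are not part of the CSM scripts.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether a helm.sh/chart label value refers to etcd chart 9.x or newer.
# Assumes the value looks like "etcd-<major>.<minor>.<patch>".

etcd_chart_major() {
  local chart="$1"
  local version="${chart#etcd-}"   # strip the "etcd-" prefix
  printf '%s\n' "${version%%.*}"   # keep everything before the first dot
}

needs_label_removal() {
  local major
  major="$(etcd_chart_major "$1")"
  # Treat empty or non-numeric input as "no" rather than guessing.
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -ge 9 ]
}

if needs_label_removal "etcd-9.5.6"; then
  echo "etcd-9.5.6: chart is 9.x or newer"
fi
```

The value passed in would come from the `kubectl describe pod ... | grep "helm.sh/chart"` command shown above, with the `helm.sh/chart=` prefix removed.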


## Create a Manual Backup of the etcd Cluster

If a manual backup of the etcd cluster was not done prior to running `helm rollback`, do so now by following [Create a Manual Backup of a Healthy etcd Cluster](../operations/kubernetes/Create_a_Manual_Backup_of_a_Healthy_etcd_Cluster.md).


## Run Script to Remove Label

(`ncn-m001`) Run the script `remove_label_from_etcd_cluster.sh` in `/usr/share/doc/csm/troubleshooting/scripts/` to remove the `app.kubernetes.io/component=etcd` label from the statefulset and pods in the etcd cluster.
(`ncn-m001`) Run the script `remove_label_from_etcd_cluster.sh` in `/usr/share/doc/csm/troubleshooting/scripts/` to remove the `app.kubernetes.io/component=etcd` label from the StatefulSet and Pods in the etcd cluster.

1. (`ncn-m001`) Change directory to `/usr/share/doc/csm/troubleshooting/scripts/`.

```bash
cd /usr/share/doc/csm/troubleshooting/scripts
```

Usage output:

```text
ncn-m001:/usr/share/doc/csm/troubleshooting/scripts # ./remove_label_from_etcd_cluster.sh
@@ -95,11 +103,13 @@ If a manual backup of the etcd cluster was not done prior to running `helm rollb
```

1. (`ncn-m001`) Run `remove_label_from_etcd_cluster.sh`.

```bash
./remove_label_from_etcd_cluster.sh services cray-hmnfd
```

Example output:

```text
ncn-m001:/usr/share/doc/csm/troubleshooting/scripts # ./remove_label_from_etcd_cluster.sh services cray-hmnfd
Ensuring cray-hmnfd-bitnami-etcd members do not have 'app.kubernetes.io/component=etcd' label for rollback to bitnami 8.x chart...
@@ -115,26 +125,26 @@ If a manual backup of the etcd cluster was not done prior to running `helm rollb
ncn-m001:/usr/share/doc/csm/troubleshooting/scripts #
```

The label `app.kubernetes.io/component=etcd` that was causing the statefulset error has been removed from the statefulset and etcd cluster pods. Re-running the `helm rollback` should now succeed.

The label `app.kubernetes.io/component=etcd` that was causing the StatefulSet error has been removed from the StatefulSet and etcd cluster Pods. Re-running the `helm rollback` should now succeed.
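
Before re-running the rollback, it may be worth confirming that no Pods still carry the label. The following is a minimal sketch built on the same `kubectl get pods -l` query used in the prerequisites; the function name `count_labelled_pods` is hypothetical, and a count of `0` is the expected result after the script has run.

```bash
#!/usr/bin/env bash
# Sketch only: count the Pods in a namespace that still carry the etcd component label.
label="app.kubernetes.io/component=etcd"

count_labelled_pods() {
  local ns="$1" name="$2"
  # grep -c prints 0 and exits non-zero when nothing matches, hence the "|| true".
  kubectl get pods -n "$ns" -l "$label" --no-headers 2>/dev/null | grep -c -- "$name" || true
}

# Usage on ncn-m001 (not executed here):
#   count_labelled_pods services hmnfd
```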

## Re-run `helm rollback`

(`ncn-m001`) With the label `app.kubernetes.io/component=etcd` removed from the etcd cluster pods and the statefulset, re-run `helm rollback`. The rollback should succeed and the expected revision should be running.
(`ncn-m001`) With the label `app.kubernetes.io/component=etcd` removed from the etcd cluster Pods and the StatefulSet, re-run `helm rollback`. The rollback should succeed and the expected revision should be running.

```bash
helm rollback -n services cray-hms-hmnfd 1
```

Example output:

```text
ncn-m001:~ # helm rollback -n services cray-hms-hmnfd 1
Rollback was a success! Happy Helming!
ncn-m001:~ #
ncn-m001:~ # helm history -n services cray-hms-hmnfd
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 superseded cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
4 Fri Dec 20 20:06:38 2024 deployed cray-hms-hmnfd-3.0.2 1.18.1 Rollback to 1
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 superseded cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
4 Fri Dec 20 20:06:38 2024 deployed cray-hms-hmnfd-3.0.2 1.18.1 Rollback to 1
```
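
The final check of the `helm history` output can also be scripted. The following is a minimal sketch, assuming the `UPDATED` column always has five whitespace-separated tokens as in the example output, so that `STATUS` is the seventh field; the function name `deployed_revision` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: print the revision number whose STATUS is "deployed".
# Assumes UPDATED looks like "Fri Dec 20 20:06:38 2024" (five tokens), making STATUS field 7.

deployed_revision() {
  awk '$7 == "deployed" { print $1 }'
}

# Sample rows copied from the example output above, with the long failure DESCRIPTION shortened.
deployed_revision <<'EOF'
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Mon Dec 9 21:00:24 2024 superseded cray-hms-hmnfd-3.0.2 1.18.1 Install complete
2 Thu Dec 19 13:53:27 2024 superseded cray-hms-hmnfd-4.0.4 1.21.0 Upgrade complete
3 Fri Dec 20 19:50:42 2024 failed cray-hms-hmnfd-3.0.2 1.18.1 Rollback "cray-hms-hmnfd" failed
4 Fri Dec 20 20:06:38 2024 deployed cray-hms-hmnfd-3.0.2 1.18.1 Rollback to 1
EOF
# prints: 4
```

On a live system the input would be piped in from `helm history -n services cray-hms-hmnfd`.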
68 changes: 34 additions & 34 deletions troubleshooting/scripts/remove_label_from_etcd_cluster.sh
@@ -2,7 +2,7 @@
#
# MIT License
#
# (C) Copyright 2024 Hewlett Packard Enterprise Development LP
# (C) Copyright 2025 Hewlett Packard Enterprise Development LP
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
@@ -24,10 +24,10 @@
#

usage() {
echo "
echo "
Usage:
$0 <namespace> <etcd-cluster>
$0 <namespace> <etcd-cluster>
<namespace> - The Kubernetes namespace the etcd cluster pods are running in.
Example, 'services'
@@ -37,8 +37,8 @@ $0 <namespace> <etcd-cluster>

# Print usage if there are less than two args
# Print usage if there are fewer than two args
if [ "$#" -lt 2 ]; then
usage
exit 1
usage
exit 1
fi

ns=$1
@@ -49,37 +49,37 @@ label_value="etcd"
label="${label_key}=${label_value}"

if ! kubectl get endpoints ${cluster}-etcd-client -n ${ns} -o json > /dev/null 2>&1 ; then
#
# There is no old etcd-operator managed chart installed, let's see if this
# is a fresh install or upgrade with new label.
#
has_label=$(kubectl get statefulsets.apps -n ${ns} -l ${label} --no-headers 2>/dev/null | awk "/${ss_name}/")
if [ -z "$has_label" ]; then
#
# There is no old etcd-operator managed chart installed, let's see if this
# is a fresh install or upgrade with new label.
# The new label has not been applied, so no need to delete label before rollback.
#
has_label=$(kubectl get statefulsets.apps -n ${ns} -l ${label} --no-headers 2>/dev/null | awk "/${ss_name}/")
if [ -z "$has_label" ]; then
#
# The new label has not been applied, so no need to delete label before rollback.
#
echo "The '${label}' label has already been removed from ${ss_name}, continue with rollback."
exit 0
fi

#
# The new label has been applied, so we need to remove the label and delete the
# statefulset before rolling back.
#
members=$(kubectl get pod -n $ns -o wide -o=custom-columns=NAME:.metadata.name | awk "/${ss_name}/ && !/snapshotter|defrag/")
if [ -n "$members" ]; then
echo "Ensuring ${ss_name} members do not have '${label}' label for rollback to bitnami 8.x chart..."
for member in ${members}; do
echo "Removing '${label}' label from ${member}..."
kubectl label pod -n ${ns} ${member} ${label_key}-
done
echo "Removing label '${label}' from statefulset for ${ss_name}"
kubectl label statefulset -n ${ns} ${ss_name} ${label_key}-
echo "Label '${label}' was removed from pods and '${ss_name}' statefulset. Continue with rollback."
else
echo "No installations detected requiring special handling, continue with rollback."
fi
echo "The '${label}' label has already been removed from ${ss_name}, continue with rollback."
exit 0
fi

#
# The new label has been applied, so we need to remove the label from the
# pods and the statefulset before rolling back.
#
members=$(kubectl get pod -n $ns -o wide -o=custom-columns=NAME:.metadata.name | awk "/${ss_name}/ && !/snapshotter|defrag/")
if [ -n "$members" ]; then
echo "Ensuring ${ss_name} members do not have '${label}' label for rollback to bitnami 8.x chart..."
for member in ${members}; do
echo "Removing '${label}' label from ${member}..."
kubectl label pod -n ${ns} ${member} ${label_key}-
done
echo "Removing label '${label}' from statefulset for ${ss_name}"
kubectl label statefulset -n ${ns} ${ss_name} ${label_key}-
echo "Label '${label}' was removed from pods and '${ss_name}' statefulset. Continue with rollback."
else
echo "No installations detected requiring special handling, continue with rollback."
fi
exit 0
fi

echo "Nothing to do. Found ${cluster}-etcd-client, so an older version without the ${label} label is already running."
echo "Nothing to do. Found ${cluster}-etcd-client, so an older version without the ${label} label is already running."
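
To preview what the script would change without touching the cluster, the member filter and the `kubectl label ... <key>-` removal syntax can be exercised as a dry run. The following is a minimal sketch, not part of the commit: the function name `print_label_removal_commands` and the sample pod names are hypothetical, and the StatefulSet name is passed in as an argument because the line that sets `ss_name` is not shown in this diff.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print the label-removal commands instead of running them.
label_key="app.kubernetes.io/component"

print_label_removal_commands() {
  local ns="$1" ss_name="$2" member
  # Same filter as the script: keep StatefulSet members, skip snapshotter and defrag pods.
  awk "/${ss_name}/ && !/snapshotter|defrag/" | while read -r member; do
    echo "kubectl label pod -n ${ns} ${member} ${label_key}-"
  done
  echo "kubectl label statefulset -n ${ns} ${ss_name} ${label_key}-"
}

# Hypothetical pod names, for illustration only.
printf '%s\n' \
  cray-hmnfd-bitnami-etcd-0 \
  cray-hmnfd-bitnami-etcd-1 \
  cray-hmnfd-bitnami-etcd-snapshotter-12345 \
  cray-other-pod |
  print_label_removal_commands services cray-hmnfd-bitnami-etcd
```

The trailing `-` after the label key is how `kubectl label` removes a label; the snapshotter and unrelated pods are filtered out, so only the two member Pods and the StatefulSet appear in the printed commands.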
