CASMPET-6863: Document how to get past a statefulset error when doing a helm rollback #5619

Merged Jan 6, 2025 (1 commit)

studenym-hpe

Description

CASMPET-6863: Document how to get past a statefulset error when doing a helm rollback from a service chart with etcd bitnami 9.x to a chart with etcd bitnami 8.x.

When rolling back a service that uses bitnami etcd, the helm rollback can fail when going from etcd chart version 9.x.x to etcd chart version 8.x.x. The bitnami 9.x etcd cluster statefulset and pods carry the app.kubernetes.io/component=etcd label and the bitnami 8.x statefulset and pods do not, so the statefulset rejects the change on rollback.

This PR documents how to determine whether the bitnami label mismatch is the cause of the helm rollback failure and how to run the new script remove_label_from_etcd_cluster.sh. The script is adapted from the etcd-base-chart pre-upgrade hook, modified to remove the label rather than add it.
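As a rough illustration of the label check described above, the sketch below tests whether a comma-separated label list contains app.kubernetes.io/component=etcd. The function name has_component_label is hypothetical and is not taken from the documented procedure or from remove_label_from_etcd_cluster.sh. The kubectl invocation in the comments assumes the services namespace and the statefulset name seen in the testing output.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether a comma-separated label list (the format
# printed by `kubectl get ... --show-labels`) contains the label that
# bitnami etcd 9.x adds and bitnami etcd 8.x lacks.

ETCD_COMPONENT_LABEL="app.kubernetes.io/component=etcd"

# has_component_label LABELS  (hypothetical helper)
# Returns 0 if LABELS contains the exact key=value pair, 1 otherwise.
has_component_label() {
    local labels="$1"
    # Wrap both sides in commas so only whole key=value entries match.
    case ",${labels}," in
        *",${ETCD_COMPONENT_LABEL},"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use against a live cluster (namespace and name are assumptions):
#   labels=$(kubectl get statefulset -n services cray-etcd-test-bitnami-etcd \
#       --show-labels --no-headers | awk '{print $NF}')
#   if has_component_label "$labels"; then
#       echo "statefulset carries the 9.x component label"
#   fi
```

A quicker manual check is to list labelled resources with a selector, for example `kubectl get statefulset,pods -n services -l app.kubernetes.io/component=etcd`. The function form is shown only because it can be exercised without a cluster.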

Testing

Tested on beau with the cray-etcd-test chart from the cray-etcd repo. Installed via helm a cray-etcd-test chart using etcd chart version 8.9.0, then upgraded to a cray-etcd-test chart using version 9.5.6. Once the upgrade completed, ran helm rollback to the previous version to recreate the statefulset error. Ran the remove_label_from_etcd_cluster.sh script, then re-ran helm rollback successfully.

# helm history -n services cray-etcd-test
REVISION	UPDATED                 	STATUS    	CHART                                      	APP VERSION    	DESCRIPTION
1       	Fri Dec 20 19:25:52 2024	superseded	cray-etcd-test-1.0.2-20241218164554+fa6aa14	1.0.0          	Install complete
2       	Fri Dec 20 19:38:54 2024	superseded	cray-etcd-test-1.0.2                       	1.0.0          	Upgrade complete
3       	Fri Dec 20 19:50:42 2024	failed    	cray-etcd-test-1.0.2-20241218164554+fa6aa14	1.0.0          	Rollback "cray-etcd-test" failed: cannot patch "cray-etcd-test-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-etcd-test-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
4       	Fri Dec 20 20:06:38 2024	deployed  	cray-etcd-test-1.0.2-20241218164554+fa6aa14	1.0.0          	Rollback to 1
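The failed revision in the history above can also be picked out mechanically. The following is a minimal sketch, not part of the documented procedure: the function name failed_statefulset_revisions is hypothetical, and it only looks for the error text shown in the output above. A match means the StatefulSet immutable-field error occurred. It does not by itself confirm that the label mismatch is the cause.

```shell
#!/usr/bin/env bash
# Sketch only: read `helm history` output on stdin and print the revision
# number of each failed row whose description contains the StatefulSet
# immutable-field error.

STS_ERROR_PATTERN="updates to statefulset spec for fields other than"

# failed_statefulset_revisions  (hypothetical helper)
failed_statefulset_revisions() {
    awk -v pat="$STS_ERROR_PATTERN" '
        # Data rows start with a numeric revision; the header row does not.
        $1 ~ /^[0-9]+$/ && /[ \t]failed[ \t]/ && index($0, pat) > 0 {
            print $1
        }
    '
}

# Possible use (release name and namespace taken from the output above):
#   helm history -n services cray-etcd-test | failed_statefulset_revisions
```

Matching on the literal error text rather than on column positions keeps the sketch independent of how helm pads its table columns.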

Also installed the cray-hms-hmnfd-3.0.2 chart on beau, which was already running cray-hms-hmnfd-4.0.4, and did helm rollbacks between the two versions to recreate the error, run the script, and confirm that the helm rollback then succeeds.

# helm history -n services cray-hms-hmnfd
REVISION	UPDATED                 	STATUS    	CHART               	APP VERSION	DESCRIPTION
<snip>
6       	Thu Dec 19 19:51:15 2024	failed    	cray-hms-hmnfd-3.0.2	1.18.1     	Upgrade "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
7       	Thu Dec 19 19:53:51 2024	superseded	cray-hms-hmnfd-3.0.2	1.18.1     	Upgrade complete
8       	Thu Dec 19 19:58:03 2024	superseded	cray-hms-hmnfd-4.0.4	1.21.0     	Upgrade complete
9       	Fri Dec 20 21:00:44 2024	failed    	cray-hms-hmnfd-3.0.2	1.18.1     	Rollback "cray-hms-hmnfd" failed: cannot patch "cray-hmnfd-bitnami-etcd" with kind StatefulSet: StatefulSet.apps "cray-hmnfd-bitnami-etcd" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
10      	Fri Dec 20 21:13:16 2024	superseded	cray-hms-hmnfd-3.0.2	1.18.1     	Rollback to 7
11      	Fri Dec 20 21:27:50 2024	deployed  	cray-hms-hmnfd-4.0.4	1.21.0     	Rollback to 1

Checklist

  • If I added any command snippets, the steps they belong to follow the prompt conventions (see example).
  • If I added a new directory, I also updated .github/CODEOWNERS with the corresponding team in Cray-HPE.
  • My commits or Pull-Request Title contain my JIRA information, or I do not have a JIRA.

@bo-quan bo-quan left a comment

lgtm

@davidfluck-hpe davidfluck-hpe left a comment

Just a nitpicky question; otherwise, great work!

Review comment on troubleshooting/scripts/remove_label_from_etcd_cluster.sh (outdated, resolved)

@leliasen-hpe leliasen-hpe left a comment

Approving PR with a few minor adjustments.

… a helm rollback from a service chart with etcd bitnami 9.x to a chart with etcd bitnami 8.x.
@studenym-hpe studenym-hpe force-pushed the CASMPET-6863-rollback-issues branch from 85d25cb to f607338 Compare January 6, 2025 19:59
@mtupitsyn mtupitsyn merged commit 6705f95 into release/1.6 Jan 6, 2025
8 checks passed
@mtupitsyn mtupitsyn deleted the CASMPET-6863-rollback-issues branch January 6, 2025 20:27