Fix-118: Check UID before deleting PVC #122

Open
wants to merge 1 commit into
base: master
from

Conversation

srteam2020
Contributor

@srteam2020 srteam2020 commented Jun 12, 2021

What this PR does:

  • Adds a new label, DatacenterUID, that holds the UID of the Cassandra datacenter
  • Applies the DatacenterUID label to each PVC and, before deleting a PVC, checks that its labeled DatacenterUID matches the UID of the current datacenter (the one that has a deletion timestamp)
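
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `PVC` struct is a stripped-down stand-in for the Kubernetes `PersistentVolumeClaim` type, and the label key `DatacenterUIDLabel`, the helpers `labelPVC` and `safeToDelete`, and the UID strings are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// DatacenterUIDLabel is a hypothetical label key; the key used in the PR may differ.
const DatacenterUIDLabel = "cassandra.datastax.com/datacenterUID"

// PVC is a simplified stand-in for a Kubernetes PersistentVolumeClaim.
type PVC struct {
	Name   string
	Labels map[string]string
}

// labelPVC records the owning datacenter's UID on the claim.
func labelPVC(pvc *PVC, dcUID string) {
	if pvc.Labels == nil {
		pvc.Labels = map[string]string{}
	}
	pvc.Labels[DatacenterUIDLabel] = dcUID
}

// safeToDelete reports whether the claim carries the UID of the datacenter
// that is currently being deleted. A claim without the label never matches.
func safeToDelete(pvc PVC, deletingDCUID string) bool {
	uid, ok := pvc.Labels[DatacenterUIDLabel]
	return ok && uid == deletingDCUID
}

func main() {
	pvc := PVC{Name: "server-data-cluster2-dc2-r1-sts-0"}
	labelPVC(&pvc, "uid-of-current-dc")

	// A deletion request carrying a different UID leaves the claim alone.
	fmt.Println(safeToDelete(pvc, "uid-of-other-dc"))
	// Only the datacenter whose UID was recorded may delete it.
	fmt.Println(safeToDelete(pvc, "uid-of-current-dc"))
}
```

Comparing UIDs rather than names means a deletion timestamp that belongs to a different datacenter object cannot trigger the deletion, even if that object has the same name and namespace.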

Which issue(s) this PR fixes:
Fixes #118

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

┆Issue is synchronized with this Jiraserver Task by Unito
┆Issue Number: K8SSAND-774
┆Priority: Medium

@srteam2020
Contributor Author

Error: The process '/opt/hostedtoolcache/mage-action/1.11.0/x64/mage' failed with exit code 1
Is it a transient error?

@burmanm
Contributor

burmanm commented Jun 14, 2021

No, it checks that running sdkGenerate does not modify anything (to guard against API changes). In this case it seems that dependencies have changed (go.sum is out of date). We should probably ignore go.sum in this test.
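
One way such a check could skip go.sum is sketched below. This is not the project's actual mage target; it is an assumed approach that parses the output of `git status --porcelain` (two status characters, a space, then the path) after the generator has run. The function name `unexpectedChanges` and the sample file path are hypothetical, and renamed or quoted paths are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// unexpectedChanges takes the output of `git status --porcelain` and returns
// the changed paths that are not listed in the ignore set.
func unexpectedChanges(porcelain string, ignore map[string]bool) []string {
	var changed []string
	for _, line := range strings.Split(porcelain, "\n") {
		// Skip blank lines and anything too short to hold "XY <path>".
		if len(line) < 4 {
			continue
		}
		path := strings.TrimSpace(line[3:])
		if !ignore[path] {
			changed = append(changed, path)
		}
	}
	return changed
}

func main() {
	// Sample porcelain output; the generated file name is made up for illustration.
	out := " M go.sum\n M api/zz_generated.deepcopy.go\n"
	ignore := map[string]bool{"go.sum": true}

	if changed := unexpectedChanges(out, ignore); len(changed) > 0 {
		fmt.Println("generator modified tracked files:", changed)
	}
}
```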

@burmanm
Contributor

burmanm commented Nov 2, 2021

Do you still want to rebase this?

@srteam2020
Contributor Author

@burmanm Thanks for the reminder! Yes, I will rebase it this week.

@srteam2020
Contributor Author

@burmanm I just rebased and pushed again. The conflicts should be resolved now.

@burmanm burmanm self-requested a review November 2, 2021 23:00
@burmanm
Contributor

burmanm commented Nov 3, 2021

Some unit tests should have been updated, but that's not the main problem. There's an issue with the update policy of PersistentVolumeClaim labels in Kubernetes. This code works if the deployment is new, but if the original deployment was created with an older version of cass-operator, which did not set the UID, we can no longer delete the PVCs.

Reproducing that scenario (deploy master, deploy storage, update cass-operator, delete the cassdc) ends with an error:

2021-11-03T11:42:35.708Z	ERROR	controllers.CassandraDatacenter	Failed to update CassandraDatacenter with removed finalizers	{"cassandradatacenter": "cass-operator/dc2", "requestNamespace": "cass-operator", "requestName": "dc2", "loopID": "fc083749-96f3-4b4b-aade-3f4c833250c9", "namespace": "cass-operator", "datacenterName": "dc2", "clusterName": "cluster2", "error": "Operation cannot be fulfilled on cassandradatacenters.cassandra.datastax.com \"dc2\": StorageError: invalid object, Code: 4, Key: /registry/cassandra.datastax.com/cassandradatacenters/cass-operator/dc2, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: c4e2b26a-59b4-4ced-a55c-b91dc77516d0, UID in object meta: "}

The reason is that the StatefulSet's PVC labels were not updated: none of the existing PVCs has a "UID" label, so when you run the delete command it can't match them. As a result, none of the existing PVCs are deleted, even though the StatefulSet itself was:

➜  cass-operator git:(fix-118) ✗ kubectl -n cass-operator get all
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/cass-operator-controller-manager-59b69d96f9-cqdh8   1/1     Running   0          8m41s

NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/cass-operator-webhook-service   ClusterIP   10.96.91.154   <none>        443/TCP   16m

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cass-operator-controller-manager   1/1     1            1           16m

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/cass-operator-controller-manager-59b69d96f9   1         1         1       8m41s
replicaset.apps/cass-operator-controller-manager-64bd4899cf   0         0         0       16m
➜  cass-operator git:(fix-118) ✗ kubectl -n cass-operator get pvc
NAME                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-commitlogs-cluster2-dc2-r1-sts-0   Bound    pvc-f5266658-9577-4056-a1ba-2811943bcbf8   1Gi        RWO            standard       14m
server-data-cluster2-dc2-r1-sts-0            Bound    pvc-e7619f61-5ff8-413e-b68b-e21bad2716ee   1Gi        RWO            standard       14m
server-logs-cluster2-dc2-r1-sts-0            Bound    pvc-6dd0541c-3752-4fb3-9ecc-1ce4cbaf8819   1Gi        RWO            standard       14m
➜  cass-operator git:(fix-118) ✗
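
The mismatch can be illustrated with a small sketch of equality-based label matching, where every key in the selector must be present on the object with the same value. The `matches` helper and the label keys `datacenter` and `datacenterUID` are hypothetical and only stand in for whatever the PR uses; the UID value is the one from the error log above.

```go
package main

import "fmt"

// matches reports whether labels satisfy an equality-based selector:
// every selector key must be present with exactly the same value.
func matches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical label keys, for illustration only.
	selector := map[string]string{
		"datacenter":    "dc2",
		"datacenterUID": "c4e2b26a-59b4-4ced-a55c-b91dc77516d0",
	}

	// A PVC created by an older operator version has no UID label.
	legacyPVC := map[string]string{"datacenter": "dc2"}
	// A PVC created after the change carries the UID label.
	labeledPVC := map[string]string{
		"datacenter":    "dc2",
		"datacenterUID": "c4e2b26a-59b4-4ced-a55c-b91dc77516d0",
	}

	fmt.Println("legacy PVC selected: ", matches(selector, legacyPVC))
	fmt.Println("labeled PVC selected:", matches(selector, labeledPVC))
}
```

Under this assumption the legacy PVC is never selected, which is consistent with the three claims still being Bound in the listing above.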


Successfully merging this pull request may close these issues.

PVC can be deleted mistakenly when reading stale deletionTimestamp information
2 participants