Deploying Cass-operator and CassandraDataCenter on Power architecture #561

Sunidhi-Gaonkar1 · 2023-08-21T12:50:38Z

Hi team, we are deploying cass-operator (v1.14.0) and a CassandraDatacenter on Power architecture. The operator, which we installed with the Helm chart, deploys successfully. When we deploy the CassandraDatacenter, however, the pod with index 0 terminates repeatedly, while the pods with indexes 1 and 2 run fine.

NAME                                  READY   STATUS        RESTARTS   AGE
pod/cass-operator-6fb8dffdb6-hl5nm    1/1     Running       0          4h38m
pod/cassandra-default-sts-0           1/2     Terminating   0          19s
pod/cassandra-default-sts-1           2/2     Running       0          4h37m
pod/cassandra-default-sts-2           2/2     Running       0          4h37m

NAME                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE
service/cass-operator-webhook-service       ClusterIP   172.30.17.183   <none>        443/TCP                                        4h38m
service/cassandra-additional-seed-service   ClusterIP   None            <none>        <none>                                         4h37m
service/cassandra-all-pods-service          ClusterIP   None            <none>        9042/TCP,8080/TCP,9103/TCP,9000/TCP            4h37m
service/cassandra-service                   ClusterIP   None            <none>        9042/TCP,9142/TCP,8080/TCP,9103/TCP,9000/TCP   4h37m
service/seed-service                        ClusterIP   None            <none>        <none>                                         4h37m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cass-operator   1/1     1            1           4h38m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/cass-operator-6fb8dffdb6   1         1         1       4h38m

NAME                                     READY   AGE
statefulset.apps/cassandra-default-sts   2/3     4h37m

Any pointers would be helpful. Thank you.

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-18

burmanm · 2023-08-22T06:26:18Z

There should be some indication from Kubernetes of what caused the pod to terminate. We have no special handling for killing a pod, and certainly none specific to index 0. The only reason we would kill a pod is if the Cassandra container itself becomes stuck (that is, loses readiness).

Other than that, it would take a rolling restart or a decommission to start deleting pods, but in those cases the order would be different.

I recommend checking the container logs to see whether there is a reason Cassandra is failing, or whether Kubernetes has some other reason to delete the pod (such as rescheduling).
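
As a rough sketch of those checks, the helper below only assembles the `kubectl` invocations one would typically run; it does not execute them. The function name `termination_diagnostics` is hypothetical, and the pod name, the `k8ssandra` namespace, and the `cassandra` container name are taken from this thread and may differ in other deployments.

```python
import shlex


def termination_diagnostics(pod, namespace, container="cassandra"):
    """Return kubectl invocations that usually reveal why a pod was terminated."""
    return [
        # Pod status, container states, and recent events in one view.
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Events that reference this pod, sorted by time.
        ["kubectl", "get", "events", "-n", namespace,
         "--field-selector", f"involvedObject.name={pod}",
         "--sort-by=.lastTimestamp"],
        # Logs of one container in the pod (assumed name: "cassandra").
        ["kubectl", "logs", pod, "-n", namespace, "-c", container],
    ]


if __name__ == "__main__":
    # Print the commands so they can be copied into a shell.
    for cmd in termination_diagnostics("cassandra-default-sts-0", "k8ssandra"):
        print(shlex.join(cmd))
```

Because the pod in this report is deleted and recreated within seconds, the events listing is likely to be more useful than the logs of whichever instance happens to exist at the moment.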

Sunidhi-Gaonkar1 · 2023-08-22T12:57:46Z

Thank you for the pointer! The Cassandra logs show no error explaining why the container is failing. I am attaching the logs below for your reference.
cassandra-container-logs.txt

burmanm · 2023-08-22T14:12:35Z

The cassandra container's log should tell whether the /drain endpoint was called (it is a shutdown hook for the pod) or whether the shutdown came from another source. If it was the shutdown hook, something should indicate why the pod was shut down.

Did the cass-operator logs give any indication? If the operator killed the pod, it should log why.
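
One way to check a saved log for a drain call is a plain substring search. This is a minimal sketch: `lines_mentioning` is a hypothetical helper, and it assumes only that a request to the drain endpoint shows up in the log with `/drain` somewhere in the line, which may not hold for every logging configuration.

```python
def lines_mentioning(log_text, needles):
    """Return (line_number, line) pairs for log lines containing any of the needles."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(needle in line for needle in needles):
            hits.append((number, line))
    return hits
```

For example, reading the attached `cassandra-container-logs.txt` into a string and calling `lines_mentioning(text, ["/drain"])` would list any lines that mention the endpoint. An empty result suggests, but does not prove, that the shutdown came from somewhere else.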

Sunidhi-Gaonkar1 · 2023-08-23T12:48:47Z

I checked the cass-operator logs and found this error for the 0th index pod:

2023-08-23T11:18:34.200Z INFO client::callNodeMgmtEndpoint {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"cassandra","namespace":"k8ssandra"}, "namespace": "k8ssandra", "name": "cassandra", "reconcileID": "cb182f6e-a759-4ad2-811b-1e9ec3812cba"}
2023-08-23T11:18:34.200Z DEBUG events Starting Cassandra for pod cassandra-default-sts-0 {"type": "Normal", "object": {"kind":"CassandraDatacenter","namespace":"k8ssandra","name":"cassandra","uid":"21cca51b-d70f-47a3-8086-0ef3416cf6a4","apiVersion":"cassandra.datastax.com/v1beta1","resourceVersion":"2114359"}, "reason": "StartingCassandra"}
2023-08-23T11:18:34.215Z INFO Failed to start pod cassandra-default-sts-0, deleting it {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"cassandra","namespace":"k8ssandra"}, "namespace": "k8ssandra", "name": "cassandra", "reconcileID": "cb182f6e-a759-4ad2-811b-1e9ec3812cba", "reason": "StartingCassandra", "eventType": "Warning"}
2023-08-23T11:18:34.215Z ERROR controllers.CassandraDatacenter calculateReconciliationActions returned an error {"cassandradatacenter": "k8ssandra/cassandra", "requestNamespace": "k8ssandra", "requestName": "cassandra", "loopID": "94017e6a-a0ad-4314-a6e3-add6a7b32302", "error": "Post \"http://10.254.22.169:8080/api/v0/lifecycle/start\": dial tcp 10.254.22.169:8080: connect: connection refused"}
github.com/k8ssandra/cass-operator/controllers/cassandra.(*CassandraDatacenterReconciler).Reconcile
/workspace/controllers/cassandra/cassandradatacenter_controller.go:145
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
2023-08-23T11:18:34.215Z INFO Post "http://10.254.22.169:8080/api/v0/lifecycle/start": dial tcp 10.254.22.169:8080: connect: connection refused {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"cassandra","namespace":"k8ssandra"}, "namespace": "k8ssandra", "name": "cassandra", "reconcileID": "cb182f6e-a759-4ad2-811b-1e9ec3812cba", "reason": "ReconcileFailed", "eventType": "Warning"}
2023-08-23T11:18:34.215Z INFO controllers.CassandraDatacenter Reconcile loop completed {"cassandradatacenter": "k8ssandra/cassandra", "requestNamespace": "k8ssandra", "requestName": "cassandra", "loopID": "94017e6a-a0ad-4314-a6e3-add6a7b32302", "duration": 0.018262743}
2023-08-23T11:18:34.215Z ERROR Reconciler error {"controller": "cassandradatacenter_controller", "controllerGroup": "cassandra.datastax.com", "controllerKind": "CassandraDatacenter", "CassandraDatacenter": {"name":"cassandra","namespace":"k8ssandra"}, "namespace": "k8ssandra", "name": "cassandra", "reconcileID": "cb182f6e-a759-4ad2-811b-1e9ec3812cba", "error": "Post \"http://10.254.22.169:8080/api/v0/lifecycle/start\": dial tcp 10.254.22.169:8080: connect: connection refused"}
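
To pull the failing endpoint out of operator output like the lines above, a small parser can help. This is a sketch that assumes each entry has the layout `timestamp LEVEL message {json fields}` seen in this paste; `parse_operator_line` and `refused_endpoints` are hypothetical helper names, not part of cass-operator.

```python
import json
import re

# Assumed layout, taken from the pasted lines: "<timestamp> <LEVEL> <message> {json fields}".
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+(?P<level>[A-Z]+)\s+(?P<rest>.*)$")
REFUSED_RE = re.compile(r"dial tcp (\S+): connect: connection refused")


def parse_operator_line(line):
    """Split one operator log line into timestamp, level, message and JSON fields.

    Returns None for lines that do not start with a timestamp (e.g. stack frames).
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    message, fields = rest, {}
    brace = rest.find("{")
    if brace != -1:
        try:
            fields = json.loads(rest[brace:])
            message = rest[:brace].rstrip()
        except json.JSONDecodeError:
            pass  # not valid JSON: keep the whole remainder as the message
    return {"ts": match.group("ts"), "level": match.group("level"),
            "message": message, "fields": fields}


def refused_endpoints(log_text):
    """Return the distinct host:port targets that refused a connection, in order seen."""
    seen = []
    for line in log_text.splitlines():
        entry = parse_operator_line(line)
        if entry is None:
            continue
        # The error text appears either in the "error" field or in the message itself.
        text = str(entry["fields"].get("error", "")) + " " + entry["message"]
        for target in REFUSED_RE.findall(text):
            if target not in seen:
                seen.append(target)
    return seen
```

Running `refused_endpoints` over the operator log makes it easy to see which pod IP and port the operator could not reach, without reading through the repeated reconcile entries.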

burmanm · 2023-08-23T14:11:53Z

There is the reason:

2023-08-23T11:18:34.215Z INFO Post "http://10.254.22.169:8080/api/v0/lifecycle/start": dial tcp 10.254.22.169:8080: connect: connection refused

The management-api could not be contacted for some reason. The cassandra container's logs may explain why: that container's output includes the management-api logs, while the server-system-logger container carries the logs of Cassandra itself.
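
To confirm whether anything is listening on that port, a plain TCP probe is enough. This is a minimal sketch: `port_is_open` is a hypothetical helper, and it has to run from somewhere that can route to the pod IP (for example another pod in the same cluster). The address `10.254.22.169` and port `8080` are the ones from the operator log above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("10.254.22.169", 8080)` while the pod is up would show whether the management-api has bound its port yet. A `False` here matches the operator's "connection refused" and points at the management-api process not starting, rather than at the operator.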

adejanovski added this to K8ssandra Aug 21, 2023
