The k8ssandra-operator mutating webhook should be restricted to cass-operator managed pods #1172
Comments
Hi @vcanuel, very sorry about this. You can resolve this by manually deleting the mutating webhook so that the cluster can restart. You should have something like this:
And then you can delete the k8ssandra-operator mutating webhook with:
Re-installing the operator once the cluster is back up and running should recreate the webhook.
I think the fix on our side should be to change the webhook's failure policy from …. @vcanuel, you can probably try this yourself by editing the webhook instead of deleting it.
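To make the two mitigations in this thread concrete (scoping the webhook to cass-operator managed pods, as the issue title proposes, and changing its failure policy), here is a simplified Python model of how a mutating webhook gates pod creation when its service is unreachable. In `admissionregistration.k8s.io/v1`, `failurePolicy: Fail` (the default) rejects the request when the webhook call fails, which matches the "no endpoints available" errors reported here, while `Ignore` lets the request continue unmutated. This is only a sketch, not the API server's actual logic: it models `matchLabels`-style `objectSelector` matching only, and the class and function names, the `app.kubernetes.io/managed-by: cass-operator` label, and the metrics-server pod label are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MutatingWebhook:
    """Hypothetical, simplified model of one entry in a MutatingWebhookConfiguration."""
    # "Fail" (the admissionregistration.k8s.io/v1 default) or "Ignore"
    failure_policy: str = "Fail"
    # matchLabels-style objectSelector; an empty selector matches every object
    object_selector: Dict[str, str] = field(default_factory=dict)


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value pair of the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def pod_admitted(hook: MutatingWebhook, pod_labels: Dict[str, str], webhook_reachable: bool) -> bool:
    """Return True if this webhook lets the pod creation request through."""
    if not selector_matches(hook.object_selector, pod_labels):
        # Pods that do not match the selector are never sent to the webhook.
        return True
    if webhook_reachable:
        # Assume the mutation itself succeeds when the service answers.
        return True
    # The call failed, e.g. "no endpoints available for service".
    return hook.failure_policy == "Ignore"


if __name__ == "__main__":
    # Hypothetical pod labels, for illustration only.
    metrics_server_pod = {"k8s-app": "metrics-server"}
    cassandra_pod = {"app.kubernetes.io/managed-by": "cass-operator"}

    cluster_wide = MutatingWebhook()
    scoped = MutatingWebhook(object_selector={"app.kubernetes.io/managed-by": "cass-operator"})
    lenient = MutatingWebhook(failure_policy="Ignore")

    # The webhook service has no endpoints in every case below.
    print(pod_admitted(cluster_wide, metrics_server_pod, False))  # False: rejected
    print(pod_admitted(scoped, metrics_server_pod, False))        # True: never intercepted
    print(pod_admitted(scoped, cassandra_pod, False))             # False: still rejected
    print(pod_admitted(lenient, metrics_server_pod, False))       # True: error ignored
```

In this model, scoping the selector keeps unrelated workloads such as metrics-server schedulable while the operator is down, whereas `Ignore` unblocks every pod but would presumably let a matching pod start without whatever the webhook injects.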
Thanks for your quick response. I restored the cluster from an earlier snapshot, since this occurred in our production environment. I will keep your advice as a reference in case this issue arises again. There's still a lot for me to learn about Kubernetes :)
We'll have that fixed in our next release, which is planned for the beginning of February at the latest.
What happened?
After an automatic upgrade of my Kubernetes cluster on Google Cloud Platform (GCP), I encountered connectivity issues with the k8ssandra-operator-webhook-service. This resulted in numerous deployment failures, including the metrics-server, leading to significant instability in my cluster. I observed the following error messages for every deployment in the cluster:
Internal error occurred: failed calling webhook "mpod.kb.io": failed to call webhook: Post "https://k8ssandra-operator-webhook-service.k8ssandra-operator.svc:443/mutate-v1-pod-secrets-inject?timeout=10s": no endpoints available for service "k8ssandra-operator-webhook-service"
Internal error occurred: failed calling webhook "mpod.kb.io": failed to call webhook: Post "https://k8ssandra-operator-webhook-service.k8ssandra-operator.svc:443/mutate-v1-pod-secrets-inject?timeout=10s": No agent available
The problem persists even after restarting the operator.
Did you expect to see something different?
Yes, post-upgrade, I expected the cluster to remain stable with all services functioning correctly, including the webhook service. Essential deployments, especially the metrics-server, were expected to launch without issues.
How to reproduce it (as minimally and precisely as possible):
The Kubernetes cluster undergoes an automatic upgrade on GCP, from 1.27.3-gke.100 to 1.27.7-gke.1056000.
After the upgrade, observe the behavior of k8ssandra-operator-webhook-service and whether deployments can launch.
Environment
K8ssandra Operator version:
1.11.0
Kubernetes version information:
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.7-gke.1056000
Kubernetes cluster kind:
Google Cloud Platform (GCP) managed Kubernetes cluster.
Manifests:
K8ssandra Operator Logs:
Anything else I need to know?:
The issue has led to a large number of deployment failures and significantly impacted the stability of my Kubernetes environment. I am seeking insights or guidance on resolving these post-upgrade issues.