[Bug]: Kraft migration with AWS external loadbalancer gets stuck #10725
Replies: 3 comments 1 reply
-
The controller nodes do not do any load balancer registration. In fact, even brokers don't do anything like that. The error itself does not seem to make much sense, as the pod is either not ready or we try to connect to it. But you should probably start by sharing full logs and configurations of all components.
-
PS: If you are using something like this https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/ to bind the Pod readiness to the load balancers, then you should make sure it is not applied to the controller nodes, as they do not accept client connections through any load balancers.
-
Triaged on 17/10/2024: decided to convert into a discussion because it doesn't seem to be a Strimzi bug.
-
Bug Description
While triggering a migration from ZooKeeper to KRaft, the newly created controller node pool never becomes healthy. The Strimzi operator keeps reporting timeouts on the newly created nodes.
This happens with a bootstrap config that registers the node to an AWS load balancer.
It appears that the controller node remains unhealthy because it can't become ready. I think this is because its registration to the load balancer keeps failing. Most likely port 9092 is not listening, so the readiness gate on the pod with the controller role never becomes ready from the operator side.
One fix I found was to add a label to spec.selector in the Kubernetes Service handling the bootstrap, to ensure that only nodes with the broker role are included in the load balancer.
Once this was added and the pod restarted, the migration actually started.
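The exact selector line did not survive in this post, so the following is only a sketch of what such a Service could look like. It assumes that Strimzi labels broker pods with strimzi.io/broker-role: "true"; the Service name, the cluster name my-cluster, and the port layout are hypothetical placeholders, not values taken from the reported setup.

```yaml
# Sketch only: names and ports are hypothetical, not from the original report.
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-kafka-external-bootstrap  # hypothetical Service name
spec:
  type: LoadBalancer
  selector:
    strimzi.io/cluster: my-cluster           # hypothetical cluster name
    # Assumed label: match only pods that have the broker role, so that
    # controller-only pods are never registered as load balancer targets.
    strimzi.io/broker-role: "true"
  ports:
    - name: kafka
      port: 9092        # port mentioned in the description above
      targetPort: 9092
```

With a selector like this, controller-only pods should not match the Service, so a readiness gate tied to the load balancer target would not be expected to block them. Whether that applies here depends on how the readiness gate is injected in the cluster.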
Steps to reproduce
2. Start a KRaft migration
Expected behavior
The migration should start as described in the documentation.
Strimzi version
0.43.0
Kubernetes version
EKS 1.30
Installation method
helm chart
Infrastructure
EKS
Configuration files and logs
No response
Additional context
No response