Add support for service discovery on Kubernetes #129

Open
jocke-l wants to merge 2 commits into main
Conversation


@jocke-l jocke-l commented Oct 25, 2024

As part of a work assignment, I added support for updating the proxy topology at runtime based on Kubernetes' EndpointSlices.

Even though this has not been discussed in any issue previously, I thought I'd create the PR directly since most of the work is already done.

This functionality will also simplify #89.

How it works

ZDM_PROXY_TOPOLOGY_KUBERNETES_SERVICE acts as a feature flag. When active (it exists and is not empty), it should contain the name of a headless service that points to all the zdm-proxy pods. This service must be created in the same namespace as the zdm-proxy pods.
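
For reference, a headless Service for this setup could look roughly like the sketch below. The names, selector label, and port are assumptions for illustration, not taken from this PR:

apiVersion: v1
kind: Service
metadata:
  name: zdm-proxy-discovery        # this name goes into ZDM_PROXY_TOPOLOGY_KUBERNETES_SERVICE
  namespace: zdm-proxy             # must match the namespace of the zdm-proxy pods
spec:
  clusterIP: None                  # headless: EndpointSlices expose the individual pod IPs
  selector:
    app: zdm-proxy                 # assumed pod label, adjust to your deployment
  ports:
    - name: cql
      port: 9042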

zdm-proxy will then continuously watch all EndpointSlices associated with this service and broadcast events to all clients. This is done by creating and starting a TopologyRegistry on the ZdmProxy instance.
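
A minimal sketch of what such a watcher can look like with client-go informers is shown below. TopologyEvent, the subscribers slice, and the Subscribe/broadcast helpers are illustrative assumptions, not the actual types in this PR:

package topology

import (
    "context"
    "time"

    discoveryv1 "k8s.io/api/discovery/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

// TopologyEvent is a simplified, illustrative event type.
type TopologyEvent struct {
    Addresses []string // pod IPs taken from the EndpointSlice endpoints
    Ready     bool
}

type TopologyRegistry struct {
    clientset   kubernetes.Interface
    namespace   string
    serviceName string
    subscribers []chan TopologyEvent
}

// Subscribe registers a new consumer. A real implementation would guard
// subscribers with a mutex; this sketch is not synchronized.
func (tr *TopologyRegistry) Subscribe() <-chan TopologyEvent {
    ch := make(chan TopologyEvent, 16)
    tr.subscribers = append(tr.subscribers, ch)
    return ch
}

func (tr *TopologyRegistry) runInformer(ctx context.Context) {
    // Watch only the EndpointSlices that belong to the configured headless service.
    factory := informers.NewFilteredSharedInformerFactory(
        tr.clientset,
        5*time.Minute, // resync period: cache refresh rate, not topology update rate
        tr.namespace,
        func(opts *metav1.ListOptions) {
            opts.LabelSelector = discoveryv1.LabelServiceName + "=" + tr.serviceName
        },
    )

    informer := factory.Discovery().V1().EndpointSlices().Informer()
    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    func(obj interface{}) { tr.broadcast(obj.(*discoveryv1.EndpointSlice)) },
        UpdateFunc: func(_, obj interface{}) { tr.broadcast(obj.(*discoveryv1.EndpointSlice)) },
    })

    factory.Start(ctx.Done())
    factory.WaitForCacheSync(ctx.Done())
    <-ctx.Done()
}

// broadcast fans the endpoints of one EndpointSlice out to every subscriber.
func (tr *TopologyRegistry) broadcast(slice *discoveryv1.EndpointSlice) {
    for _, ep := range slice.Endpoints {
        ready := ep.Conditions.Ready != nil && *ep.Conditions.Ready
        for _, sub := range tr.subscribers {
            sub <- TopologyEvent{Addresses: ep.Addresses, Ready: ready}
        }
    }
}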

zdm-proxy partially intercepts REGISTER messages from the clients in order to keep track of which events each client subscribes to. The relevant event types are TOPOLOGY_CHANGE and STATUS_CHANGE.
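
A rough sketch of the interception idea, using the go-cassandra-native-protocol types (the clientEvents map and the function itself are illustrative, not the PR's actual ClientHandler code):

package topology

import (
    "github.com/datastax/go-cassandra-native-protocol/frame"
    "github.com/datastax/go-cassandra-native-protocol/message"
    "github.com/datastax/go-cassandra-native-protocol/primitive"
)

// trackRegistrations records which event types a client asked for. The REGISTER
// frame is still forwarded as usual, which is what "partially intercepts" means here.
func trackRegistrations(f *frame.Frame, clientEvents map[primitive.EventType]bool) {
    register, ok := f.Body.Message.(*message.Register)
    if !ok {
        return
    }
    for _, eventType := range register.EventTypes {
        switch eventType {
        case primitive.EventTypeTopologyChange, primitive.EventTypeStatusChange:
            clientEvents[eventType] = true
        }
    }
}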

A separate goroutine per ClientHandler continuously receives events from the watcher and translates them into Cassandra Native Protocol event frames, which are then sent to the client if it has registered for that event type.
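
For illustration, translating one of the hypothetical TopologyEvents from the sketch above into event frames could look like this (event frames use stream id -1; the CQL port is an assumption):

package topology

import (
    "net"

    "github.com/datastax/go-cassandra-native-protocol/frame"
    "github.com/datastax/go-cassandra-native-protocol/message"
    "github.com/datastax/go-cassandra-native-protocol/primitive"
)

// eventFrames builds TOPOLOGY_CHANGE / STATUS_CHANGE event frames for one event.
// The caller only sends a frame if the client registered for its event type.
func eventFrames(ev TopologyEvent, version primitive.ProtocolVersion) []*frame.Frame {
    var frames []*frame.Frame
    for _, addr := range ev.Addresses {
        inet := &primitive.Inet{Addr: net.ParseIP(addr), Port: 9042} // assumed CQL port
        if ev.Ready {
            frames = append(frames,
                frame.NewFrame(version, -1, &message.TopologyChangeEvent{
                    ChangeType: primitive.TopologyChangeTypeNewNode, Address: inet}),
                frame.NewFrame(version, -1, &message.StatusChangeEvent{
                    ChangeType: primitive.StatusChangeTypeUp, Address: inet}))
        } else {
            frames = append(frames,
                frame.NewFrame(version, -1, &message.StatusChangeEvent{
                    ChangeType: primitive.StatusChangeTypeDown, Address: inet}))
        }
    }
    return frames
}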

Each ControlConnection also subscribes to the TopologyRegistry in order to recalculate the contents of ControlConnection.virtualHosts, which, when this feature is enabled, will be based on the endpoints of each EndpointSlice.
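
Roughly, the control-connection side could be wired as below, reusing the hypothetical Subscribe helper from the earlier sketch. ControlConnection is shown here as an empty stub, and refreshVirtualHosts is an illustrative name, not zdm-proxy's actual API:

package topology

import "context"

type ControlConnection struct{ /* fields omitted in this sketch */ }

// refreshVirtualHosts would rebuild virtualHosts from the endpoints currently
// reported by the EndpointSlices, instead of the static configuration.
func (cc *ControlConnection) refreshVirtualHosts(ev TopologyEvent) {}

func (cc *ControlConnection) watchTopology(ctx context.Context, registry *TopologyRegistry) {
    events := registry.Subscribe() // hypothetical: returns <-chan TopologyEvent
    for {
        select {
        case <-ctx.Done():
            return
        case ev := <-events:
            cc.refreshVirtualHosts(ev)
        }
    }
}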

What is missing

Currently there are no automated tests for this feature, mainly because they would require a Kubernetes cluster. While it is possible to create a locally running Kubernetes cluster, or a mocked apiserver similar to what the kubebuilder project uses, I have not given this much thought yet.
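
For the mocked-apiserver route, the tooling kubebuilder uses is available as the envtest package from controller-runtime, so a test could be bootstrapped roughly like this (sketch only, assuming the envtest binaries are installed locally):

package topology

import (
    "testing"

    "k8s.io/client-go/kubernetes"
    "sigs.k8s.io/controller-runtime/pkg/envtest"
)

func TestTopologyRegistryWithEnvtest(t *testing.T) {
    // Starts a local etcd + kube-apiserver; no real cluster is needed.
    env := &envtest.Environment{}
    cfg, err := env.Start()
    if err != nil {
        t.Fatal(err)
    }
    defer env.Stop()

    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        t.Fatal(err)
    }

    // From here: create the headless Service and an EndpointSlice via clientset,
    // run the registry against them, and assert that the expected events are emitted.
    _ = clientset
}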

The code could also quite easily be generalized to support other service discovery mechanisms in the future, such as Consul.


lukasz-antoniak commented Oct 30, 2024

Interesting feature. A few unrelated ideas:

  • I would add a generic interface for peer discovery, plus a static implementation that relies on the configuration string & index number, and a Kubernetes service implementation (a rough sketch follows below).
  • Should we add a debouncer, like in most CQL drivers, to accumulate add & remove events and be resilient to short connectivity issues?
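
A rough sketch of such an interface, with illustrative names only:

package discovery

// TopologyEvent is a simplified, illustrative event type.
type TopologyEvent struct {
    Address string
    Up      bool
}

// PeerDiscovery abstracts where the proxy learns about its peer proxies.
type PeerDiscovery interface {
    // Peers returns the currently known proxy addresses.
    Peers() ([]string, error)
    // Watch streams topology events until stop is closed.
    Watch(stop <-chan struct{}) (<-chan TopologyEvent, error)
}

// StaticDiscovery keeps today's behaviour: a fixed address list and this
// instance's index, both taken from the existing configuration.
type StaticDiscovery struct {
    Addresses []string
    Index     int
}

// KubernetesDiscovery would wrap the EndpointSlice watcher added in this PR.
type KubernetesDiscovery struct {
    Namespace   string
    ServiceName string
}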

func (tr *TopologyRegistry) runInformer(ctx context.Context) {
    informerFactory := informers.NewFilteredSharedInformerFactory(
        tr.clientset,
        5*time.Minute,
Contributor


I guess we should make the refresh period configurable.

Author


Yes, maybe. However, this doesn't affect how often the topology is updated, but rather how often the informer refreshes its internal cache of Kubernetes resources. The interval should be determined by the trade-off between the impact of a stale cache and the load on the Kubernetes API. 5 minutes seems to be standard for most use cases.


jocke-l commented Nov 2, 2024

@lukasz-antoniak

* I would add a generic interface for peer discovery, plus a static implementation that relies on the configuration string & index number, and a Kubernetes service implementation.

I agree. I will work on this when I have time.

* Should we add a debouncer, like in most CQL drivers, to accumulate add & remove events and be resilient to short connectivity issues?

Is this really necessary? EndpointSlices are already based on running pods and any readiness probes. If these are configured correctly, the debouncer functionality is already covered.

@jocke-l jocke-l closed this Nov 2, 2024
@jocke-l jocke-l reopened this Nov 2, 2024