
Does the operator handle CASSANDRA-17883? #539

Open
rhuffy opened this issue Jun 14, 2023 · 2 comments
Labels
assess Issues in the state 'assess' bug Something isn't working


@rhuffy

rhuffy commented Jun 14, 2023

What happened?

While reading through open Cassandra issues, I came across CASSANDRA-17883. The issue is that, when a C* node is removed, its IP address is added to the ignoredEndpoints list in MigrationCoordinator. In the C* source, there is a TODO comment that describes the issue:

        // TODO The endpoint address is now ignored but when a node with the same address is added again later,
        //  there will be no way to include it in schema synchronization other than restarting each other node
        //  see https://issues.apache.org/jira/browse/CASSANDRA-17883 for details

When a pod bounces and comes up with a different IP, the old IP is removed from gossip, and I believe it's also added to ignoredEndpoints. If another pod bounces and gets that original IP, my concern is that any schema changes on that node will be ignored by the rest of the cluster.
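A minimal sketch of this concern as a toy Java model. The class `IgnoredEndpointModel`, its methods, and the 10.0.0.5 address are hypothetical and do not mirror Cassandra's actual `MigrationCoordinator` API; the model only captures the one property the TODO above describes: once an address is ignored, nothing ever removes it, so a different node that later reuses that address is filtered out as well.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of the CASSANDRA-17883 concern; NOT Cassandra's MigrationCoordinator API.
public class IgnoredEndpointModel {
    // Addresses this node no longer synchronizes schema with; entries are never removed.
    private final Set<String> ignoredEndpoints = new HashSet<>();
    // Highest schema version this node has pulled so far (a stand-in for real schema state).
    private int schemaVersion = 0;

    // Called when a peer's address leaves gossip, e.g. a pod came back with a different IP.
    void onEndpointRemoved(String address) {
        ignoredEndpoints.add(address);
    }

    // Called when a peer at `address` advertises a newer schema; returns true if it is pulled.
    boolean onSchemaAdvertised(String address, int version) {
        if (ignoredEndpoints.contains(address)) {
            return false; // same address, different node: still ignored
        }
        schemaVersion = Math.max(schemaVersion, version);
        return true;
    }

    int schemaVersion() {
        return schemaVersion;
    }

    public static void main(String[] args) {
        IgnoredEndpointModel node3 = new IgnoredEndpointModel();
        node3.onEndpointRemoved("10.0.0.5");                       // pod A bounces and leaves 10.0.0.5
        boolean pulled = node3.onSchemaAdvertised("10.0.0.5", 42); // pod B now holds 10.0.0.5
        System.out.println("pulled from reused IP: " + pulled + ", schema version: " + node3.schemaVersion());
    }
}
```

Running `main` prints `pulled from reused IP: false, schema version: 0`: the schema change from the pod that inherited the address never reaches this node.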

Does the operator do anything to handle this situation?

What did you expect to happen?

No response

How can we reproduce it (as minimally and precisely as possible)?

I don't have a repro on a test k8s cluster since I'm not sure how to force pods to come up with particular IPs.

You can, however, reproduce it in Cassandra dtests with these steps:

  1. Create a 3 node cluster (127.0.0.1, 127.0.0.2, 127.0.0.3)
  2. Stop node1
  3. Stop node2, change its IP to 127.0.0.1, and start it
  4. Create a keyspace on node2.
  5. Assert that node3 receives that schema change

Note that if node1 is restarted with some new IP, it will receive the schema change from node2, and pass it along to node3.
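The repro steps and the note about node1 can be replayed against a toy pull-based schema model. Everything here is hypothetical: `ReproReplay`, `Node`, and `gossipRound` are illustrative names, real schema pulls are driven by gossip state and schema versions rather than by copying keyspace sets, the new address 127.0.0.4 for node1 is arbitrary, and the model simply assumes (as the issue reports) that after step 3 node3 ignores 127.0.0.1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical replay of the dtest steps against a toy pull-based schema model.
// Assumes, per the issue text, that after step 3 node3 ignores 127.0.0.1.
public class ReproReplay {
    static final class Node {
        final String name;
        String address;
        final Set<String> keyspaces = new HashSet<>(); // stand-in for this node's schema
        final Set<String> ignored = new HashSet<>();   // addresses this node will not pull from
        Node(String name, String address) { this.name = name; this.address = address; }
    }

    // One gossip round: every live node pulls keyspaces from every live, non-ignored peer.
    static void gossipRound(List<Node> live) {
        for (Node n : live) {
            for (Node peer : live) {
                if (peer != n && !n.ignored.contains(peer.address)) {
                    n.keyspaces.addAll(peer.keyspaces);
                }
            }
        }
    }

    public static void main(String[] args) {
        Node node1 = new Node("node1", "127.0.0.1");
        Node node2 = new Node("node2", "127.0.0.2");
        Node node3 = new Node("node3", "127.0.0.3");
        List<Node> live = new ArrayList<>(List.of(node2, node3)); // step 2: node1 is stopped
        node2.address = "127.0.0.1";                                // step 3: node2 takes node1's IP
        node3.ignored.add("127.0.0.1");                             // assumed effect of CASSANDRA-17883
        node2.keyspaces.add("ks1");                                 // step 4: keyspace created on node2
        gossipRound(live);
        System.out.println("node3 has ks1: " + node3.keyspaces.contains("ks1")); // step 5 fails

        node1.address = "127.0.0.4";                                // node1 returns with a new IP
        live.add(node1);
        for (int i = 0; i < 2; i++) {
            gossipRound(live);                                      // node1 pulls from node2, node3 from node1
        }
        System.out.println("node3 has ks1 after relay: " + node3.keyspaces.contains("ks1"));
    }
}
```

It prints `node3 has ks1: false` and then `node3 has ks1 after relay: true`: node1's fresh address is not in node3's ignore set, so node1 acts as the relay described in the note.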

cass-operator version

1.15.0

Kubernetes version

1.24

Method of installation

No response

Anything else we need to know?

No response

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-22

@rhuffy rhuffy added the bug Something isn't working label Jun 14, 2023
@burmanm
Contributor

burmanm commented Jun 19, 2023

I assume this is the same as #130?

@adejanovski
Contributor

@burmanm, it seems like a different (although somewhat related) issue.
Here the nodes won't refuse to start, which is apparently what's described in #130.
I'm not sure how the operator could detect that 🤔. The other nodes are the ones ignoring the node that inherited an old IP, so that node cannot tell (or can it?) that it's being ignored.
Perhaps we could detect schema update failures through the mgmt-api and bounce the node so that it gets a new IP?
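One way to picture the detection idea: if every pod's schema version could be collected (for example from the per-node schema versions that `nodetool describecluster` reports; how the operator would actually gather them is left open), a pod whose version differs from the majority is a candidate for a bounce. This is a hypothetical heuristic, not an existing cass-operator or mgmt-api feature, and the class name and pod names below are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical detection heuristic; not an existing cass-operator or mgmt-api feature.
public class SchemaDisagreement {
    // Returns the pods whose schema version differs from the most common version.
    // On a tie, the "majority" version is whichever the stream happens to pick.
    static List<String> outliers(Map<String, UUID> versionByPod) {
        Map<UUID, Long> counts = versionByPod.values().stream()
            .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        UUID majority = counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse(null); // empty input: no majority, so no outliers
        return versionByPod.entrySet().stream()
            .filter(e -> !e.getValue().equals(majority))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UUID v1 = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID v2 = UUID.fromString("00000000-0000-0000-0000-000000000002");
        Map<String, UUID> versions = new HashMap<>();
        versions.put("dc1-rack1-sts-0", v1);
        versions.put("dc1-rack2-sts-0", v1);
        versions.put("dc1-rack3-sts-0", v2); // e.g. the node whose schema change nobody pulled
        System.out.println(outliers(versions)); // [dc1-rack3-sts-0]
    }
}
```

Schema versions legitimately disagree for a short time while a change propagates, so a real check would need to see the disagreement persist before acting. With only two live nodes, as in the dtest repro, the counts tie and the chosen majority is arbitrary.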

@adejanovski adejanovski moved this to Assess/Investigate in K8ssandra Jun 19, 2023
@adejanovski adejanovski added the assess Issues in the state 'assess' label Jun 19, 2023