Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if localHost is not zero token node #333

Merged
merged 2 commits into from
Nov 15, 2024

Conversation

sylwiaszunejko
Copy link
Collaborator

To ensure that driver handles zero-token nodes properly we need to make sure that following scenarios work as intended:
datacenter2 is a zero-token datacenter
target host - host you feed to NewCluster
target dc - dc name you feed to DCAwareRoundRobinPolicy

  1. target host = any host from datacenter1, target dc = datacenter1. It should succeed, you should be able to execute queries
  2. target host = any host from datacenter2, target dc = datacenter1. It should succeed, you should be able to execute queries
  3. target host = any host from datacenter1, target dc = datacenter2. It should fail with error
  4. target host = any host from datacenter2, target dc = datacenter2. It should fail with error

This PR is needed for scenario 4 to work properly.

@sylwiaszunejko
Copy link
Collaborator Author

I would like to add a unit test to test all scenarios, but I have some issues with that. The test should look like this:

hosts := [...]*HostInfo{
	{hostId: "0", connectAddress: net.ParseIP("10.0.0.1"), dataCenter: "datacenter1", tokens: []string{"0", "1"}},
	{hostId: "1", connectAddress: net.ParseIP("10.0.0.2"), dataCenter: "datacenter2", tokens: []string{}},
}
cluster := NewCluster(string(hosts[0].connectAddress)) // 0 or 1 depenting on the actual scenerio we want to test
cluster.PoolConfig.HostSelectionPolicy = DCAwareRoundRobinPolicy("datacenter2") // datacenter1 or datacenter2

session, err := cluster.CreateSession()
assertEqual(t, "err is nil", nil, err)
session.Close()

This would work as a integration test, but not as a unit test because we need to initialize Session.
The key factor to test is IsOperational function of policy

gocql/policies.go

Lines 1027 to 1046 in aa6c8dc

func (d *dcAwareRR) IsOperational(session *Session) error {
if session.cfg.disableInit || session.cfg.disableControlConn {
return nil
}
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) && host.DataCenter() == d.local {
// Policy can work properly only if there is at least one host from target DC
// No need to check host status, since it could be down due to the outage
// We only need to make sure that policy is not misconfigured with wrong DC
return nil
}
}
return fmt.Errorf("gocql: datacenter %s in the policy was not found in the topology - probable DC aware policy misconfiguration", d.local)
}
and whether or not GetHosts() returns only non zero-token nodes. To do that I need to to somehow mock querySystemLocal/Peers, but use the real GetHosts() and getClusterPeerInfo(localHost) of ringDescriber function to test if the code properly omits zero-token nodes. I am not sure how to do that.
@roydahan @dkropachev If you have any suggestions how to test it using unit tests that would be great, or maybe it is acceptable to have integration test?

Copy link
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it is perfect.
Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?

Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

@dkropachev dkropachev merged commit 1e181f3 into scylladb:master Nov 15, 2024
2 checks passed
@sylwiaszunejko
Copy link
Collaborator Author

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?

Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264
I am not sure I understand why would we lose events if cc is on zero-token node.

@dkropachev
Copy link
Collaborator

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?
Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

@sylwiaszunejko
Copy link
Collaborator Author

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?
Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

Do you have specific scenario in mind when this would be helpful?

@dkropachev
Copy link
Collaborator

Thanks, it is perfect. Btw, looking at this PR, made me think about control connection, I wonder if it is good idea to make control connection to reconnect from zero token node to a regular one when it is possible ? WDYT ?
Say there is token aware policy with round robin as fallback (or dc aware), cc is landed on zero token dc first, is it possible that we will loose events from scylla if it stays there ? would it be better to force it to reconnect when other hosts available ?

Do you mean modifying controlConn connect function to first try to connect regular hosts and then zero-token ones? https://github.com/scylladb/gocql/blob/master/control.go#L226-L264 I am not sure I understand why would we lose events if cc is on zero-token node.

this one too, but I am more about actively drop current connection when non zero-token nodes available.

Do you have specific scenario in mind when this would be helpful?

Only if zero-token nodes are not getting some server events, which is unlikely, but we need to make sure it is not happening, I am worry about SCHEMA_CHANGE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants