Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the datacenter name validation if provided #206

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

sylwiaszunejko
Copy link
Collaborator

Previously there was no check if DC name provided in the policy (DCAware and RackAware) is correct. That could lead to e.g. making routing decisions based on the wrong DC name.

This PR introduces failing to connect if DC name is different in the topology than the one the user entered.

Fixes: #205

session.go Outdated
datacenters = append(datacenters, host.DataCenter())
}
if !s.policy.IsDatacenterValid(datacenters) {
return nil, fmt.Errorf("gocql: unable to create session: datacenter provided in the policy is not valid")
Copy link

@mykaul mykaul Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("gocql: unable to create session: datacenter provided in the policy is not valid")
return nil, fmt.Errorf("gocql: unable to create session: datacenter provided in the policy was not found in the topology")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sylwiaszunejko sylwiaszunejko requested a review from mykaul July 4, 2024 12:25
policies.go Outdated
@@ -319,6 +319,8 @@ type HostSelectionPolicy interface {
// so it's safe to have internal state without additional synchronization as long as every call to Pick returns
// a different instance of NextHost.
Pick(ExecutableQuery) NextHost
// Checks if datacenter is valid if local aware policy is used (explicitly or as a fallback)
IsDatacenterValid(datacenters []string) bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me for harsh critic, but I think this API is confusing.
What it actually does is veryfies that policy is able to properly operate on given cluster.

And I would suggest to ammed API to respresent exactly that, handover newly created Session to a policy so that it could check it self if it can operate on it:

Suggested change
IsDatacenterValid(datacenters []string) bool
IsOperational(*Session) error

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it actually just performs as single validation (today)? It'd be great if in the future we could think and maybe add additional validation, but I'm not aware of any?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree for 100% here, there is no other cases, at least for built-in policies, that is why my apeal only to API outlook, logic behind it and the way it may confuse future users.
@mykaul , WDYT considering all that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the name describes exactly what the function checks, so at the moment it's good enough.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of making it future proof and just feed Session to a policy to let it self check what it needs to check ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against it, just because of time pressure, not because of correctness. I think the code right now is good enough, and we are under some pressure to deliver it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change can be done at the same time.
(I also don't think the original issue put any time pressure here).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is not that urgent - this is a footgun, not something that needs immediate fix.
Changing the interface now is easy, changing it later is not because of backwards compatibility.

Copy link

@Lorak-mmk Lorak-mmk Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we should verify rack name too, right? Maybe a more general mechanism that can be extended later would be better.

Copy link
Collaborator

@dkropachev dkropachev Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mykaul , I see that 4 people are for changing API to <>(s *Session) error ? is it ok with you to go that way ?
We can also cover rack validation same way

policies.go Outdated Show resolved Hide resolved
@roydahan roydahan requested review from vponomaryov and fruch July 4, 2024 13:15
@sylwiaszunejko sylwiaszunejko force-pushed the dc_name_validation branch 2 times, most recently from d8d02d6 to 7182b55 Compare July 4, 2024 14:34
session.go Outdated
if err != nil {
return nil, fmt.Errorf("gocql: unable to create session: %v", err)
}
for _, host := range hosts {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please filter them out by s.cfg.filterHost(h)

@sylwiaszunejko
Copy link
Collaborator Author

@mykaul @Lorak-mmk
I pushed the new version with IsOperational(*Session) error API and additional check for rack. Is it what you had in mind @dkropachev ?

Copy link
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit picks

policies.go Outdated
@@ -319,6 +319,8 @@ type HostSelectionPolicy interface {
// so it's safe to have internal state without additional synchronization as long as every call to Pick returns
// a different instance of NextHost.
Pick(ExecutableQuery) NextHost
// Checks if session is operational with the cluster on Session initialization, not when you actually run the query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Checks if session is operational with the cluster on Session initialization, not when you actually run the query
// IsOperational checks if host policy can properly work with given Session/Cluster/ClusterConfig

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

policies.go Outdated
Comment on lines 996 to 1014
if !session.cfg.disableInit && !session.cfg.disableControlConn {
var datacenters []string
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) {
datacenters = append(datacenters, host.DataCenter())
}
}

found := false

for _, dc := range datacenters {
if dc == d.local {
found = true
break
}
}

if !found {
return fmt.Errorf("gocql: datacenter provided in the policy was not found in the topology")
}
}

return nil

}
Copy link
Collaborator

@dkropachev dkropachev Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sylwiaszunejko , it is a nit, not necessary to follow, but early return is a realy good practice:

Suggested change
if !session.cfg.disableInit && !session.cfg.disableControlConn {
var datacenters []string
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) {
datacenters = append(datacenters, host.DataCenter())
}
}
found := false
for _, dc := range datacenters {
if dc == d.local {
found = true
break
}
}
if !found {
return fmt.Errorf("gocql: datacenter provided in the policy was not found in the topology")
}
}
return nil
}
if session.cfg.disableInit || session.cfg.disableControlConn {
return nil
}
var datacenters []string
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) && host.DataCenter() == d.local {
// Policy can work properly only if there is at least one host from target DC
// No need to check host status, since it could be down due to the outage
// We only need to make sure that policy is not misconfigured with wrong DC
return nil
}
}
return fmt.Errorf("gocql: dcAwarePolicy: datacenter %s was not found in the topology", d.local)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

policies.go Outdated
Comment on lines 1131 to 1137
if !session.cfg.disableInit && !session.cfg.disableControlConn {
var datacenters []string
var racks []string
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) {
datacenters = append(datacenters, host.DataCenter())
racks = append(racks, host.Rack())
}
}

foundDC := false
for _, dc := range datacenters {
if dc == d.localDC {
foundDC = true
break
}
}

foundRack := false
for _, rack := range racks {
if rack == d.localRack {
foundRack = true
break
}
}

if !foundDC {
return fmt.Errorf("gocql: datacenter provided in the policy was not found in the topology")
}
if !foundRack {
return fmt.Errorf("gocql: rack provided in the policy was not found in the topology")
}
}

return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !session.cfg.disableInit && !session.cfg.disableControlConn {
var datacenters []string
var racks []string
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) {
datacenters = append(datacenters, host.DataCenter())
racks = append(racks, host.Rack())
}
}
foundDC := false
for _, dc := range datacenters {
if dc == d.localDC {
foundDC = true
break
}
}
foundRack := false
for _, rack := range racks {
if rack == d.localRack {
foundRack = true
break
}
}
if !foundDC {
return fmt.Errorf("gocql: datacenter provided in the policy was not found in the topology")
}
if !foundRack {
return fmt.Errorf("gocql: rack provided in the policy was not found in the topology")
}
}
return nil
}
if session.cfg.disableInit || session.cfg.disableControlConn {
return nil
}
hosts, _, err := session.hostSource.GetHosts()
if err != nil {
return fmt.Errorf("gocql: unable to check if session is operational: %v", err)
}
for _, host := range hosts {
if !session.cfg.filterHost(host) && host.DataCenter() == d.localDC && host.Rack() == d.localRack {
// Policy can work properly only if there is at least one host from target DC+Rack
// No need to check host status, since it could be down due to the outage
// We only need to make sure that policy is not misconfigured with wrong DC+Rack
return nil
}
}
return fmt.Errorf("gocql: rack %s/%s was not found in the topology", d.localDC, d.localRack)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

wprzytula
wprzytula previously approved these changes Jul 5, 2024
policies.go Outdated
Comment on lines 1012 to 1013

return fmt.Errorf("gocql: datacenter %s in the policy was not found in the topology", d.local)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's add a hint what's the reason? "... - probable DC aware policy misconfiguration"

policies.go Outdated
return nil
}
}
return fmt.Errorf("gocql: rack %s/%s was not found in the topology", d.localDC, d.localRack)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Fail to connect if DC or rack name is different in the topology
than the one the user entered.
@sylwiaszunejko sylwiaszunejko merged commit ed9f13a into scylladb:master Jul 5, 2024
1 check passed
@wprzytula wprzytula self-assigned this Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Driver should fail to connect if DC name is different in the topology than the one the user entered
6 participants