Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(client): validate all known_hosts on ScyllaAPI client creation #3716

Merged
merged 4 commits into from
Feb 22, 2024

Conversation

karol-kokoszka
Copy link
Collaborator

@karol-kokoszka karol-kokoszka commented Feb 14, 2024

Fixes #3707

This PR changes how the Scylla API client is created.

Previously, as Scylla Manager stores all known hosts in the DB, it was passing all of them to the API client initially upon creation. The client then called one of the endpoints to discover available hosts. This approach could lead to non-existing IPs being passed during client creation, and failure when calling the specified endpoint would later indicate that the host doesn't exist.

The client created at this first step is expected to call the Scylla API to retrieve information about existing/available nodes and their IPs, and update this information in the DB. To avoid a situation where the client selects a non-existing IP from the pool, this PR iterates through all known hosts and creates the initial client with just a single IP. If calling the discover hosts endpoint fails, it proceeds to the next host.

The host used when adding the cluster to Scylla Manager is included in the hosts that will be checked against the discover hosts endpoint.

Scylla Manager caches Scylla API clients and invalidates them after 15 minutes. This PR makes the timeout configurable and includes it as part of the scylla-manager.yaml configuration. It defaults to 15 minutes.


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch 2 times, most recently from 0ea281b to 3ecbc9d Compare February 15, 2024 16:31
@karol-kokoszka karol-kokoszka marked this pull request as ready for review February 15, 2024 16:35
@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch 2 times, most recently from ad9d8f3 to 9041bb5 Compare February 16, 2024 10:38
@Michal-Leszczynski
Copy link
Collaborator

It seems that some tests are failing.

@karol-kokoszka
Copy link
Collaborator Author

It seems that some tests are failing.

Yes, and these are weird failures.
Tests are passing, but the exit code is 2.

All are passing when executed locally.
Will trigger them once again.
It may be that splitting Sanity-Check into smaller chunks will help.

@karol-kokoszka
Copy link
Collaborator Author

OK, it fails on IPV6 cluster integration tests actually. I havent check it locally...

@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch from 9041bb5 to 494e620 Compare February 16, 2024 14:32
@karol-kokoszka
Copy link
Collaborator Author

It seems that some tests are failing.

Fixed. Test was using IP from IPv4 range directly.

@Michal-Leszczynski
Copy link
Collaborator

I just added one small comment, but apart from that it looks good!

@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch 2 times, most recently from b105417 to d98f159 Compare February 21, 2024 12:35
@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch from d98f159 to 5d97d6a Compare February 21, 2024 13:53
pkg/scyllaclient/provider.go Outdated Show resolved Hide resolved
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
pkg/scyllaclient/context.go Show resolved Hide resolved
pkg/service/cluster/service.go Outdated Show resolved Hide resolved
The fix for #3707 includes extending the cluster kept in the DB with the 'host'
property, which represents the initial host used when adding a cluster to Scylla Manager. Scylla-operator passes the DNS name here,
making it immune to the ephemeral IPs in the Kubernetes environment. This commit adds the initial host as the first host to query
in order to discover node IPs.
@karol-kokoszka karol-kokoszka force-pushed the kk/3707-add-support-for-hostid-node-identities branch from 5d97d6a to 37d7410 Compare February 21, 2024 15:48
@karol-kokoszka karol-kokoszka merged commit c991f13 into master Feb 22, 2024
19 of 21 checks passed
@karol-kokoszka karol-kokoszka deleted the kk/3707-add-support-for-hostid-node-identities branch February 22, 2024 10:08
This was referenced Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for HostID-based node identities
2 participants