Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topology: skip invalid peers instead of failing #1045

Merged
merged 1 commit into from
Aug 1, 2024

Conversation

wprzytula
Copy link
Collaborator

@wprzytula wprzytula commented Jul 29, 2024

Motivation

It appears that in some clusters there sometimes emerges a transient state that there are some incomplete/invalid peer entries in system tables. Under such circumstances, the driver would throw an error and fail the metadata refresh altogether, eventually leading to critical conditions such as no hosts to query.

Solution

To fix this in line with how other drivers approach this, a warning is emitted upon invalid peer entry read, and the entry is skipped with no error returned.

Fixes: #1023
Fixes: #235

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Err(err) => {
warn!(
"system.peers or system.local has an invalid row, skipping it: {}",
err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would like to see #1021 here. IIUC this would most always return system.peers or system.local has invalid column type, no?

Either way, this LGTM as it addresses the skip part.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warn! prints a pretty message contained by err, so we should get the deepest possible explanation of what went wrong.

Copy link

@vponomaryov vponomaryov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Lorak-mmk
Copy link
Collaborator

Let's wait with merging until CI issues are fixed (and then rebase)

@Lorak-mmk
Copy link
Collaborator

You can mark #235 as fixed by this.

@wprzytula wprzytula force-pushed the dont-fail-on-invalid-peer branch from fa2c2ed to 915feba Compare July 31, 2024 15:22
@wprzytula
Copy link
Collaborator Author

Rebased on main.

@wprzytula wprzytula self-assigned this Jul 31, 2024
It appears that in some clusters there sometimes emerges a transient
state that there are some incomplete/invalid peer entries in system
tables. Under such circumstances, the driver would throw an error and
fail the metadata refresh altogether, eventually leading to critical
conditions such as no hosts to query.
To fix this in line with how other drivers approach this, a warning is
emitted upon invalid peer entry read, and the entry is skipped with no
error returned.
@wprzytula wprzytula force-pushed the dont-fail-on-invalid-peer branch from 915feba to a282396 Compare July 31, 2024 15:33
Copy link

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: a282396

@Lorak-mmk Lorak-mmk merged commit a0e7856 into scylladb:main Aug 1, 2024
11 checks passed
@vponomaryov
Copy link

@wprzytula , @Lorak-mmk

Do you plan to make a new tag any time soon after merging this bugfix?

@wprzytula
Copy link
Collaborator Author

@wprzytula , @Lorak-mmk

Do you plan to make a new tag any time soon after merging this bugfix?

We are thinking of introducing some paging API changes soon and then releasing.

@wprzytula wprzytula deleted the dont-fail-on-invalid-peer branch August 1, 2024 08:26
wprzytula pushed a commit to wprzytula/scylla-rust-driver that referenced this pull request Aug 22, 2024
…peer

topology: skip invalid peers instead of failing
(cherry picked from commit a0e7856)
@wprzytula wprzytula mentioned this pull request Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't fail metadata fetch when encountering invalid peer Skip Invalid peers with a warning
4 participants