Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in speculative execution #1086

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

Lorak-mmk
Copy link
Collaborator

@Lorak-mmk Lorak-mmk commented Oct 9, 2024

This PR fixes linked bug by simplifying the logic inside the function.
The new logic always checks if the resolved task is the last one, and if so it returns, so it's not possible for the tasks to be exhausted without finishing the select!.

This PR also adds regression test for the issue. I verified that the test actually panics without the fix.

Fixes: #1085

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk requested a review from wprzytula October 9, 2024 16:42
Copy link

github-actions bot commented Oct 9, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 630f8c3

@Lorak-mmk
Copy link
Collaborator Author

Oh, it looks like passing only completed futures to select! without complete branch causes panic instead of deadlock - in that case it may be possible to write a test.

@Lorak-mmk Lorak-mmk marked this pull request as draft October 9, 2024 16:52
@Lorak-mmk Lorak-mmk removed the request for review from wprzytula October 9, 2024 16:52
This commit fixes a panic in
`scylla::transport::speculative_execution::execute`.

It is possible for all speculative fibers to finish without returning
from the function. In that situation `select!` macro will panic because
all futures passed to it are completed.

Consider the scenario where all executions return BrokenConnection:
we will only assign to last_error and never return.
There are more ways that this bug can happen, but this is the simplest
one to explain and reproduce.

This commit fixes the bug by simplifying the logic inside the function.
The new logic always checks if the resolved task is the last one, and if
so it returns, so it's not possible for executions to be exhausted
when calling `select!`.
@Lorak-mmk Lorak-mmk force-pushed the fix-speculative-execution-race branch from f44942d to a5f84c3 Compare October 9, 2024 18:05
@Lorak-mmk Lorak-mmk marked this pull request as ready for review October 9, 2024 18:05
@Lorak-mmk Lorak-mmk changed the title SE: Fix possible race condition Fix panic in speculative execution Oct 9, 2024
@wprzytula
Copy link
Collaborator

Oh, it looks like passing only completed futures to select! without complete branch causes panic instead of deadlock - in that case it may be possible to write a test.

Good that they designed select! this way. Deadlocks are sad and much harder to debug.

@Lorak-mmk Lorak-mmk force-pushed the fix-speculative-execution-race branch from a5f84c3 to e0607c8 Compare October 10, 2024 07:59
@Lorak-mmk
Copy link
Collaborator Author

I renamed the test, because the bug is a panic not a deadlock. PR is ready for review.

scylla/tests/integration/retries.rs Outdated Show resolved Hide resolved
scylla/tests/integration/retries.rs Outdated Show resolved Hide resolved
scylla/tests/integration/retries.rs Outdated Show resolved Hide resolved
scylla/tests/integration/retries.rs Outdated Show resolved Hide resolved
@Lorak-mmk Lorak-mmk force-pushed the fix-speculative-execution-race branch from e0607c8 to 630f8c3 Compare October 10, 2024 15:52
@Lorak-mmk
Copy link
Collaborator Author

Addressed review comments

Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that you noticed the bug! Congrats!

@Lorak-mmk Lorak-mmk merged commit afa91f2 into scylladb:main Oct 10, 2024
11 checks passed
@wprzytula wprzytula mentioned this pull request Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible panic in speculative exection
3 participants