Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update FilePrefetchBuffer::Read to reuse file system buffer when possible #13118

Closed
wants to merge 31 commits into from

Conversation

archang19
Copy link
Contributor

@archang19 archang19 commented Nov 6, 2024

Summary

This PR adds support for reusing the file system provided buffer to avoid an extra memcpy into RockDB's buffer. This optimization has already been implemented for point lookups, as well as compaction and scan reads when prefetching is disabled.

This PR extends this optimization to work with synchronous prefetching (num_buffers == 1). Asynchronous prefetching can be addressed in a future PR (and probably should be to keep this PR from growing too large).

Remarks

  • To handle the case where the main buffer only has part of the requested data, I used the existing overlap_buf_ (currently used in the async prefetching case) instead of defining a separate buffer. This was discussed in Update FilePrefetchBuffer::Read to reuse file system buffer when possible #13118 (comment).
  • We use MultiRead with a single request to take advantage of the file system buffer. This is consistent with previous work (e.g. Provide support for FSBuffer for point lookups #12266).
  • Even without the tests I added, there was some code coverage inside in at least DBIOCorruptionTest.IterReadCorruptionRetry, since those tests were failing before I addressed a bug in my code for this PR. Run with failed test.
  • This prefetching code is not too easy to follow, so I added quite a bit of comments to both the code and test case to try to make it easier to understand the exact internal state of the prefetch buffer at every point in time.

Test Plan

I wrote pretty thorough unit tests that cover synchronous prefetching with file system buffer reuse. The flows for partial hits, complete hits, and complete misses are tested. I also parametrized the test to make sure the async prefetching (without file system buffer reuse) still work as expected.

Once we agree on the changes, I will run a long stress test before merging.

@archang19 archang19 changed the title Update FilePrefetchBuffer::Read to use FS buffer Update FilePrefetchBuffer::Read to use file system buffer Nov 6, 2024
@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

3 similar comments
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

6 similar comments
@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@@ -379,12 +389,21 @@ class FilePrefetchBuffer {
void PrefetchAsyncCallback(FSReadRequest& req, void* cb_arg);

void TEST_GetBufferOffsetandSize(
std::vector<std::pair<uint64_t, size_t>>& buffer_info) {
std::vector<std::tuple<uint64_t, size_t, bool>>& buffer_info) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this third field to validate async prefetching, since otherwise I cannot distinguish between async_req_len and CurrentSize()

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@archang19
Copy link
Contributor Author

@anand1976 here is a summary of the changes since the last review, since the old comment thread seems to be hidden:

  1. Update FilePrefetchBuffer::Read to reuse file system buffer when possible #13118 (comment) why we can't refactor our assert to Append
  2. Update FilePrefetchBuffer::Read to reuse file system buffer when possible #13118 (comment) why we do not end up "double copying" to the overlap buffer + tests that validate this using sync points
  3. Update FilePrefetchBuffer::Read to reuse file system buffer when possible #13118 (comment) parametrized tests for both sync/async cases, with discussion on why the async testing is a bit tricky/annoying currently

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add an entry under unreleased_history/performance_improvements so release history gets update?

@facebook-github-bot
Copy link
Contributor

@archang19 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@archang19 merged this pull request in 26b4806.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants