-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: make BufferedWriter
do double-buffering
#9693
Conversation
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
7018 tests run: 6710 passed, 0 failed, 308 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
1da4028 at 2024-12-03T15:31:43.605Z :recycle: |
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
…rsion) Signed-off-by: Yuchen Liang <[email protected]>
BufferedWriter
do double-buffering
We have that pagebench sub-benchmark for on-demand downloads, you could compare CPU usage before and after this change. But, might be faster to "just" address this TODO. Maybe you can be generic over constraints on the buffer type by making the buffer type an associated type of the writer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, again on write_buffered
/ write_bufferd_borrowed
.
Let's call it "buffer bypass for aligned parts of the write".
I remembered that with O_DIRECT, we save one memcpy so we can spend one and come out net 0 wrt CPU efficiency.
It would be nice to have a CPU efficiency WIN, though I'm ok with net 0.
The only remaining CPU efficiency difference that I can think of right now is that write_buffered
issues one giant write for the entire middle of the buffer, whereas write_buffered issues TAIL_SZ'd writes.
Left a couple of comments that need addressing. Let's discuss major unclarities on Slack.
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
## Problem The newly added flush task in #9693 should hold timeline gate open, to avoid doing local IO after timeline shutdown completes. ## Solution Pass timeline gate guard to flush background task. The flush task do not need cancellation token b/c it will automatically shutdown when the front writer task drop the channel. - Refactor relevant paths to pass down `&Gate` instead of `GateGuard`. Signed-off-by: Yuchen Liang <[email protected]>
pageserver/src/virtual_file/owned_buffers_io/aligned_buffer/buffer_mut.rs
Show resolved
Hide resolved
Signed-off-by: Yuchen Liang <[email protected]>
panics if IoBufferMut does not enough capacity left to accomodate the source buffer. Signed-off-by: Yuchen Liang <[email protected]>
consider cases where offset != 0 Signed-off-by: Yuchen Liang <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see how this works in staging. Preprod deployment later this week.
## Problem In #9693, we forgot to check macos build. The [CI run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468) on main showed that macos build failed with unused variables and dead code. ## Summary of changes - add `allow(dead_code)` and `allow(unused_variables)` to the relevant code that is not used on macos. Signed-off-by: Yuchen Liang <[email protected]>
Closes #9387. ## Problem `BufferedWriter` cannot proceed while the owned buffer is flushing to disk. We want to implement double buffering so that the flush can happen in the background. See #9387. ## Summary of changes - Maintain two owned buffers in `BufferedWriter`. - The writer is in charge of copying the data into owned, aligned buffer, once full, submit it to the flush task. - The flush background task is in charge of flushing the owned buffer to disk, and returned the buffer to the writer for reuse. - The writer and the flush background task communicate through a bi-directional channel. For in-memory layer, we also need to be able to read from the buffered writer in `get_values_reconstruct_data`. To handle this case, we did the following - Use replace `VirtualFile::write_all` with `VirtualFile::write_all_at`, and use `Arc` to share it between writer and background task. - leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`, one clone will be submitted to the channel, the other clone will be saved within the writer to serve reads. When we want to reuse the buffer, we can invoke `IoBuffer::into_mut`, which gives us back the mutable aligned buffer. - InMemoryLayer reads is now aware of the maybe_flushed part of the buffer. **Caveat** - We removed the owned version of write, because this interface does not work well with buffer alignment. The result is that without direct IO enabled, [`download_object`](https://github.com/neondatabase/neon/blob/a439d57050dafd603d24e001215213eb5246a029/pageserver/src/tenant/remote_timeline_client/download.rs#L243) does one more memcpy than before this PR due to the switch to use `_borrowed` version of the write. - "Bypass aligned part of write" could be implemented later to avoid large amount of memcpy. **Testing** - use an oneshot channel based control mechanism to make flush behavior deterministic in test. - test reading from `EphemeralFile` when the last submitted buffer is not flushed, in-progress, and done flushing to disk. ## Performance We see performance improvement for small values, and regression on big values, likely due to being CPU bound + disk write latency. [Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4) ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Signed-off-by: Yuchen Liang <[email protected]> Co-authored-by: Christian Schwarz <[email protected]>
## Problem In #9693, we forgot to check macos build. The [CI run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468) on main showed that macos build failed with unused variables and dead code. ## Summary of changes - add `allow(dead_code)` and `allow(unused_variables)` to the relevant code that is not used on macos. Signed-off-by: Yuchen Liang <[email protected]>
Closes #9387.
Problem
BufferedWriter
cannot proceed while the owned buffer is flushing to disk. We want to implement double buffering so that the flush can happen in the background. See #9387.Summary of changes
BufferedWriter
.For in-memory layer, we also need to be able to read from the buffered writer in
get_values_reconstruct_data
. To handle this case, we did the followingVirtualFile::write_all
withVirtualFile::write_all_at
, and useArc
to share it between writer and background task.IoBufferMut::freeze
to get a cheaply clonableIoBuffer
, one clone will be submitted to the channel, the other clone will be saved within the writer to serve reads. When we want to reuse the buffer, we can invokeIoBuffer::into_mut
, which gives us back the mutable aligned buffer.Caveat
download_object
does one more memcpy than before this PR due to the switch to use_borrowed
version of the write.Testing
EphemeralFile
when the last submitted buffer is not flushed, in-progress, and done flushing to disk.Performance
We see performance improvement for small values, and regression on big values, likely due to being CPU bound + disk write latency.
Results
Checklist before requesting a review
Checklist before merging