Turn BlockCursor::{read_blob,read_blob_into_buf} async fn #4905

Merged
arpad-m merged 11 commits into main from arpad/pageserver_io_async_read_blob on Aug 14, 2023

Conversation

arpad-m
Member

@arpad-m arpad-m commented Aug 4, 2023

Problem

The BlockCursor::read_blob and BlockCursor::read_blob_into_buf functions call read_blk internally, so if we want to make that function an async fn, they need to become async themselves.

Summary of changes

  • We first turn ValueRef::load into an async fn.
  • Then, we switch the RwLock implementation in InMemoryLayer to use the one from tokio.
  • Last, we convert the read_blob and read_blob_into_buf functions into async fn (see the sketch below).
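To make the shape of this change concrete, here is a minimal, self-contained sketch rather than the actual pageserver code: the function names mirror the real ones, but the signatures, page size, and bodies are simplified stand-ins.

```rust
use std::io;

// Stand-in for the lower-level block read that is being made async; in the real
// code this is BlockCursor::read_blk.
async fn read_blk(_blknum: u32) -> io::Result<[u8; 8192]> {
    // pretend I/O; a real implementation would hit the filesystem
    Ok([0u8; 8192])
}

// Because read_blk is async, every caller has to become async too and propagate
// the .await upward -- which is why read_blob / read_blob_into_buf change here.
async fn read_blob(offset: u64) -> io::Result<Vec<u8>> {
    let page = read_blk((offset / 8192) as u32).await?;
    // ... in the real code: decode the blob header and copy out the payload ...
    Ok(page[..16].to_vec())
}

#[tokio::main]
async fn main() -> io::Result<()> {
    let blob = read_blob(0).await?;
    println!("read {} bytes", blob.len());
    Ok(())
}
```

The point is the ripple effect: once read_blk returns a future, read_blob and read_blob_into_buf must .await it and therefore become async fn themselves, and their callers face the same choice in turn.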

In three instances we use Handle::block_on:

  • one use is in compaction code, which currently isn't async. We put the entire loop into an async block to prevent the potentially hot loop from doing cross-thread operations (see the sketch after this list).
  • one use is in dumping code for DeltaLayer. The "proper" way to address this would be to enable the visit function to take async closures, but then we'd need to be generic over async vs non-async, which isn't supported by Rust right now. The other alternative would be to do a first pass where we cache the data into memory, and only then dump it.
  • the third use is in writing code, inside a loop that copies from one file to another. It is synchronous and we'd like to keep it that way (for now?).
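For the compaction case (first bullet), a hedged sketch of the "one block_on around the whole loop" pattern; load_value, compact, and the runtime setup are illustrative stand-ins, not the actual compaction code:

```rust
use tokio::runtime::Handle;

// Stand-in for the async read path (ValueRef::load / read_blob in the real code).
async fn load_value(key: u64) -> String {
    format!("value-{key}")
}

// A synchronous function, like the compaction loop, that needs results from the
// async read path. Entering the runtime once for the whole loop -- one async
// block, one block_on -- avoids a cross-thread round trip per value.
fn compact(keys: &[u64], handle: &Handle) -> Vec<String> {
    handle.block_on(async {
        let mut out = Vec::with_capacity(keys.len());
        for &key in keys {
            // previously this would have been one Handle::block_on per iteration
            out.push(load_value(key).await);
        }
        out
    })
}

fn main() {
    // In the real pageserver this sync code runs on a blocking thread; here we
    // just build a runtime and call in from the main thread for illustration.
    let rt = tokio::runtime::Runtime::new().unwrap();
    let values = compact(&[1, 2, 3], rt.handle());
    println!("{values:?}");
}
```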

Part of #4743

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@arpad-m arpad-m marked this pull request as ready for review August 4, 2023 14:51
@arpad-m arpad-m requested review from a team as code owners August 4, 2023 14:51
@arpad-m arpad-m requested review from conradludgate and hlinnaka and removed request for a team August 4, 2023 14:51
@github-actions

github-actions bot commented Aug 4, 2023

1588 tests run: 1512 passed, 0 failed, 76 skipped (full report)


The comment gets automatically updated with the latest test results.
564258f at 2023-08-14T14:30:12.923Z

@hlinnaka
Contributor

hlinnaka commented Aug 5, 2023

Then, we switch the RwLock implementation in InMemoryLayer to use the one from parking_lot, and enable the sendable guards feature of parking_lot, as we'll be holding guards from that lock over an await boundary.

tokio::sync::RwLock would be more appropriate.

@hlinnaka
Contributor

hlinnaka commented Aug 5, 2023

one use is in compaction code, which currently isn't async. It is per-value, so it might be hot; depending on how much overhead Handle::block_on causes, we might not want this.

Yeah that's a bit scary from a performance point of view.

Opinions welcome, one could also put the entire loop into an async block.

+1 that sounds better. Or even better, mark the whole compact_level0_phase1 async and remove the spawn_blocking from its caller.

@hlinnaka
Contributor

hlinnaka commented Aug 5, 2023

Then, we switch the RwLock implementation in InMemoryLayer to use the one from parking_lot, and enable the sendable guards feature of parking_lot, as we'll be holding guards from that lock over an await boundary.

tokio::sync::RwLock would be more appropriate.

At a quick glance, the functions that acquire the lock fall into two categories:

  1. async functions, or functions that could be changed into async easily
  2. functions that only grab the lock to read end_lsn

Async functions can easily just call inner.read().await. For the second category, one idea is to move end_lsn out of the locked struct into InMemoryLayer, as an atomic field. Then you could read it without the lock.
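A small sketch of what that split could look like; InMemoryLayer, its fields, and the use of a bare u64 for the LSN are simplified assumptions here, not the real pageserver definitions:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::RwLock;

// Simplified, hypothetical layout -- not the real InMemoryLayer definition.
struct InMemoryLayerInner {
    index: Vec<u64>, // stand-in for the data that actually needs the lock
}

struct InMemoryLayer {
    // Category 2: end_lsn lives outside the lock so it can be read without
    // awaiting (0 meaning "not set yet" in this sketch).
    end_lsn: AtomicU64,
    inner: RwLock<InMemoryLayerInner>,
}

impl InMemoryLayer {
    // Category 1: async callers simply await the tokio lock.
    async fn num_entries(&self) -> usize {
        self.inner.read().await.index.len()
    }

    // Category 2: no lock needed any more.
    fn end_lsn(&self) -> Option<u64> {
        match self.end_lsn.load(Ordering::Acquire) {
            0 => None,
            lsn => Some(lsn),
        }
    }
}

#[tokio::main]
async fn main() {
    let layer = InMemoryLayer {
        end_lsn: AtomicU64::new(0),
        inner: RwLock::new(InMemoryLayerInner { index: vec![1, 2, 3] }),
    };
    assert_eq!(layer.num_entries().await, 3);
    layer.end_lsn.store(0x1234, Ordering::Release);
    println!("end_lsn = {:?}", layer.end_lsn());
}
```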

@arpad-m
Member Author

arpad-m commented Aug 7, 2023

tokio::sync::RwLock would be more appropriate.

Originally I tried using it, but it makes each lock acquisition async, which got me into trouble in the writing code, as that code is still heavily synchronous.
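For illustration, a sketch of that friction with hypothetical names: once the lock is tokio's, a synchronous write path has to bridge into the runtime (here via Handle::block_on) just to acquire it.

```rust
use tokio::runtime::Handle;
use tokio::sync::RwLock;

// Hypothetical write path: with tokio's RwLock every acquisition is async, so a
// synchronous caller has to enter the runtime to take the lock at all.
fn append_sync(handle: &Handle, lock: &RwLock<Vec<u8>>, bytes: &[u8]) {
    let mut guard = handle.block_on(lock.write());
    guard.extend_from_slice(bytes);
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let lock = RwLock::new(Vec::new());
    append_sync(rt.handle(), &lock, b"hello");
    println!("{} bytes buffered", rt.block_on(lock.read()).len());
}
```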

Or even better, mark the whole compact_level0_phase1 async and remove the spawn_blocking from its caller.

I'm not sure it's a good idea to do that before the actual underlying I/O operations are async: the task would otherwise block an entire executor thread of the async scheduler. Even if the workload is CPU-bound, you should still yield back to the executor regularly by hitting an .await point.

I suppose I'll then just move the entire loop into an async block. It's not optimal but better than having all of compaction in an async block.
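A sketch of the yielding concern above, with illustrative names: a CPU-bound loop inside an async fn can insert explicit yield points until the underlying I/O awaits exist to provide them naturally.

```rust
use tokio::task::yield_now;

// A CPU-bound body inside an async fn has no natural .await points, so it yields
// back to the scheduler periodically; with truly async I/O underneath, the awaits
// on the read/write calls would serve this purpose instead.
async fn compact_cpu_bound(items: &[u64]) -> u64 {
    let mut checksum = 0u64;
    for (i, item) in items.iter().enumerate() {
        checksum = checksum.wrapping_add(item.wrapping_mul(31));
        if i % 1024 == 0 {
            yield_now().await; // cooperative yield point
        }
    }
    checksum
}

#[tokio::main]
async fn main() {
    let data: Vec<u64> = (0..10_000).collect();
    println!("checksum: {}", compact_cpu_bound(&data).await);
}
```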

@arpad-m arpad-m changed the title from "Make BlockCursor::{read_blob,read_blob_into_buf} async fn" to "Turn BlockCursor::{read_blob,read_blob_into_buf} async fn" Aug 10, 2023
@arpad-m
Member Author

arpad-m commented Aug 10, 2023

Okay, in response to feedback I have:

  • Moved the end_lsn out of the lock (InMemoryLayer: move end_lsn out of the lock #4963) and switched from parking_lot's synchronous lock to tokio's asynchronous RwLock
  • Moved the Handle::block_on in compaction from the for loop to outside of the for loop, inside an async block

Also, I've merged the latest main into this PR.

arpad-m added a commit that referenced this pull request Aug 11, 2023
## Problem

In some places, the lock on `InMemoryLayerInner` is only created to
obtain `end_lsn`. This is not needed however, if we move `end_lsn` to
`InMemoryLayer` instead.

## Summary of changes

Make `end_lsn` a member of `InMemoryLayer`, and do less locking of
`InMemoryLayerInner`. `end_lsn` is changed from `Option<Lsn>` into an
`OnceLock<Lsn>`. Thanks to this change, we don't need to lock any more
in three functions.

Part of #4743. Suggested in #4905 (comment).
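A standalone sketch of that OnceLock<Lsn> idea; the Lsn newtype and struct layout here are simplified assumptions rather than the real pageserver types:

```rust
use std::sync::OnceLock;

// Hypothetical Lsn newtype standing in for the real pageserver type.
#[derive(Clone, Copy, Debug)]
struct Lsn(u64);

struct InMemoryLayer {
    start_lsn: Lsn,
    // Was Option<Lsn> behind the lock; as a OnceLock it is set exactly once when
    // the layer is frozen and can then be read without taking any lock.
    end_lsn: OnceLock<Lsn>,
}

impl InMemoryLayer {
    fn freeze(&self, end_lsn: Lsn) {
        self.end_lsn
            .set(end_lsn)
            .expect("a layer can only be frozen once");
    }

    fn end_lsn(&self) -> Option<Lsn> {
        self.end_lsn.get().copied()
    }
}

fn main() {
    let layer = InMemoryLayer { start_lsn: Lsn(100), end_lsn: OnceLock::new() };
    assert!(layer.end_lsn().is_none());
    layer.freeze(Lsn(200));
    println!("{:?}..{:?}", layer.start_lsn, layer.end_lsn().unwrap());
}
```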
@arpad-m arpad-m force-pushed the arpad/pageserver_io_async_read_blob branch from 791947e to 99a755e on August 11, 2023 16:23
@arpad-m arpad-m requested a review from hlinnaka August 12, 2023 01:08
pageserver/src/tenant/timeline.rs (review thread resolved)
@arpad-m arpad-m force-pushed the arpad/pageserver_io_async_read_blob branch from aff279e to 564258f on August 14, 2023 14:06
@arpad-m arpad-m merged commit ce7efbe into main Aug 14, 2023
@arpad-m arpad-m deleted the arpad/pageserver_io_async_read_blob branch August 14, 2023 15:20