integrate tokio-epoll-uring as alternative VirtualFile IO engine #5824

Merged
merged 9 commits into from
Jan 26, 2024


problame

@problame problame commented Nov 8, 2023

(part of #4744 )

This PR integrates tokio-epoll-uring as an alternative IO engine for Pageserver's VirtualFile open and read operations.

The IO engine is configurable via pageserver config, and for Rust unit tests, via an env var.
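A minimal sketch of how such a setting could be modeled, assuming the two engine names that appear in this conversation (`std-fs` and `tokio-epoll-uring`). The type `IoEngineKind` and the function `resolve_engine` are hypothetical names chosen for illustration and are not taken from the PR's diff:

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum for illustration; only the two engine names
/// ("std-fs", "tokio-epoll-uring") come from this PR's discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum IoEngineKind {
    /// Assumed default, since std-fs is what production keeps using.
    #[default]
    StdFs,
    TokioEpollUring,
}

impl FromStr for IoEngineKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "std-fs" => Ok(Self::StdFs),
            "tokio-epoll-uring" => Ok(Self::TokioEpollUring),
            other => Err(format!("unknown io engine: {other:?}")),
        }
    }
}

impl fmt::Display for IoEngineKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::StdFs => "std-fs",
            Self::TokioEpollUring => "tokio-epoll-uring",
        })
    }
}

/// Resolve the engine from an optional raw setting (a config value or an
/// env var value). A missing setting falls back to the default engine;
/// an unrecognized value is reported as an error instead of being ignored.
pub fn resolve_engine(raw: Option<&str>) -> Result<IoEngineKind, String> {
    match raw {
        None => Ok(IoEngineKind::default()),
        Some(s) => s.parse(),
    }
}
```

Rejecting unknown values, rather than silently falling back, keeps a typo in the config from quietly selecting the wrong engine.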

The goal of integrating early is to uncover unknown problems with tokio-epoll-uring
by exposing the code to staging & CI tests for a few days. Ideally we'd run all the CI tests for both IO engines, but that would require twice the CI resources.

On older kernel versions, the io_uring submission queue (SQ) and completion queue (CQ) are accounted as locked memory.
This PR raises the memlock ulimit wherever we execute our Rust code in CI.
See #6373 for details on memlock.
The PR that will raise the memlock ulimit for staging and prod is neondatabase/aws#932.
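To check which memlock limit a process actually runs with, one option on Linux is to read `/proc/self/limits`. The sketch below is not part of this PR; `MemlockLimit`, `parse_memlock_soft_limit`, and `current_memlock_soft_limit` are hypothetical helper names, and the parser assumes the usual column layout of that file (limit name, soft limit, hard limit, units):

```rust
/// Soft limit for locked memory, as reported in `/proc/<pid>/limits`.
#[derive(Debug, PartialEq, Eq)]
pub enum MemlockLimit {
    Unlimited,
    Bytes(u64),
}

/// Extract the soft "Max locked memory" limit from text in the
/// `/proc/<pid>/limits` format. Returns `None` if the line is missing
/// or the value cannot be parsed.
pub fn parse_memlock_soft_limit(limits: &str) -> Option<MemlockLimit> {
    // The row looks like: "Max locked memory   <soft>   <hard>   bytes".
    let rest = limits
        .lines()
        .find_map(|line| line.strip_prefix("Max locked memory"))?;
    // The first whitespace-separated field after the name is the soft limit.
    let soft = rest.split_whitespace().next()?;
    if soft == "unlimited" {
        Some(MemlockLimit::Unlimited)
    } else {
        soft.parse().ok().map(MemlockLimit::Bytes)
    }
}

/// Read the limit of the current process (Linux only; returns `None`
/// where `/proc/self/limits` does not exist or cannot be read).
pub fn current_memlock_soft_limit() -> Option<MemlockLimit> {
    let text = std::fs::read_to_string("/proc/self/limits").ok()?;
    parse_memlock_soft_limit(&text)
}
```

Logging this value at startup could help tell a too-low memlock limit apart from other io_uring setup failures on the older kernels mentioned above.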


github-actions bot commented Nov 8, 2023

2367 tests run: 2270 passed, 0 failed, 97 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_lfc_resize[tokio-epoll-uring]: debug

Postgres 15

  • test_crafted_wal_end[tokio-epoll-uring-last_wal_record_crossing_segment]: release

Code coverage (full report)

  • functions: 54.6% (10830 of 19822 functions)
  • lines: 81.5% (61338 of 75301 lines)

The comment gets automatically updated with the latest test results
7c92164 at 2024-01-25T14:43:45.850Z :recycle:

@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 6b5992d to 0003744 Compare November 8, 2023 14:55
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from c1b1da2 to 6c359a4 Compare November 20, 2023 11:36
problame added a commit that referenced this pull request Nov 29, 2023
Squashed commit of the following:

commit 5ec61ce
Author: Christian Schwarz <[email protected]>
Date:   Wed Nov 29 16:17:12 2023 +0000

    bump

commit 34c33d1
Author: Christian Schwarz <[email protected]>
Date:   Mon Nov 20 14:38:29 2023 +0000

    bump

commit 8fa6b76
Author: Christian Schwarz <[email protected]>
Date:   Mon Nov 20 11:47:19 2023 +0000

    bump

commit 6c359a4
Author: Christian Schwarz <[email protected]>
Date:   Mon Nov 20 11:33:58 2023 +0000

    use neondatabase/tokio-epoll-uring#25

commit 7d484b0
Author: Christian Schwarz <[email protected]>
Date:   Tue Aug 29 19:13:38 2023 +0000

    use WIP tokio_epoll_uring open_at for async VirtualFile::open

    This makes Delta/Image ::load fns fully tokio-epoll-uring

commit 51b26b1
Author: Christian Schwarz <[email protected]>
Date:   Tue Aug 29 12:24:30 2023 +0000

    use `tokio_epoll_uring` for read path

commit a4e6f0c
Author: Christian Schwarz <[email protected]>
Date:   Wed Nov 8 12:36:34 2023 +0000

    Revert "revert recent VirtualFile asyncification changes (#5291)"

    This reverts commit ab1f37e.

    fixes #5479
problame added a commit that referenced this pull request Nov 30, 2023
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 5ec61ce to f51a08e Compare December 1, 2023 13:47
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from f51a08e to 0ca94f5 Compare December 11, 2023 16:26
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 256b417 to b388180 Compare January 12, 2024 11:20
@problame problame changed the title WIP: integrate tokio-epoll-uring integrate tokio-epoll-uring Jan 12, 2024
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from b388180 to ccd2de9 Compare January 12, 2024 11:30
@problame problame changed the title integrate tokio-epoll-uring integrate tokio-epoll-uring as alternative VirtualFile IO engine Jan 12, 2024
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from ccd2de9 to ccf8ffd Compare January 12, 2024 17:06
@bayandin bayandin self-requested a review January 12, 2024 18:13
@bayandin bayandin force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 0d1bd27 to f90c159 Compare January 15, 2024 19:03
@bayandin bayandin added the run-benchmarks Indicates to the CI that benchmarks should be run for PR marked with this label label Jan 15, 2024
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 75ebc39 to e99a3a1 Compare January 17, 2024 12:41
@problame problame marked this pull request as ready for review January 17, 2024 12:46
@problame problame requested a review from a team as a code owner January 17, 2024 12:46
@problame problame requested review from arpad-m, jcsp and VladLazar and removed request for a team and arpad-m January 17, 2024 12:46

@jcsp jcsp left a comment


I'm good with this: only doubt is whether any of the VirtualFile changes might introduce subtle perf regressions, but that's what our shiny new benchmarks are for.

@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from aac2a83 to 423b4e1 Compare January 19, 2024 11:16
@problame

problame commented Jan 19, 2024

Edit: moved to #6373 (comment)

@problame

problame commented Jan 19, 2024

Edit: moved to #6373 (comment)

@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from 298b3e6 to cb6a8df Compare January 23, 2024 18:29
This is prep work for integrating support for runtime-configurable
io engines (=> tokio-epoll-uring).
Apart from sticking closer to the comment above the function,
this reduces the diff in the next patch.
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from cb6a8df to 2cf5a4c Compare January 24, 2024 16:37
problame added a commit that referenced this pull request Jan 24, 2024
…exercise it

Test `test_eof_before_buffer_full` fails, as discussed in
#5824 (comment)
problame and others added 6 commits January 25, 2024 12:12
The code is unused for now; the next commit hooks it up to the CI.
- control via env var PAGESERVER_VIRTUAL_FILE_IO_ENGINE
- if an io engine other than std-fs is used, it shows up in the test
  name; this is so that we can continue to use the flaky tests database
- raise the memlock limit and, while at it, also raise the shmem limit
  for the Rust tests. This is needed on our older runners, which use
  an older 5.10.X LTS kernel where the io_uring SQ and CQ still
  count towards the rlimit; see
  #6373 (comment)
  for details.
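
The naming rule in the second bullet can be sketched as follows. This is an illustration only: the env var name is the one given above, and the bracketed format mirrors the test names in the CI report (for example `test_lfc_resize[tokio-epoll-uring]`), but the helpers `engine_from_env` and `test_id` are hypothetical and the real test harness may build its names differently:

```rust
/// Env var name taken from the commit message above.
pub const ENGINE_ENV_VAR: &str = "PAGESERVER_VIRTUAL_FILE_IO_ENGINE";

/// Assumed default engine when the env var is unset.
const DEFAULT_ENGINE: &str = "std-fs";

/// Read the engine name from the environment, falling back to the default.
pub fn engine_from_env() -> String {
    std::env::var(ENGINE_ENV_VAR).unwrap_or_else(|_| DEFAULT_ENGINE.to_string())
}

/// Build a test id. The engine appears in the id only when it is not the
/// default, so ids recorded for std-fs runs stay unchanged and remain
/// comparable with the history in the flaky tests database.
pub fn test_id(base: &str, engine: &str, other_params: &[&str]) -> String {
    let mut params: Vec<&str> = Vec::new();
    if engine != DEFAULT_ENGINE {
        params.push(engine);
    }
    params.extend_from_slice(other_params);
    if params.is_empty() {
        base.to_string()
    } else {
        // Parameters are joined with '-' inside square brackets.
        format!("{base}[{}]", params.join("-"))
    }
}
```

Leaving the default engine out of the id is the design point here: existing test names do not change, so only runs with the alternative engine get distinct entries.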

Co-authored-by: Alexander Bayandin <[email protected]>
Running both combinations bloats CI times too much.

So, we switch to tokio-epoll-uring for 2-3 days, so that the code gets exposure.
Then we switch it back a few days before the next release so that the
`std-fs` engine gets coverage, because that's what we're going to
continue using in prod for the time being.
@problame problame force-pushed the problame/integrate-tokio-epoll-uring/wip branch from a8c9912 to 7c92164 Compare January 25, 2024 13:19
@problame problame merged commit 918b03b into main Jan 26, 2024
51 checks passed
@problame problame deleted the problame/integrate-tokio-epoll-uring/wip branch January 26, 2024 08:25
problame added a commit that referenced this pull request Jan 26, 2024
PR #5824 introduced the concept of io engines in pageserver and
implemented `tokio-epoll-uring` in addition to our current method,
`std-fs`.

We used `tokio-epoll-uring` in CI for a few days to get more exposure to
the code.  Now it's time to switch CI back so that we test with `std-fs`
as well, because that's what we're (still) using in production.
problame added a commit that referenced this pull request Jan 26, 2024
…#6492)

PR #5824 introduced the concept of io engines in pageserver and
implemented `tokio-epoll-uring` in addition to our current method,
`std-fs`.

We used `tokio-epoll-uring` in CI for a day to get more exposure to
the code.  Now it's time to switch CI back so that we test with `std-fs`
as well, because that's what we're (still) using in production.
problame added a commit that referenced this pull request Feb 5, 2024
Labels
run-benchmarks Indicates to the CI that benchmarks should be run for PR marked with this label
5 participants