Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use tokio-epoll-uring for directory namespace operations #7370

Open
5 tasks
problame opened this issue Apr 12, 2024 · 0 comments
Open
5 tasks

use tokio-epoll-uring for directory namespace operations #7370

problame opened this issue Apr 12, 2024 · 0 comments
Labels
a/performance Area: relates to performance of the system c/storage/pageserver Component: storage: pageserver

Comments

@problame
Copy link
Contributor

problame commented Apr 12, 2024

Tasks

Preview Give feedback
@problame problame self-assigned this Apr 12, 2024
@problame problame added a/performance Area: relates to performance of the system c/storage/pageserver Component: storage: pageserver labels Apr 12, 2024
@problame problame removed their assignment Apr 30, 2024
problame added a commit that referenced this issue Aug 29, 2024
…8862)

# Motivation

In #8832 I get tokio runtime
worker stack overflow errors in debug builds.

In a similar vein, I had tokio runtimer worker stack overflow when
trying to eliminate `async_trait`
(#8296).

The 2MiB default is kind of arbitrary - so this PR bumps it to 4MiB.

It also adds an env var to control it.

# Risk Assessment

With our 4 runtimes, the worst case stack memory usage is `4 (runtimes)
* ($num_cpus (executor threads) + 512 (blocking pool threads)) * 4MiB`.

On i3en.3xlarge, that's `8384 MiB`. 
On im4gn.2xlarge, that's `8320 MiB`.
Before this change, it was half that.

Looking at production metrics, we _do_ have the headroom to accomodate
this worst case case.

# Alternatives

The problems only occur with debug builds, so technically we could only
raise the stack size for debug builds.

However, it would be another configuration where `debug != release`.

# Future Work

If we ever enable single runtime mode in prod (=>
#7312 ) then the worst case
will drop to 25% of its current value.

Eliminating the use of `tokio::spawn_blocking` / `tokio::fs` in favor of
`tokio-epoll-uring` (=> #7370
) would reduce the worst case to `4 (runtimes) * $num_cpus (executor
threads) * 4 MiB`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/performance Area: relates to performance of the system c/storage/pageserver Component: storage: pageserver
Projects
None yet
Development

No branches or pull requests

1 participant