expose waiters queue depth metrics #56
Conversation
Looks sensible.
Generally, however, I'm not sure this is the right approach for per-system metrics.
To my mind, each system tracks its own metrics separately. When you wish to recover the metrics, you go through each thread-local system, query it, and return the aggregated result. This removes the need for locking.
I looked into this a little bit and the std lib doesn't provide nice functionality for this (maybe there's some tokio API you can use to iterate threads). The [thread_local](https://docs.rs/thread_local/latest/thread_local/) crate, however, lets you grab a nice iterator.
If we could use this in pageserver, then I can make each system in …
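A minimal sketch of the lock-free aggregation approach described above, assuming the [thread_local](https://docs.rs/thread_local/latest/thread_local/) crate; the `PerSystemCounters`/`Metrics` names and the counter itself are illustrative, not actual tokio-epoll-uring or pageserver types:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use thread_local::ThreadLocal;

// Illustrative per-system metrics; each thread-local system owns one instance.
struct PerSystemCounters {
    queue_depth_observations: AtomicU64,
}

// Shared handle; recording never takes a lock, aggregation iterates all threads.
struct Metrics {
    per_thread: ThreadLocal<PerSystemCounters>,
}

impl Metrics {
    fn new() -> Self {
        Metrics { per_thread: ThreadLocal::new() }
    }

    fn record(&self) {
        // Each thread lazily creates and then reuses its own counters.
        let counters = self.per_thread.get_or(|| PerSystemCounters {
            queue_depth_observations: AtomicU64::new(0),
        });
        counters.queue_depth_observations.fetch_add(1, Ordering::Relaxed);
    }

    fn aggregate(&self) -> u64 {
        // The scrape path walks every thread's counters and sums them.
        self.per_thread
            .iter()
            .map(|c| c.queue_depth_observations.load(Ordering::Relaxed))
            .sum()
    }
}
```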
Right. Might be worth investigating, but I don't wish to side-track you too much. I think what you have here is fine in the short term. I'd add a comment to the metrics saying: "if you wanna add new per-system metrics, do it properly and link to an issue".
In 132aea5, I wrote comments on adding new per-system metrics, and linked to neondatabase/neon#9480.
Hm, independent of the `thread_local` discussion: the way we add to the metrics here, we only add observations if we don't get a slot. We'll never know how close we are to 100% utilization until we hit 100%. We want to know before.
How to solve this? `RING_SIZE - unused_indices.len()` should give us the number of in-use slots. If we get a slot right away, observe that number. If we don't get a slot and have to enqueue to wait, observe the above number plus `waiters.len()`.
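A minimal sketch of the observation rule suggested above; `RING_SIZE`, `unused_indices`, `waiters`, and `observe_queue_depth` are stand-ins for the real slots state and metric hook, not the actual tokio-epoll-uring code:

```rust
use std::collections::VecDeque;

const RING_SIZE: usize = 128; // assumed ring size, for illustration only

struct Slots {
    unused_indices: Vec<usize>, // free slot indices
    waiters: VecDeque<()>,      // tasks queued up waiting for a slot
}

impl Slots {
    fn try_get_slot(&mut self, observe_queue_depth: impl Fn(usize)) -> Option<usize> {
        // Number of slots currently in use.
        let in_use = RING_SIZE - self.unused_indices.len();
        if let Some(idx) = self.unused_indices.pop() {
            // Got a slot right away: observe the number of in-use slots.
            observe_queue_depth(in_use);
            Some(idx)
        } else {
            // Must enqueue and wait: observe in-use slots plus existing waiters.
            observe_queue_depth(in_use + self.waiters.len());
            self.waiters.push_back(());
            None
        }
    }
}
```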
As to the kind of metric: you put in the equivalent of a gauge here. Gauges aren't useful at our sampling frequency. The way I have seen queue depth metrics done in other places is by using histograms over observed queue depth.
E.g., when we get a slot, we get a number as per the rules in the previous paragraph. We record this into the histogram by finding the bucket for that number. E.g., we get 12 waiters and we have buckets [0,10) [10,20) [20,30) ...; then we increment bucket [10,20). (Prometheus histograms are `le` histograms, so it would also increment all later buckets, but that's an implementation detail.)
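A minimal sketch of such a queue-depth histogram, assuming the `prometheus` crate; the metric name and bucket layout are illustrative:

```rust
use prometheus::{Histogram, HistogramOpts};

fn main() {
    let hist = Histogram::with_opts(
        HistogramOpts::new(
            "tokio_epoll_uring_slots_queue_depth", // hypothetical metric name
            "Observed slots queue depth when acquiring a slot",
        )
        // Buckets [0,10), [10,20), [20,30), ... expressed as `le` upper bounds.
        .buckets(vec![10.0, 20.0, 30.0, 40.0, 50.0]),
    )
    .unwrap();

    // Observing 12 lands in the `le = 20` bucket (and, per Prometheus `le`
    // semantics, every later bucket as well).
    hist.observe(12.0);
}
```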
As for `thread_local`, I'm all for using it so we don't have that `RwLock`. That `RwLock` is a scalability bottleneck because `RwLock::read` requires exclusive access to a cache line for recording the new reader. What I'd give for read-mostly locks in userland Rust :(
…9482) In complement to neondatabase/tokio-epoll-uring#56.

## Problem

We want to make tokio-epoll-uring slots waiters queue depth observable via Prometheus.

## Summary of changes

- Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth` metrics as a `Histogram`.
- Each thread-local tokio-epoll-uring system is given a `LocalHistogram` to observe the metrics.
- Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data to the shared histogram.
- Extend `Collector::collect` to report `pageserver_tokio_epoll_uring_slots_submission_queue_depth`.

Signed-off-by: Yuchen Liang <[email protected]>
Co-authored-by: Christian Schwarz <[email protected]>
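A minimal sketch of the `LocalHistogram` → shared `Histogram` flush mechanic the summary above relies on, assuming the `prometheus` crate; the metric name is illustrative, and the actual `Arc<ThreadLocalMetrics>` wiring in pageserver is not shown:

```rust
use prometheus::{Histogram, HistogramOpts};

fn main() {
    // Shared histogram that the Prometheus collector ultimately exports.
    let shared = Histogram::with_opts(HistogramOpts::new(
        "slots_submission_queue_depth", // hypothetical metric name
        "Observed slots submission queue depth",
    ))
    .unwrap();

    // Per-thread local view: observations are buffered without synchronization.
    let local = shared.local();
    local.observe(3.0);
    local.observe(12.0);

    // On demand (e.g. when Collector::collect runs), flush the buffered
    // samples into the shared histogram before reporting it.
    local.flush();
    assert_eq!(shared.get_sample_count(), 2);
}
```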
Problem
We want to make slots waiters queue depth observable, so it is easier to tune the submission queue size (also equal to the number of inflight operations, as implemented in the system).
Summary of changes
- A `PerSystemMetrics` trait that records slots submission queue depth. In the future, per-system metrics can be added by extending this trait (see the sketch after this list).
- `tokio_epoll_uring::System` now takes in the per-system metrics observer `Arc<M>` where `M: PerSystemMetrics`.
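A minimal sketch of what such a trait and observer hand-off could look like; the method name and the generic `System<M>` shape here are assumptions for illustration, not the exact tokio-epoll-uring API:

```rust
use std::sync::Arc;

// Assumed shape of the per-system metrics observer trait.
pub trait PerSystemMetrics: Send + Sync + 'static {
    /// Record the slots submission queue depth observed while acquiring a slot.
    fn observe_slots_submission_queue_depth(&self, queue_depth: u64);
}

// A no-op observer, e.g. for callers that don't wire up metrics.
pub struct NoMetrics;

impl PerSystemMetrics for NoMetrics {
    fn observe_slots_submission_queue_depth(&self, _queue_depth: u64) {}
}

// Hypothetical system shape mirroring the `Arc<M> where M: PerSystemMetrics`
// hand-off mentioned above.
pub struct System<M: PerSystemMetrics> {
    metrics: Arc<M>,
}

impl<M: PerSystemMetrics> System<M> {
    pub fn launch(metrics: Arc<M>) -> Self {
        System { metrics }
    }

    fn on_slot_acquired(&self, queue_depth: u64) {
        self.metrics.observe_slots_submission_queue_depth(queue_depth);
    }
}
```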
Alternative design
See neondatabase/neon#9482 for details.