From 952b01098ad2a870a1a90dbb48c8794792d73cbd Mon Sep 17 00:00:00 2001
From: Daniel <10074684+danieltprice@users.noreply.github.com>
Date: Tue, 12 Dec 2023 10:48:55 -0400
Subject: [PATCH] Minor readme and doc edits (#30)

---
 README.md                               | 7 +++----
 tokio-epoll-uring/src/doc/benchmarks.md | 8 ++++----
 tokio-epoll-uring/src/doc/design.md     | 6 ++----
 tokio-epoll-uring/src/doc/motivation.md | 6 +++---
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index cb47453..29d5344 100644
--- a/README.md
+++ b/README.md
@@ -9,20 +9,19 @@
 
 Use `cargo doc --no-deps --open`.
 
-The Rust docs include sections on the motivation, design of this crate, and benchmarks.
+The Rust docs include sections on the motivation behind this project, the design of the crate, and benchmarks.
 If you prefer to read them in Markdown in your browser, check out the files in [`tokio-epoll-uring/src/doc`](tokio-epoll-uring/src/doc).
 
 # Examples
 
 Check out [`./tokio-epoll-uring/examples`](./tokio-epoll-uring/examples).
 
-
 # Bug Reports, Roadmap, Contributing
 
-Keep in mind the warning at the top of this file: this project is not intended for use outside of Neon (yet).
+As noted at the top of this readme file, this project is not yet intended for use outside of Neon.
 
 * Genuine bug reports are welcome.
-* Feature requests are not welcome.
+* We will not be able to accept feature requests at this time.
 * We are unlikely to accept big refactoring PRs.
 * We are likely to accept PRs that add support for new io_uring opcodes.
   Duplicating some code per opcdoe added is preferred over doing a big DRY refactoring.
diff --git a/tokio-epoll-uring/src/doc/benchmarks.md b/tokio-epoll-uring/src/doc/benchmarks.md
index 6f117bd..e243a68 100644
--- a/tokio-epoll-uring/src/doc/benchmarks.md
+++ b/tokio-epoll-uring/src/doc/benchmarks.md
@@ -9,14 +9,14 @@ Highlights:
 * With 400 tasks, each operating on a 100MiB file, the workload fits completely into the page cache.
   We achieve 1.8 million 8k random read IOPS with a high degree of fairness.
   When willing to sacrifice fairness, a different configuration achieves 2.45 million 8k random read IOPS.
-  Even the "1.8mio" are 5x of what `tokio::fs`/`tokio::spawn_blocking` achieve.
+  Even the "1.8mio" are 5x what `tokio::fs`/`tokio::spawn_blocking` achieves.
   It is 4x what a single `tokio_uring` runtime can achieve, because our design takes advantage of multiple cores through multiple tokio worker threads,
   whereas `tokio_uring` is limited to a executor thread / CPU core.
 
 * With 1200 tasks (100MiB file per task), the workload is about twice the page cache size.
-  Hence we're maxing out the IOPS supported by the `i4i.2xlarge`'s Instance Store NVMe (~130k IOPS).
+  Hence, we're maxing out the IOPS supported by the `i4i.2xlarge`'s Instance Store NVMe (~130k IOPS).
   Our crate has performance equivalent to `tokio::fs`/`tokio::spawn_blocking` and outperforms `tokio_uring` by about 2x.
-  Again, `tokio_uring` is bottlenecked by its limitation to a single executor thread / CPU core.
+  Again, the single executor thread / CPU core limitation is a bottleneck for `tokio_uring`.
 
 Fairness is measured as follows: each tokio task in the benchmark is given a fixed amount of 8k random reads to perform.
 We measure per task the time from benchmark start (same for all tasks) to the time the task finished.
@@ -24,7 +24,7 @@ Assuming a fair kernel page cache, a fair system will result in all tasks finish
 An unfair system will result in some tasks finishing earlier than others.
 By sorting & plotting task runtimes on a scatter plot (x axis: index of sorted result, y axis: task runtime) we get a visualization of task fairness: a flat line is maximum fairness, a steep line means some tasks were heavily favored over others.
 
-We can also compare `min` and `max` task runtime for a given configuration and calculate the spread factor `max/min`.
+We can also compare `min` and `max` task runtimes for a given configuration and calculate the spread factor `max/min`.
 A lower spread factor means higher fairness.
 
 Detailed results:
diff --git a/tokio-epoll-uring/src/doc/design.md b/tokio-epoll-uring/src/doc/design.md
index 7c6ad8a..68d0441 100644
--- a/tokio-epoll-uring/src/doc/design.md
+++ b/tokio-epoll-uring/src/doc/design.md
@@ -4,7 +4,7 @@ The core insights behind this crate are:
 
 * We can use Linux's new-ish `io_uring` facility to submit kernel-buffered reads from async Rust.
 * To wait for completions, `io_uring` supports `epoll`ing of the file descriptor that represents an `io_uring` instance (This is a lesser-known but supported feature of io_uring).
-* Vanilla `tokio` supports `epoll`ing arbitrary file descriptors in async Rust via [`tokio::io::unix::AsyncFd`].
+* Vanilla `tokio` supports `epoll`ing arbitrary file descriptors in async Rust via [`tokio::io::unix::AsyncFd`].
 
 Combine the above, and the result is this library:
 
@@ -29,8 +29,6 @@ the lazy initialization of the thread-local.
 
 ## Critique
 
-We shouldn't put too much lipstick on the pig:
-
 The tokio runtime has no idea of the high priority that the *poller task* has in the overall system.
 If the system is under heavy load, the turnaround time for the *poller task* in the tokio scheduler will grow.
 This means completion processing gets delayed, thereby transitively delaying `wake()` of the futures that issued these operations.
@@ -39,7 +37,7 @@ It would be beneficial to prioritize the *poller task* if the io_uring fd become
 Further, in the one-io_uring-per-executor-thread extension to the design:
 1. There is no way to pin the *poller task* to the OS thread as the io_uring it corresponds to.
    This means the *poller task*s float freely among the executor OS threads.
-   In a scenario where a read hits the page cache, the *poller task* will thus run on a different OS thread
+   In a scenario where a read hits the page cache, the *poller task* will run on a different OS thread
    than where the read op was issued. So, tokio will have to do an (intra-runtime) cross-OS-thread `wake()`.
    Which is more expensive than if the *poller task* were on the same OS thread as the task that issued the read op.
    Sadly, there is [little hope](https://discord.com/channels/500028886025895936/500336333500448798/1131667951657955481)
diff --git a/tokio-epoll-uring/src/doc/motivation.md b/tokio-epoll-uring/src/doc/motivation.md
index d24fffa..e15ab3e 100644
--- a/tokio-epoll-uring/src/doc/motivation.md
+++ b/tokio-epoll-uring/src/doc/motivation.md
@@ -18,20 +18,20 @@ For example, a page-cache-missing 8k random `read()` an `i4i.2xlarge` EC2 instan
 In this case, `tokio::fs` is a better choice than blocking the executor thread through sync IO.
 
 But, a page-cache-*hit* 8k random `read()` on the same instance takes `~1us`.
-In this case, `tokio::fs` is clearly a bad choice than blocking the executor thread through sync IO.
+In this case, `tokio::fs` is clearly a worse choice than blocking the executor thread through sync IO.
 
 The problem is that we don't know ahead of time if it will be a hit or miss.
 So, we must use the same code path for both cases.
 And, we can't use `tokio::fs` for both cases because it's a bad choice for the "hit" case.
 But we can't use synchronous IO either because it would block the executor thread for too long if it's a `miss`.
 
-Now, one could argue to just switch to [`tokio_uring`](https://docs.rs/tokio-uring/latest/tokio_uring/).
+Now, one could argue for just switching to [`tokio_uring`](https://docs.rs/tokio-uring/latest/tokio_uring/).
 It's a new async Rust runtime that uses [`io_uring`](https://manpages.debian.org/unstable/liburing-dev/io_uring_enter.2.en.html) instead of [`epoll`](https://manpages.debian.org/unstable/manpages/epoll.7.en.html) under the hood.
 Pretty cool, if only ...
 
 * ... it was as widely used as `tokio`, so one could be sure that it's production-ready and
 * ... [Rust allowed us to be generic over async runtimes](https://github.com/rust-lang/wg-async/issues/45),
-  so all our code that depends on vanilla `tokio` won't be broken at runtime or compile time.
+  so all of our code that depends on vanilla `tokio` won't be broken at runtime or compile time.
 
 So, the next best idea is to use `tokio` and `tokio_uring` within the same process like so:
 - Use `tokio` by default.