bypass PageCache for InMemoryLayer + avoid Value::deser on L0 flush #8537

Merged
merged 52 commits into from
Aug 28, 2024
Commits
332ca2b
WIP
problame Jul 29, 2024
37bfa04
implement coalescing of multiple reads onto same page
problame Aug 14, 2024
6f65b4d
don't think in pages, but DIO chunks; remove read_page & page_caching…
problame Aug 14, 2024
fb78185
merging of adjacent chunk reads, up to max batch size
problame Aug 15, 2024
72966e0
bench_ingest results (on my MBP linux VM)
problame Aug 15, 2024
3112827
manually benchmark get_values_reconstruct_data performance
problame Aug 15, 2024
10c7419
https://github.com/neondatabase/neon/pull/8537#discussion_r1719830114
problame Aug 19, 2024
0d22d4d
https://github.com/neondatabase/neon/pull/8537#discussion_r1719877858
problame Aug 19, 2024
6efd9fb
WIP: refactor for unit testing
problame Aug 19, 2024
ef2b384
try a closure approach, also has same error "Send is not general enough"
problame Aug 20, 2024
521da32
Revert "try a closure approach, also has same error "Send is not gene…
problame Aug 20, 2024
e68555e
fix the compile errors & progress with testability
problame Aug 20, 2024
3970729
add MockFile infrastructure
problame Aug 20, 2024
c047d01
add blackbox test
problame Aug 20, 2024
c6f4fce
move MockFile section after blackbox test
problame Aug 20, 2024
43e2d1d
add two recorder tests (more to come)
problame Aug 20, 2024
9f27e36
move MockFile to bottom
problame Aug 20, 2024
34d3d4e
add basic test for valueread reuse
problame Aug 20, 2024
1feca8d
add test testing partial error behavior
problame Aug 20, 2024
5c448ce
permutation testing
problame Aug 21, 2024
a38714a
tests for short chunk reads + refactor for ValueReadState
problame Aug 21, 2024
f245fa4
coverage for peeking_take_while deciding to skip
problame Aug 21, 2024
13ec4a0
the great rename
problame Aug 21, 2024
2159ae8
commentary in inmemory_layer.rs
problame Aug 21, 2024
9d9b5f2
fix release build
problame Aug 21, 2024
f686b91
clippy
problame Aug 21, 2024
4c115bc
add basic test for EphemeralFile
problame Aug 21, 2024
e4af31c
packed index value
problame Aug 21, 2024
d5a17b0
Merge remote-tracking branch 'origin/main' into problame/inmemory-lay…
problame Aug 21, 2024
0d6ae6e
fix doc comment
problame Aug 21, 2024
7dc73d3
fix layer rolling test
problame Aug 22, 2024
78d1ba4
Merge from main + revise everything to use u64==usize
problame Aug 22, 2024
20d97f8
clippy
problame Aug 22, 2024
06cb7b9
more clippy (why doesn't my local clippy find this?)
problame Aug 22, 2024
dde46aa
https://github.com/neondatabase/neon/pull/8537#discussion_r1726852836
problame Aug 22, 2024
78f81cf
https://github.com/neondatabase/neon/pull/8537#discussion_r1726853227
problame Aug 22, 2024
09e67ec
https://github.com/neondatabase/neon/pull/8537#discussion_r1726850864
problame Aug 22, 2024
c445da4
add more EphemeralFile tests, covering read_at_to_end behavior; https…
problame Aug 22, 2024
edb0ebc
remove seal; https://github.com/neondatabase/neon/pull/8537#discussio…
problame Aug 22, 2024
2d06683
improve File trait naming & docs
problame Aug 22, 2024
df4571f
more renaming to read_exact_at_eof_ok & reuse File::read_exact_at_eof…
problame Aug 22, 2024
3283785
doc fix
problame Aug 22, 2024
ef1c55c
rename MergedInterest to PhysicalInterest
problame Aug 22, 2024
175a430
https://github.com/neondatabase/neon/pull/8537#discussion_r1728692145
problame Aug 26, 2024
af0ab1d
adapt SerializedBatch buffer size; we no longer store length in Ephem…
problame Aug 26, 2024
d6827cc
https://github.com/neondatabase/neon/pull/8537#discussion_r1728609900
problame Aug 26, 2024
e8ecff6
better name for validation + check on startup; https://github.com/neo…
problame Aug 26, 2024
5584426
improve doc comments pertaining max pos & rearrange functions for bet…
problame Aug 27, 2024
fcb39d0
RE-REVIEW: add the base_offset in-place to avoid allocation; https://…
problame Aug 27, 2024
f09e6af
fixup
problame Aug 27, 2024
35b81a0
rename InMemoryLayerIndexValue to IndexEntry; https://github.com/neon…
problame Aug 27, 2024
b184f77
more fixup fcb39d052477fd5f1431d1758a3e753a93d4c3ce
problame Aug 28, 2024
14 changes: 14 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -65,6 +65,7 @@ axum = { version = "0.6.20", features = ["ws"] }
 base64 = "0.13.0"
 bincode = "1.3"
 bindgen = "0.65"
+bit_field = "0.10.2"
 bstr = "1.0"
 byteorder = "1.4"
 bytes = "1.0"
@@ -145,6 +146,7 @@ rustls-split = "0.3"
 scopeguard = "1.1"
 sysinfo = "0.29.2"
 sd-notify = "0.4.1"
+send-future = "0.1.0"
 sentry = { version = "0.32", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
 serde = { version = "1.0", features = ["derive"] }
 serde_json = "1"
2 changes: 2 additions & 0 deletions pageserver/Cargo.toml
@@ -16,6 +16,7 @@ arc-swap.workspace = true
 async-compression.workspace = true
 async-stream.workspace = true
 async-trait.workspace = true
+bit_field.workspace = true
 byteorder.workspace = true
 bytes.workspace = true
 camino.workspace = true
@@ -52,6 +53,7 @@ rand.workspace = true
 range-set-blaze = { version = "0.1.16", features = ["alloc"] }
 regex.workspace = true
 scopeguard.workspace = true
+send-future.workspace = true
 serde.workspace = true
 serde_json = { workspace = true, features = ["raw_value"] }
 serde_path_to_error.workspace = true
4 changes: 2 additions & 2 deletions pageserver/benches/bench_ingest.rs
@@ -103,13 +103,13 @@ async fn ingest(
         batch.push((key.to_compact(), lsn, data_ser_size, data.clone()));
         if batch.len() >= BATCH_SIZE {
             let this_batch = std::mem::take(&mut batch);
-            let serialized = SerializedBatch::from_values(this_batch);
+            let serialized = SerializedBatch::from_values(this_batch).unwrap();
             layer.put_batch(serialized, &ctx).await?;
         }
     }
     if !batch.is_empty() {
         let this_batch = std::mem::take(&mut batch);
-        let serialized = SerializedBatch::from_values(this_batch);
+        let serialized = SerializedBatch::from_values(this_batch).unwrap();
         layer.put_batch(serialized, &ctx).await?;
     }
     layer.freeze(lsn + 1).await;
39 changes: 39 additions & 0 deletions pageserver/src/assert_u64_eq_usize.rs
@@ -0,0 +1,39 @@
+//! `u64` and `usize` aren't guaranteed to be identical in Rust, but life is much simpler if that's the case.
+
+pub(crate) const _ASSERT_U64_EQ_USIZE: () = {
+    if std::mem::size_of::<usize>() != std::mem::size_of::<u64>() {
+        panic!("the traits defined in this module assume that usize and u64 can be converted to each other without loss of information");
+    }
+};
+
+pub(crate) trait U64IsUsize {
+    fn into_usize(self) -> usize;
+}
+
+impl U64IsUsize for u64 {
+    #[inline(always)]
+    fn into_usize(self) -> usize {
+        #[allow(clippy::let_unit_value)]
+        let _ = _ASSERT_U64_EQ_USIZE;
+        self as usize
+    }
+}
+
+pub(crate) trait UsizeIsU64 {
+    fn into_u64(self) -> u64;
+}
+
+impl UsizeIsU64 for usize {
+    #[inline(always)]
+    fn into_u64(self) -> u64 {
+        #[allow(clippy::let_unit_value)]
+        let _ = _ASSERT_U64_EQ_USIZE;
+        self as u64
+    }
+}
+
+pub const fn u64_to_usize(x: u64) -> usize {
+    #[allow(clippy::let_unit_value)]
+    let _ = _ASSERT_U64_EQ_USIZE;
+    x as usize
+}
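To illustrate how a call site might use the conversion trait from `assert_u64_eq_usize.rs`, here is a minimal standalone sketch. The constant and the trait mirror the diff (the constant is renamed so the block compiles on its own); `slice_at` is a hypothetical helper invented for this example and is not part of the PR. The sketch assumes a 64-bit target, since it deliberately fails to compile otherwise.

```rust
/// Compile-time check: the build fails on targets where `usize` is not 64 bits wide.
const ASSERT_U64_EQ_USIZE: () = {
    if std::mem::size_of::<usize>() != std::mem::size_of::<u64>() {
        panic!("usize and u64 must have the same size");
    }
};

trait U64IsUsize {
    fn into_usize(self) -> usize;
}

impl U64IsUsize for u64 {
    #[inline(always)]
    fn into_usize(self) -> usize {
        // Mirrors the diff: touch the constant so the check is tied to this conversion.
        #[allow(clippy::let_unit_value)]
        let _ = ASSERT_U64_EQ_USIZE;
        self as usize
    }
}

/// Hypothetical helper (not in the PR): slice an in-memory buffer using
/// 64-bit file-style offsets, without sprinkling `as usize` casts around.
fn slice_at(buf: &[u8], offset: u64, len: u64) -> Option<&[u8]> {
    let start = offset.into_usize();
    // `checked_add` guards against overflow of `start + len`.
    let end = start.checked_add(len.into_usize())?;
    // `get` returns `None` when the range is out of bounds.
    buf.get(start..end)
}
```

Calling `slice_at(b"hello world", 6, 5)` yields the bytes of `world`, while an out-of-range request such as `slice_at(b"abc", 2, 5)` yields `None`. The point of the trait is that the cast is named and guarded once, so a lossy conversion cannot slip in unnoticed on an unexpected target.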
10 changes: 10 additions & 0 deletions pageserver/src/config.rs
@@ -31,6 +31,7 @@ use utils::{

 use crate::l0_flush::L0FlushConfig;
 use crate::tenant::config::TenantConfOpt;
+use crate::tenant::storage_layer::inmemory_layer::IndexEntry;
 use crate::tenant::timeline::compaction::CompactL0Phase1ValueAccess;
 use crate::tenant::vectored_blob_io::MaxVectoredReadBytes;
 use crate::tenant::{TENANTS_SEGMENT_NAME, TIMELINES_SEGMENT_NAME};
@@ -1005,6 +1006,15 @@ impl PageServerConf {

         conf.default_tenant_conf = t_conf.merge(TenantConf::default());

+        IndexEntry::validate_checkpoint_distance(conf.default_tenant_conf.checkpoint_distance)
+            .map_err(|msg| anyhow::anyhow!("{msg}"))
+            .with_context(|| {
+                format!(
+                    "effective checkpoint distance is unsupported: {}",
+                    conf.default_tenant_conf.checkpoint_distance
+                )
+            })?;
+
         Ok(conf)
     }

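Why would a checkpoint distance be "unsupported"? The commit list mentions a "packed index value", which suggests that each in-memory index entry packs a file position, a blob length, and a flag into a single integer, so positions beyond some maximum cannot be represented. The sketch below illustrates that idea only. `PackedEntry`, its bit layout, and its constants are assumptions made for this example; the actual `IndexEntry` layout is not shown in this part of the diff.

```rust
/// Hypothetical packed entry, NOT the PR's `IndexEntry`.
/// Assumed layout: bit 63 = will_init flag, next 28 bits = length, low 35 bits = position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedEntry(u64);

const LEN_BITS: u32 = 28; // enough to hold 0x0fff_ffff
const POS_BITS: u32 = 64 - 1 - LEN_BITS; // 35 bits remain for the position
const MAX_LEN: u64 = (1u64 << LEN_BITS) - 1;
const MAX_POS: u64 = (1u64 << POS_BITS) - 1;

impl PackedEntry {
    /// Reject configurations under which a position could exceed the packed field.
    fn validate_checkpoint_distance(checkpoint_distance: u64) -> Result<(), String> {
        if checkpoint_distance > MAX_POS {
            return Err(format!(
                "checkpoint distance {checkpoint_distance} exceeds max position {MAX_POS}"
            ));
        }
        Ok(())
    }

    /// Pack the three fields, checking that each one fits its bit range.
    fn new(pos: u64, len: u64, will_init: bool) -> Result<Self, String> {
        if pos > MAX_POS {
            return Err(format!("pos {pos} does not fit in {POS_BITS} bits"));
        }
        if len > MAX_LEN {
            return Err(format!("len {len} does not fit in {LEN_BITS} bits"));
        }
        Ok(Self(((will_init as u64) << 63) | (len << POS_BITS) | pos))
    }

    /// Recover `(pos, len, will_init)` from the packed representation.
    fn unpack(self) -> (u64, u64, bool) {
        let pos = self.0 & MAX_POS;
        let len = (self.0 >> POS_BITS) & MAX_LEN;
        let will_init = (self.0 >> 63) == 1;
        (pos, len, will_init)
    }
}
```

Under these assumed widths a 256 MiB checkpoint distance passes validation, while anything above `MAX_POS` is rejected up front rather than failing later when an entry is packed. Validating at config load and at tenant startup, as the diff does, moves that failure to a point where it can be reported clearly.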
1 change: 1 addition & 0 deletions pageserver/src/lib.rs
@@ -16,6 +16,7 @@ pub mod l0_flush;
 use futures::{stream::FuturesUnordered, StreamExt};
 pub use pageserver_api::keyspace;
 use tokio_util::sync::CancellationToken;
+mod assert_u64_eq_usize;
 pub mod aux_file;
 pub mod metrics;
 pub mod page_cache;
6 changes: 6 additions & 0 deletions pageserver/src/tenant.rs
@@ -845,6 +845,12 @@ impl Tenant {
                         });
                     };

+                // TODO: should also be rejecting tenant conf changes that violate this check.
+                if let Err(e) = crate::tenant::storage_layer::inmemory_layer::IndexEntry::validate_checkpoint_distance(tenant_clone.get_checkpoint_distance()) {
+                    make_broken(&tenant_clone, anyhow::anyhow!(e), BrokenVerbosity::Error);
+                    return Ok(());
+                }
+
                 let mut init_order = init_order;
                 // take the completion because initial tenant loading will complete when all of
                 // these tasks complete.
4 changes: 2 additions & 2 deletions pageserver/src/tenant/blob_io.rs
@@ -148,7 +148,7 @@ pub(super) const LEN_COMPRESSION_BIT_MASK: u8 = 0xf0;

 /// The maximum size of blobs we support. The highest few bits
 /// are reserved for compression and other further uses.
-const MAX_SUPPORTED_LEN: usize = 0x0fff_ffff;
+pub(crate) const MAX_SUPPORTED_BLOB_LEN: usize = 0x0fff_ffff;

 pub(super) const BYTE_UNCOMPRESSED: u8 = 0x80;
 pub(super) const BYTE_ZSTD: u8 = BYTE_UNCOMPRESSED | 0x10;
@@ -326,7 +326,7 @@ impl<const BUFFERED: bool> BlobWriter<BUFFERED> {
                 (self.write_all(io_buf.slice_len(), ctx).await, srcbuf)
             } else {
                 // Write a 4-byte length header
-                if len > MAX_SUPPORTED_LEN {
+                if len > MAX_SUPPORTED_BLOB_LEN {
                     return (
                         (
                             io_buf.slice_len(),
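The constants in this hunk imply that a blob length must fit in 28 bits so that the top four bits of the 4-byte header stay free for a compression tag. A minimal sketch of that arithmetic follows. The constant values are copied from the diff; `encode_len_header` and `decode_len_header` are hypothetical helpers, and the big-endian byte order and the placement of the tag in the first byte are assumptions, not something this part of the diff confirms.

```rust
const LEN_COMPRESSION_BIT_MASK: u8 = 0xf0;
const MAX_SUPPORTED_BLOB_LEN: usize = 0x0fff_ffff;
const BYTE_UNCOMPRESSED: u8 = 0x80;
const BYTE_ZSTD: u8 = BYTE_UNCOMPRESSED | 0x10;

/// Hypothetical encoder: a 4-byte header whose top nibble carries the tag.
/// Big-endian order is an assumption made for this sketch.
fn encode_len_header(len: usize, tag: u8) -> Result<[u8; 4], String> {
    if len > MAX_SUPPORTED_BLOB_LEN {
        return Err(format!("blob too large ({len} bytes)"));
    }
    let mut header = (len as u32).to_be_bytes();
    // The length fits in 28 bits, so the top nibble of the first byte is zero here.
    debug_assert_eq!(header[0] & LEN_COMPRESSION_BIT_MASK, 0);
    header[0] |= tag & LEN_COMPRESSION_BIT_MASK;
    Ok(header)
}

/// Hypothetical decoder: split the header back into `(len, tag)`.
fn decode_len_header(header: [u8; 4]) -> (usize, u8) {
    let tag = header[0] & LEN_COMPRESSION_BIT_MASK;
    let mut raw = header;
    raw[0] &= !LEN_COMPRESSION_BIT_MASK; // clear the tag bits before reading the length
    (u32::from_be_bytes(raw) as usize, tag)
}
```

For example, a length of `0x0123_4567` tagged with `BYTE_ZSTD` (`0x90`) encodes to the bytes `91 23 45 67`, and decoding returns the original pair. A length of `MAX_SUPPORTED_BLOB_LEN + 1` is rejected because it would collide with the tag bits, which is the same boundary the renamed constant guards in the writer.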
23 changes: 0 additions & 23 deletions pageserver/src/tenant/block_io.rs
@@ -2,7 +2,6 @@
 //! Low-level Block-oriented I/O functions
 //!

-use super::ephemeral_file::EphemeralFile;
 use super::storage_layer::delta_layer::{Adapter, DeltaLayerInner};
 use crate::context::RequestContext;
 use crate::page_cache::{self, FileId, PageReadGuard, PageWriteGuard, ReadBufResult, PAGE_SZ};
@@ -81,9 +80,7 @@ impl<'a> Deref for BlockLease<'a> {
 /// Unlike traits, we also support the read function to be async though.
 pub(crate) enum BlockReaderRef<'a> {
     FileBlockReader(&'a FileBlockReader<'a>),
-    EphemeralFile(&'a EphemeralFile),
     Adapter(Adapter<&'a DeltaLayerInner>),
-    Slice(&'a [u8]),
     #[cfg(test)]
     TestDisk(&'a super::disk_btree::tests::TestDisk),
     #[cfg(test)]
@@ -100,9 +97,7 @@ impl<'a> BlockReaderRef<'a> {
         use BlockReaderRef::*;
         match self {
             FileBlockReader(r) => r.read_blk(blknum, ctx).await,
-            EphemeralFile(r) => r.read_blk(blknum, ctx).await,
             Adapter(r) => r.read_blk(blknum, ctx).await,
-            Slice(s) => Self::read_blk_slice(s, blknum),
             #[cfg(test)]
             TestDisk(r) => r.read_blk(blknum),
             #[cfg(test)]
@@ -111,24 +106,6 @@ impl<'a> BlockReaderRef<'a> {
     }
 }

-impl<'a> BlockReaderRef<'a> {
-    fn read_blk_slice(slice: &[u8], blknum: u32) -> std::io::Result<BlockLease> {
-        let start = (blknum as usize).checked_mul(PAGE_SZ).unwrap();
-        let end = start.checked_add(PAGE_SZ).unwrap();
-        if end > slice.len() {
-            return Err(std::io::Error::new(
-                std::io::ErrorKind::UnexpectedEof,
-                format!("slice too short, len={} end={}", slice.len(), end),
-            ));
-        }
-        let slice = &slice[start..end];
-        let page_sized: &[u8; PAGE_SZ] = slice
-            .try_into()
-            .expect("we add PAGE_SZ to start, so the slice must have PAGE_SZ");
-        Ok(BlockLease::Slice(page_sized))
-    }
-}
-
 ///
 /// A "cursor" for efficiently reading multiple pages from a BlockReader
 ///