safekeeper: decode and interpret for multiple shards in one go #10201
base: main
Conversation
Problem

Currently, we call `InterpretedWalRecord::from_bytes_filtered` once per shard. To serve multiple shards at the same time, the API needs to allow enquiring about multiple shards in one call.

Summary of changes

This commit tweaks the API in a fairly brute-force way. Naively, we could just derive the shard for each key, but pre-split and post-split shards may be subscribed at the same time, so doing it efficiently is more complex.
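As a rough illustration of the API shape (the types and the exact return type here are stand-ins, not the real Neon signatures), the change moves from a per-shard call to a single call that covers all subscribed shards:

```rust
use std::collections::HashMap;

// Stand-ins for the real Neon types.
struct Bytes;
#[derive(Clone, PartialEq, Eq, Hash)]
struct ShardIdentity(u32);
struct InterpretedWalRecord;

// Before: each shard called this separately, re-decoding the same WAL bytes every time.
fn from_bytes_filtered_single(_buf: Bytes, _shard: &ShardIdentity) -> InterpretedWalRecord {
    unimplemented!()
}

// After: decode and interpret once, filtering for every subscribed shard in the same pass.
fn from_bytes_filtered_multi(
    _buf: Bytes,
    _shards: &[ShardIdentity],
) -> HashMap<ShardIdentity, InterpretedWalRecord> {
    unimplemented!()
}
```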
7095 tests run: 6797 passed, 0 failed, 298 skipped (full report)
Flaky tests (1): Postgres 17
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
cab7a14 at 2024-12-19T13:17:14.686Z :recycle:
We're adding a fair number of heap allocations here. I'll leave it to you to decide whether we need to optimize these now or leave it for later (benchmarks would be good).
 /// Shard 0 is a special case since it tracks all relation sizes. We only give it
 /// the keys that are being written as that is enough for updating relation sizes.
 pub fn from_bytes_filtered(
     buf: Bytes,
-    shard: &ShardIdentity,
+    shards: &Vec<ShardIdentity>,
nit: `&[ShardIdentity]` throughout.
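A small illustration of the suggestion, with a stand-in `ShardIdentity`: a slice parameter accepts a `&Vec`, an array, or a stack-allocated vector equally well, without the extra indirection of `&Vec<T>`:

```rust
struct ShardIdentity(u32);

// Taking a slice instead of &Vec keeps the signature flexible for callers.
fn from_bytes_filtered(shards: &[ShardIdentity]) -> usize {
    shards.len()
}

fn main() {
    let owned = vec![ShardIdentity(0), ShardIdentity(1)];
    assert_eq!(from_bytes_filtered(&owned), 2); // &Vec<T> coerces to &[T]
    assert_eq!(from_bytes_filtered(&[ShardIdentity(2)]), 1); // arrays work too
}
```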
// This duplicates some of the work below, but it's empirically much faster.
let estimated_buffer_size = Self::estimate_buffer_size(&decoded, shard, pg_version);
let mut buf = Vec::<u8>::with_capacity(estimated_buffer_size);
let mut shard_batches = HashMap::with_capacity(shards.len());
As a follow-up, we should avoid all of these temporary allocations, as this is on the hot path. The easiest way to do that, which also allows reusing allocations across different timelines, is via an object pool -- I don't know which of the Rust implementations are any good, but there's a bunch of them.

In the meantime, we could avoid at least one allocation here by having the caller pass in a `&mut HashMap<ShardIdentity, SerializedValueBatch>` which the function populates for the given shards. Alternatively, use something like `smallvec`, which stack-allocates small vectors. This goes throughout.
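A minimal sketch of the caller-provided-map idea, using stand-in types for `ShardIdentity` and `SerializedValueBatch`: the caller keeps the map alive and the function only clears and refills it, so the allocation is reused across records (and could later come from a pool):

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
struct ShardIdentity(u32);
#[derive(Default)]
struct SerializedValueBatch {
    raw: Vec<u8>,
}

// The function never allocates a fresh map; it fills the one the caller owns.
fn build_batches(
    shards: &[ShardIdentity],
    out: &mut HashMap<ShardIdentity, SerializedValueBatch>,
) {
    out.clear();
    for shard in shards {
        out.entry(shard.clone()).or_default();
    }
}

fn main() {
    let shards = vec![ShardIdentity(0), ShardIdentity(1)];
    let mut batches = HashMap::new();
    build_batches(&shards, &mut batches); // reused for every record, no fresh map
    assert_eq!(batches.len(), 2);
}
```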
metadata_record = None
let mut metadata_records_per_shard = Vec::with_capacity(shards.len());
for shard in shards {
    let mut metadata_for_shard = metadata_record.clone();
We can defer this clone until we know whether the record is relevant for this shard. This is mostly relevant for heap-allocated records -- I haven't checked if they would be frequent enough to matter.

We could also invert the logic here: if we passed in the shards as a `HashSet`, we would know which shards are relevant and could map each record directly to the right shard without checking all of them.
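A hedged sketch of deferring the clone; `is_relevant_for` and the surrounding types are hypothetical stand-ins for the real filtering logic, but the point is that shards which don't need the record never pay for the clone:

```rust
#[derive(Clone, Debug, PartialEq)]
struct MetadataRecord(String);
struct ShardIdentity(u32);

// Hypothetical relevance check; the real logic lives in the interpreter.
fn is_relevant_for(shard: &ShardIdentity, _record: &MetadataRecord) -> bool {
    shard.0 == 0 // placeholder: pretend only shard 0 cares
}

fn distribute(
    metadata_record: &Option<MetadataRecord>,
    shards: &[ShardIdentity],
) -> Vec<Option<MetadataRecord>> {
    shards
        .iter()
        .map(|shard| {
            metadata_record
                .as_ref()
                .filter(|r| is_relevant_for(shard, r)) // check relevance first...
                .cloned()                              // ...then clone lazily
        })
        .collect()
}

fn main() {
    let record = Some(MetadataRecord("checkpoint".into()));
    let shards = [ShardIdentity(0), ShardIdentity(1)];
    assert_eq!(distribute(&record, &shards), vec![record.clone(), None]);
}
```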
    metadata,
    max_lsn,
    len,
} = &mut batch;
nit: might be clearer to refer to e.g. `batch.raw` instead of destructuring these into local variables, as a reminder that we're writing directly to the batch here.
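A tiny illustration of the nit, with a stand-in batch type: writing through the batch fields keeps the mutation target visible at each write site instead of hiding it behind destructured locals.

```rust
struct Batch {
    raw: Vec<u8>,
    len: usize,
}

fn append(batch: &mut Batch, bytes: &[u8]) {
    batch.raw.extend_from_slice(bytes); // clearly writing into the batch
    batch.len += 1;
}

fn main() {
    let mut batch = Batch { raw: Vec::new(), len: 0 };
    append(&mut batch, b"value");
    assert_eq!((batch.raw.len(), batch.len), (5, 1));
}
```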
@@ -312,12 +314,16 @@ async fn import_wal(
     let mut modification = tline.begin_modification(last_lsn);
     while last_lsn <= endpoint {
         if let Some((lsn, recdata)) = waldecoder.poll_decode()? {
-            let interpreted = InterpretedWalRecord::from_bytes_filtered(
+            let (got_shard, interpreted) = InterpretedWalRecord::from_bytes_filtered(
nit: might be worth adding a convenience method for the single-shard case, since we end up doing this at a bunch of call sites.
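A possible shape for such a helper (the names and signatures here are hypothetical, not the actual Neon API): wrap the multi-shard call and pull out the single entry, so single-shard call sites stay tidy.

```rust
use std::collections::HashMap;

// Stand-ins for the real Neon types and for the multi-shard API from this PR.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ShardIdentity(u32);
#[derive(Debug)]
struct InterpretedWalRecord;
struct Bytes;

fn from_bytes_filtered(
    _buf: Bytes,
    shards: &[ShardIdentity],
) -> HashMap<ShardIdentity, InterpretedWalRecord> {
    shards.iter().map(|s| (s.clone(), InterpretedWalRecord)).collect()
}

// Hypothetical convenience wrapper for the single-shard call sites.
fn from_bytes_filtered_one(buf: Bytes, shard: &ShardIdentity) -> InterpretedWalRecord {
    from_bytes_filtered(buf, std::slice::from_ref(shard))
        .remove(shard)
        .expect("requested shard is always present in the result")
}

fn main() {
    let shard = ShardIdentity(0);
    let interpreted = from_bytes_filtered_one(Bytes, &shard);
    println!("{:?}", interpreted);
}
```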
Yeah, I felt quite naughty writing this stuff. I'll dust off my benchmark to see how much we need to optimise here.