perf: high read amplification #4377

skyzh · 2023-05-30T14:28:44Z

Steps to reproduce

Expected result

Actual result

Given that we never merge existing delta layers, if a page is updated multiple times over a long period, the read cost of reconstructing the page will be high. Currently, reconstructing a page reads 6 files on average and 20 files at p99.

This will be resolved as a long-term task, as in #4359.

Environment

Logs, links

akashmahalik · 2023-05-31T21:09:12Z

Sorry if I am asking dumb questions!

Is the page getting updated in place?
OR

1. A new page is created by reading the existing one. (Isn't the existing page indexed?)
2. Update stuff in the page.
3. Point it to the new page.

Can you please clarify whether the read amplification is the one I mentioned in Point No. 1?

What do you mean by reading files in this case?

skyzh · 2023-05-31T21:54:48Z

It is like the latter one, but not exactly. A read involves finding all the deltas and the base image, and then reconstructing the latest page content from them.
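
To make that read path concrete, here is a minimal sketch in Python. It is not the pageserver's actual code: the `Layer` class, the `(offset, bytes)` delta format, and `reconstruct_page` are hypothetical names chosen for illustration, and the sketch assumes some in-memory map already lets us skip layers that hold nothing for the page. It walks layers from newest to oldest, collects deltas until it hits a base image, then replays the deltas oldest-first, counting how many layer files it had to touch.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical layer file: either an image layer or a delta layer."""
    name: str
    is_image: bool
    # Image layer: page_id -> full page bytes.
    # Delta layer: page_id -> list of (offset, bytes) patches, oldest first.
    entries: dict = field(default_factory=dict)


def apply_delta(page: bytes, delta) -> bytes:
    """Overwrite part of the page with the patch bytes (assumed delta format)."""
    offset, data = delta
    buf = bytearray(page)
    buf[offset:offset + len(data)] = data
    return bytes(buf)


def reconstruct_page(layers_newest_first, page_id):
    """Return (page_bytes, files_read) for the latest version of page_id."""
    pending = []      # deltas collected so far, newest first
    files_read = 0
    for layer in layers_newest_first:
        if page_id not in layer.entries:
            continue  # assumed to be skipped without touching the file
        files_read += 1
        if layer.is_image:
            page = layer.entries[page_id]
            # Replay deltas from oldest to newest on top of the base image.
            for delta in reversed(pending):
                page = apply_delta(page, delta)
            return page, files_read
        pending.extend(reversed(layer.entries[page_id]))
    raise KeyError(f"no base image found for page {page_id!r}")


layers = [
    Layer("delta-3", False, {"p": [(0, b"d")]}),
    Layer("delta-2", False, {"p": [(1, b"c")]}),
    Layer("delta-1", False, {"p": [(0, b"b")]}),
    Layer("image-0", True, {"p": b"aaaa"}),
]
page, files_read = reconstruct_page(layers, "p")
```

In this toy setup every update landed in a separate delta layer, so one page read touches four files. If the delta layers covering a page were merged, the same read would touch fewer files.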

akashmahalik · 2023-06-01T09:19:13Z

I read through the compaction epic, and it seems that compaction isn't done yet, so we are trying to reconstruct the page by reading all the small files.

I have a few questions:

Do you have an index (a sparse index: key -> byte offset) stored as fixed-size metadata at the beginning of the file, i.e. pointing to a block within the file? As a future optimisation, that block could be compressed to reduce I/O bandwidth.
It might be that some of the read amplification comes from false positives, where you go through all the files only to find nothing. Maybe a bloom filter would help in this case.
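
For readers unfamiliar with the first idea, a sparse index can be sketched as follows. This is only an illustration under assumed conventions, not the pageserver's file format: the record layout (`RECORD`), `build_file`, and `lookup` are hypothetical, and the index is kept as a separate Python list rather than a fixed-size header inside the file. The index stores the first key of each block together with its byte offset, so a lookup reads one block instead of the whole file.

```python
import bisect
import struct

# Hypothetical fixed-size record: 8-byte key, 4-byte value, big-endian.
RECORD = struct.Struct(">QI")


def build_file(records, records_per_block=4):
    """Serialize sorted (key, value) records; return (data, sparse_index)."""
    data = bytearray()
    index = []  # one (first_key_in_block, byte_offset) entry per block
    for i, (key, value) in enumerate(sorted(records)):
        if i % records_per_block == 0:
            index.append((key, len(data)))
        data += RECORD.pack(key, value)
    return bytes(data), index


def lookup(data, index, key):
    """Find key by scanning only the single block that could contain it."""
    first_keys = [k for k, _ in index]
    slot = bisect.bisect_right(first_keys, key) - 1
    if slot < 0:
        return None  # key is smaller than every key in the file
    start = index[slot][1]
    end = index[slot + 1][1] if slot + 1 < len(index) else len(data)
    for k, v in RECORD.iter_unpack(data[start:end]):
        if k == key:
            return v
    return None


data, index = build_file([(k, k * 10) for k in range(0, 40, 2)])
value = lookup(data, index, 14)
```

With 20 records and 4 records per block, the index has 5 entries, and each lookup scans at most 4 records.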

skyzh · 2023-06-01T12:35:14Z

Yes. Reading a bloom filter is still costly in terms of I/O.
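
The trade-off can be illustrated with a small sketch. This is a generic bloom filter, not anything taken from the pageserver: `BloomFilter`, `candidate_layers`, and the sizing (1024 bits, 3 hashes) are assumptions for illustration. A filter can rule out files that definitely lack a key, but the filter's own bytes have to be read (or kept cached in memory) for every file that is checked.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter sketch; sizes are arbitrary example values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by hashing the key with a counter prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def candidate_layers(filters, key: bytes):
    """Return names of layers whose filter cannot rule the key out."""
    return [name for name, bf in filters.items() if bf.might_contain(key)]


with_key = BloomFilter()
with_key.add(b"page-1")
empty = BloomFilter()
filters = {"layer-a": with_key, "layer-b": empty}
to_read = candidate_layers(filters, b"page-1")
```

Here `layer-b` is skipped, but deciding that still required looking at its 128-byte filter, which is the cost referred to above when filters are not already in memory.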

skyzh added the t/bug (Issue Type: Bug) and c/storage/pageserver (Component: storage: pageserver) labels and removed the t/bug (Issue Type: Bug) label on May 30, 2023

shanyp mentioned this issue Jul 19, 2023

Epic: reduce space amplification #4754

Closed
