Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pageserver): support retain_lsn in bottommost gc-compaction #8328

Merged
merged 14 commits into from
Jul 23, 2024

Conversation

skyzh
Copy link
Member

@skyzh skyzh commented Jul 9, 2024

Problem

part of #8002

Summary of changes

The main thing in this pull request is the new generate_key_retention function. It decides which deltas to retain and generate images for a given key based on its history + retain_lsn + horizon.

On that, we generate a flat single level of delta layers over all deltas included in the compaction. In the future, we can decide whether to split them over the LSN axis as described in the RFC.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Jul 9, 2024

3126 tests run: 3005 passed, 0 failed, 121 skipped (full report)


Code coverage* (full report)

  • functions: 32.6% (7019 of 21509 functions)
  • lines: 50.2% (55718 of 111048 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
24fb4d5 at 2024-07-23T19:24:25.817Z :recycle:

Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first review pass.

pageserver/src/tenant/timeline/compaction.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Outdated Show resolved Hide resolved
@skyzh skyzh requested a review from problame July 11, 2024 20:47
@skyzh skyzh marked this pull request as ready for review July 11, 2024 20:47
@skyzh skyzh requested a review from a team as a code owner July 11, 2024 20:47
@skyzh
Copy link
Member Author

skyzh commented Jul 11, 2024

added tests for the core generate_key_retention function and ready for review :)

@skyzh skyzh force-pushed the skyzh/bottommost-retain-lsn branch from 7cb0f35 to 31b6c2e Compare July 18, 2024 19:28
skyzh added 2 commits July 18, 2024 15:52
Signed-off-by: Alex Chi Z <[email protected]>
@skyzh
Copy link
Member Author

skyzh commented Jul 18, 2024

added timeline e2e unit tests

@skyzh skyzh requested a review from arpad-m July 22, 2024 18:07
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, apart from the added tests, changes since my last review are that we now use lowest_retain_lsn instead of gc_cutoff. What was wrong there?

Also, are you sure that always passing lowest_retain_lsn is correct?
Suppose you have the case of a lot of delta for a given key range. So much delta that we need to create multiple deltas with same key range but split up LSN ranges. The current code, afaict, will create multiple deltas with LSN_range.start == lowest_retain_lsn. No?

Possibly this is a pre-existing issue though?

pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
pageserver/src/tenant/timeline/compaction.rs Show resolved Hide resolved
@skyzh
Copy link
Member Author

skyzh commented Jul 23, 2024

So, apart from the added tests, changes since my last review are that we now use lowest_retain_lsn instead of gc_cutoff

The lowest image layer is created over lowest_retain_lsn instead of gc_cutoff. gc_cutoff can be considered as a retain_lsn.

The current code, afaict, will create multiple deltas with LSN_range.start == lowest_retain_lsn.

Yes, for now, the bottom-most compaction always produce delta layers of the LSN range from lowest_retain_lsn to the GC horizon. We could further split it over the LSN as an optimization in the future.

skyzh and others added 3 commits July 23, 2024 13:31
Co-authored-by: Christian Schwarz <[email protected]>
Signed-off-by: Alex Chi Z <[email protected]>
Signed-off-by: Alex Chi Z <[email protected]>
@problame
Copy link
Contributor

Yes, for now, the bottom-most compaction always produce delta layers of the LSN range from lowest_retain_lsn to the GC horizon. We could further split it over the LSN as an optimization in the future.

Are we not respecting the target file size yet?

@skyzh
Copy link
Member Author

skyzh commented Jul 23, 2024

Are we not respecting the target file size yet?

Yes, not for now, this is tracked in the epic.

skyzh added 3 commits July 23, 2024 13:54
Signed-off-by: Alex Chi Z <[email protected]>
Signed-off-by: Alex Chi Z <[email protected]>
Signed-off-by: Alex Chi Z <[email protected]>
@skyzh skyzh force-pushed the skyzh/bottommost-retain-lsn branch from 6f9685b to 24fb4d5 Compare July 23, 2024 18:33
@skyzh skyzh enabled auto-merge (squash) July 23, 2024 18:39
@skyzh skyzh merged commit 18cf5cf into main Jul 23, 2024
68 of 69 checks passed
@skyzh skyzh deleted the skyzh/bottommost-retain-lsn branch July 23, 2024 23:28
skyzh added a commit that referenced this pull request Jul 31, 2024
should be working after #8328
gets merged. Part of #8002

adds a new perf benchmark case that ensures garbages can be collected
with branches

---------

Signed-off-by: Alex Chi Z <[email protected]>
arpad-m pushed a commit that referenced this pull request Aug 5, 2024
should be working after #8328
gets merged. Part of #8002

adds a new perf benchmark case that ensures garbages can be collected
with branches

---------

Signed-off-by: Alex Chi Z <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants