Epic: import compaction framework & tiered compaction code, default-off #6768

Closed
8 of 10 tasks
arpad-m opened this issue Feb 15, 2024 · 5 comments
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic

Comments

@arpad-m
Member

arpad-m commented Feb 15, 2024

Motivation

The algorithms in the pageserver's compaction component aren't perfect and can be improved.

DoD

We have implemented pluggable compaction support, see #5234. We also have implemented some (default off) algorithms and done some initial experiments with them, but commitment to rolling out any larger algorithm changes is optional, not required. The goal is to have the pluggable support merged before we go GA.

Components

Future Work / Punted into Q2 #7554

Other related tasks and Epics

@arpad-m arpad-m added c/storage/pageserver Component: storage: pageserver t/Epic Issue type: Epic labels Feb 15, 2024
@arpad-m arpad-m self-assigned this Feb 19, 2024
@jcsp
Collaborator

jcsp commented Feb 19, 2024

This week:

  • Get the existing PR green for merge.

@arpad-m
Member Author

arpad-m commented Feb 26, 2024

Last week: got the PR rebased, polished, and reviewed, and addressed the review comments. This week: merge the PR, analyze the new compaction's characteristics, read the literature, and work on follow-ups.

arpad-m added a commit that referenced this issue Feb 27, 2024
Rebased version of #5234, part of #6768

This consists of three parts:

1. A refactoring and new contract for implementing and testing
compaction.

The logic is now in a separate crate, with no dependency on the
'pageserver' crate. It defines an interface that the real pageserver
must implement, in order to call the compaction algorithm. The interface
models things like delta and image layers, but just the parts that the
compaction algorithm needs to make decisions. That makes it easier to unit
test the algorithm and experiment with different implementations.
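
As a rough illustration of what such an interface could look like, here is a minimal sketch. The names `CompactionLayer`, `MockLayer`, and `total_delta_size`, and the use of plain `u64` for keys and LSNs, are assumptions made for this example and are not taken from the actual crate. The point is only that algorithm code is written against a trait, so an in-memory mock can stand in for the real pageserver in unit tests.

```rust
use std::ops::Range;

// Simplified stand-ins for the real key and LSN types (assumption).
type Lsn = u64;
type Key = u64;

/// Hypothetical trait: only the layer properties a compaction
/// algorithm needs in order to make decisions.
trait CompactionLayer {
    fn key_range(&self) -> Range<Key>;
    fn lsn_range(&self) -> Range<Lsn>;
    fn file_size(&self) -> u64;
    /// true for a delta layer, false for an image layer.
    fn is_delta(&self) -> bool;
}

/// In-memory implementation, usable by unit tests or a simulator.
#[derive(Debug, Clone)]
struct MockLayer {
    key_range: Range<Key>,
    lsn_range: Range<Lsn>,
    file_size: u64,
    is_delta: bool,
}

impl CompactionLayer for MockLayer {
    fn key_range(&self) -> Range<Key> {
        self.key_range.clone()
    }
    fn lsn_range(&self) -> Range<Lsn> {
        self.lsn_range.clone()
    }
    fn file_size(&self) -> u64 {
        self.file_size
    }
    fn is_delta(&self) -> bool {
        self.is_delta
    }
}

/// Algorithm-side helper written purely against the trait:
/// sums the sizes of all delta layers.
fn total_delta_size<L: CompactionLayer>(layers: &[L]) -> u64 {
    layers
        .iter()
        .filter(|l| l.is_delta())
        .map(|l| l.file_size())
        .sum()
}
```

Because `total_delta_size` never touches pageserver internals, the same function could be exercised against mock layers in a test and against real layers in production.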

I did not convert the current code to the new abstraction, however. When
the compaction algorithm is set to "Legacy", we just use the old code. It
might be worthwhile to convert the old code to the new abstraction, so
that we can compare the behavior of the new algorithm against the old
one, using the same simulated cases. If we do that, we have to be careful
that the converted code really is equivalent to the old.

This includes only trivial changes to the main pageserver code. All the
new code is behind a tenant config option. So this should be pretty safe
to merge, even if the new implementation is buggy, as long as we don't
enable it.

2. A new compaction algorithm, implemented using the new abstraction.

The new algorithm is tiered compaction. It is inspired by the PoC at PR
#4539, although I did not use that code directly, as I needed the new
implementation to fit the new abstraction. The algorithm here is less
advanced: I did not implement partial image layers, for example. I
wanted to keep it simple on purpose, so that as we add bells and
whistles, we can see the effects using the included simulator.

One difference to #4539 and your typical LSM tree implementations is how
we keep track of the LSM tree levels. This PR doesn't have a permanent
concept of a level, tier or sorted run at all. There are just delta and
image layers. However, when compaction starts, we look at the layers
that exist, and arrange them into levels, depending on their shapes.
That is ephemeral: when the compaction finishes, we forget that
information. This allows the new algorithm to work without any extra
bookkeeping. That makes it easier to transition from the old algorithm
to new, and back again.
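
To make the idea of ephemeral levels concrete, here is a minimal sketch. It assumes that a layer's "shape" is the width of its LSN range and that level boundaries grow by a factor of `fanout`; both are assumptions for illustration, as are the names `DeltaLayer`, `level_of`, and `arrange_into_levels`. The sketch also requires `fanout >= 2` for simplicity. The grouping is recomputed from the layers on every call and nothing is persisted.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type Lsn = u64;

/// Hypothetical, stripped-down delta layer.
#[derive(Debug, Clone, PartialEq)]
struct DeltaLayer {
    name: &'static str,
    lsn_range: Range<Lsn>,
}

/// Map an LSN-range width to a level: widths below base_width * fanout
/// are level 0, widths below base_width * fanout^2 are level 1, and so on.
fn level_of(width: u64, base_width: u64, fanout: u64) -> u32 {
    assert!(base_width > 0 && fanout >= 2);
    let mut level = 0;
    let mut upper = base_width;
    loop {
        // Stop at the current level if the next boundary would overflow.
        upper = match upper.checked_mul(fanout) {
            Some(u) => u,
            None => return level,
        };
        if width < upper {
            return level;
        }
        level += 1;
    }
}

/// Derive levels from the layers that currently exist. The result is
/// meant to be dropped once the compaction run finishes.
fn arrange_into_levels(
    layers: &[DeltaLayer],
    base_width: u64,
    fanout: u64,
) -> BTreeMap<u32, Vec<DeltaLayer>> {
    let mut levels: BTreeMap<u32, Vec<DeltaLayer>> = BTreeMap::new();
    for layer in layers {
        let width = layer.lsn_range.end - layer.lsn_range.start;
        levels
            .entry(level_of(width, base_width, fanout))
            .or_default()
            .push(layer.clone());
    }
    levels
}
```

Since the levels are a pure function of the existing layers, switching algorithms needs no migration of stored metadata in this model.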

There is just a new tenant config option to choose the compaction
algorithm. The default is "Legacy", meaning the current algorithm in
'main'. If you set it to "Tiered", the new algorithm is used.
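
A minimal sketch of how such a default-off switch could be modeled follows. The type and field names (`CompactionAlgorithm`, `TenantConf`, `compaction_algorithm`) and the `describe` helper are assumptions for illustration; only the two values "Legacy" and "Tiered" and the "Legacy" default come from the description above.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the two documented values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CompactionAlgorithm {
    /// The existing algorithm; used unless configured otherwise.
    #[default]
    Legacy,
    /// The new tiered compaction.
    Tiered,
}

impl FromStr for CompactionAlgorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Legacy" => Ok(Self::Legacy),
            "Tiered" => Ok(Self::Tiered),
            other => Err(format!("unknown compaction algorithm: {other}")),
        }
    }
}

/// Hypothetical tenant config holding only the relevant option.
#[derive(Debug, Default)]
struct TenantConf {
    compaction_algorithm: CompactionAlgorithm,
}

/// Dispatch on the configured algorithm; the old code path stays the default.
fn describe(conf: &TenantConf) -> &'static str {
    match conf.compaction_algorithm {
        CompactionAlgorithm::Legacy => "legacy compaction",
        CompactionAlgorithm::Tiered => "tiered compaction",
    }
}
```

Keeping the default on the old variant means a tenant that never sets the option is unaffected by the new code.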

3. A simulator, which implements the new abstraction.

The simulator can be used to analyze write and storage amplification,
without running a test with the full pageserver. It can also draw an SVG
animation of the simulation, to visualize how layers are created and
deleted.
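
For readers unfamiliar with the two metrics, here is a small sketch using their common definitions: write amplification as total bytes written to layer files divided by bytes ingested, and storage amplification as bytes currently stored divided by bytes ingested. The `AmplificationStats` type and its methods are hypothetical and do not describe how the simulator actually tracks these numbers.

```rust
/// Hypothetical counters for one simulated run.
#[derive(Debug, Default)]
struct AmplificationStats {
    /// Bytes of incoming data ingested.
    ingested_bytes: u64,
    /// Every byte written to a layer file, including compaction rewrites.
    written_bytes: u64,
    /// Bytes held by layers that currently exist.
    stored_bytes: u64,
}

impl AmplificationStats {
    fn record_ingest(&mut self, bytes: u64) {
        self.ingested_bytes += bytes;
    }

    fn record_layer_created(&mut self, bytes: u64) {
        self.written_bytes += bytes;
        self.stored_bytes += bytes;
    }

    fn record_layer_deleted(&mut self, bytes: u64) {
        self.stored_bytes = self.stored_bytes.saturating_sub(bytes);
    }

    /// Bytes written per byte ingested; None before any ingest.
    fn write_amplification(&self) -> Option<f64> {
        if self.ingested_bytes == 0 {
            None
        } else {
            Some(self.written_bytes as f64 / self.ingested_bytes as f64)
        }
    }

    /// Bytes stored per byte ingested; None before any ingest.
    fn storage_amplification(&self) -> Option<f64> {
        if self.ingested_bytes == 0 {
            None
        } else {
            Some(self.stored_bytes as f64 / self.ingested_bytes as f64)
        }
    }
}
```

For example, flushing two 100-byte layers and then compacting them into one 200-byte layer writes 400 bytes for 200 bytes ingested, which gives a write amplification of 2.0 and, once the inputs are deleted, a storage amplification of 1.0.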

To run the simulator:

    cargo run --bin compaction-simulator run-suite

---------

Co-authored-by: Heikki Linnakangas <[email protected]>
@problame
Contributor

problame commented Mar 4, 2024

  • read more literature
  • follow-ups
    • PR this week, need to confer with Heikki about some aspects of the code

arpad-m added a commit that referenced this issue Mar 5, 2024
Minor non-functional improvements to tiered compaction, mostly
consisting of comment fixes.

Follow-up to #6830, part of #6768
arpad-m added a commit that referenced this issue Mar 6, 2024
Moves some of the (legacy) compaction code to compaction.rs. No
functional changes, just moves of code.

Before, compaction.rs was only for the new tiered compaction mechanism;
now it's for both the old and new mechanisms.

Part of #6768
@jcsp
Collaborator

jcsp commented Mar 18, 2024

This week:

  • Compare old/new on gc_feedback (staircasing) test case.
  • Fix the edge case where too many deltas on one key can result in oversized layer files.

Deferring work to make all the tests work with new style compaction (as far as we know these are test assumption issues rather than issues with new compaction) -- the near term goal is to reach "no known issues" status with new compaction.

arpad-m added a commit that referenced this issue Mar 28, 2024
Many tests like `test_live_migration` or
`test_timeline_deletion_with_files_stuck_in_upload_queue` set
`compaction_threshold` to 1, to create a lot of changes/updates. The
compaction threshold was passed as `fanout` parameter to the
tiered_compaction function, which, however, did not support a value of 1.
Now we change the assert to support it, while still retaining the
exponential nature of the increase in range in terms of lsn that a layer
is responsible for.

A large chunk of the failures in #6964 were due to this issue, which is
now resolved.

Part of #6768.
@jcsp jcsp assigned problame and unassigned arpad-m Apr 29, 2024
@problame problame changed the title Epic: improved compaction and compaction algorithms framework Epic: import compaction framework & tiered compaction code Apr 30, 2024
@problame problame changed the title Epic: import compaction framework & tiered compaction code Epic: import compaction framework & tiered compaction code, default-off Apr 30, 2024
@problame
Contributor

Redefined this issue as the work that has been done in Q1:

Epic: improved compaction and compaction algorithms framework
Epic: import compaction framework & tiered compaction code

I also moved the uncompleted items into the section

Future Work / Punted into Q2 #7554

@problame problame assigned arpad-m and unassigned problame Apr 30, 2024
3 participants