Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support tenant manifests in the scrubber #9942

Merged
merged 16 commits into from
Dec 3, 2024
Merged

Conversation

arpad-m
Copy link
Member

@arpad-m arpad-m commented Nov 29, 2024

Support tenant manifests in the storage scrubber:

  • list the manifests, order them by generation
  • delete all manifests except for the two most recent generations
  • for the latest manifest: try parsing it.

I've tested this patch by running the against a staging bucket and it successfully deleted stuff (and avoided deleting the latest two generations).

In follow-up work, we might want to also check some invariants of the manifest, as mentioned in #8088.

Part of #9386
Part of #8088

@arpad-m arpad-m requested a review from a team as a code owner November 29, 2024 14:30
@arpad-m arpad-m requested a review from problame November 29, 2024 14:30
Copy link

github-actions bot commented Nov 29, 2024

7029 tests run: 6721 passed, 0 failed, 308 skipped (full report)


Flaky tests (3)

Postgres 17

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 30.7% (8265 of 26930 functions)
  • lines: 47.6% (65165 of 136763 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
ec7d4e2 at 2024-12-03T18:49:20.885Z :recycle:

test_runner/regress/test_timeline_archive.py Outdated Show resolved Hide resolved
test_runner/regress/test_timeline_archive.py Show resolved Hide resolved
storage_scrubber/src/pageserver_physical_gc.rs Outdated Show resolved Hide resolved
storage_scrubber/src/checks.rs Outdated Show resolved Hide resolved
storage_scrubber/src/checks.rs Outdated Show resolved Hide resolved
storage_scrubber/src/checks.rs Outdated Show resolved Hide resolved
storage_scrubber/src/checks.rs Outdated Show resolved Hide resolved
storage_scrubber/src/pageserver_physical_gc.rs Outdated Show resolved Hide resolved
storage_scrubber/src/checks.rs Show resolved Hide resolved
arpad-m and others added 7 commits December 2, 2024 20:14
@arpad-m arpad-m requested a review from problame December 2, 2024 21:22
@arpad-m arpad-m requested a review from problame December 3, 2024 17:23
@arpad-m arpad-m added this pull request to the merge queue Dec 3, 2024
Merged via the queue into main with commit ca85f36 Dec 3, 2024
82 checks passed
@arpad-m arpad-m deleted the arpad/scrubber_manifests branch December 3, 2024 20:41
awarus pushed a commit that referenced this pull request Dec 5, 2024
Support tenant manifests in the storage scrubber:

* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.

I've tested this patch by running the against a staging bucket and it
successfully deleted stuff (and avoided deleting the latest two
generations).

In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.

Part of #9386
Part of #8088

---------

Co-authored-by: Christian Schwarz <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Dec 11, 2024
This adds some validation of invariants that we want to uphold wrt the
tenant manifest and `index_part.json`:

* the data the manifest has about a timeline must match with the data in
`index_part.json`. It might actually change, e.g. when we do reparenting
during detach ancestor, but that requires the timeline to be
unoffloaded, i.e. removed from the manifest.
* any timeline mentioned in index part, must, if present, be archived.
If we unarchive, we first update the tenant manifest to unoffload, and
only then update index part. And one needs to archive before offloading.
* it is legal for timelines to be mentioned in the manifest but have no
`index_part`: this is a temporary state visible during deletion of the
timeline. if the pageserver crashed, an attach of the tenant will clean
the state up.
* it is also legal for offloaded timelines to have an
`ancestor_retain_lsn` of None while having an `ancestor_timeline_id`.
This is for the to-be-added flattening functionality: the plan is to set
former to None if we have flattened a timeline.

follow-up of #9942
part of #8088
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants