fix(test_timeline_archival_chaos): flakiness caused by orphan layers (#10083)

The test was failing with the scary but generic message `Remote storage
metadata corrupted`.

The underlying scrubber error is `Orphan layer detected: ...`.

The test kills the pageserver at random points, so it is expected that we
leak layers if we are killed in the window after a layer is uploaded but
before it is referenced from the index part.

Refer to generation numbers RFC for details.
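
The window described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRemoteStorage`, `flush_layer`, and `SimulatedKill` are hypothetical names invented for illustration and are not part of the pageserver or the scrubber. It shows how a kill between the layer upload and the index update leaves an object in remote storage that no index references.

```python
# Toy model, not pageserver code: every name here is hypothetical.
class ToyRemoteStorage:
    def __init__(self):
        self.layers = set()      # layer objects present in remote storage
        self.index_part = set()  # layers referenced by the index part

    def upload_layer(self, key):
        self.layers.add(key)

    def update_index(self, key):
        self.index_part.add(key)

    def orphan_layers(self):
        # Uploaded layers that no index entry references.
        return sorted(self.layers - self.index_part)


class SimulatedKill(Exception):
    """Stands in for the process being killed mid-operation."""


def flush_layer(storage, key, kill_before_index=False):
    # Step 1: the layer object lands in remote storage.
    storage.upload_layer(key)
    # A kill here leaves the layer uploaded but unreferenced.
    if kill_before_index:
        raise SimulatedKill(key)
    # Step 2: the index part is updated to reference the layer.
    storage.update_index(key)


storage = ToyRemoteStorage()
flush_layer(storage, "layer-a")
try:
    flush_layer(storage, "layer-b", kill_before_index=True)
except SimulatedKill:
    pass

print(storage.orphan_layers())  # prints ['layer-b']
```

In this model the orphan is harmless to readers of the index, which is why the test treats the corresponding scrubber finding as expected rather than as corruption.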

Refs:
- fixes #9988
- root-cause analysis: #9988 (comment)
problame authored Dec 13, 2024
1 parent 2c91062 commit fcff752
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions test_runner/regress/test_timeline_archive.py
@@ -435,6 +435,14 @@ def test_timeline_archival_chaos(neon_env_builder: NeonEnvBuilder):
        ]
    )

    env.storage_scrubber.allowed_errors.extend(
        [
            # Unclean shutdowns of pageserver can legitimately result in orphan layers
            # (https://github.com/neondatabase/neon/issues/9988#issuecomment-2520558211)
            f".*Orphan layer detected: tenants/{tenant_id}/.*"
        ]
    )

    class TimelineState:
        def __init__(self):
            self.timeline_id = TimelineId.generate()
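
The allow-list entry added in this hunk can be exercised in isolation. The following is a minimal sketch, assuming the entries are regular expressions matched against each scrubber log line; the `is_allowed` helper, the tenant IDs, and the sample log lines are all hypothetical and do not reproduce the real fixture code or real scrubber output.

```python
import re

# Hypothetical tenant ID; real IDs come from the test environment.
tenant_id = "0123456789abcdef0123456789abcdef"

# Same pattern shape as the entry added in the diff.
allowed_errors = [f".*Orphan layer detected: tenants/{tenant_id}/.*"]


def is_allowed(line, patterns):
    # Assumption: a line is tolerated if any pattern matches from its start.
    return any(re.match(pattern, line) for pattern in patterns)


# Made-up log lines for illustration only.
log_lines = [
    f"ERROR Orphan layer detected: tenants/{tenant_id}/timelines/example/layer",
    "ERROR Orphan layer detected: tenants/ffffffffffffffffffffffffffffffff/timelines/example/layer",
    "ERROR Remote storage metadata corrupted",
]

unexpected = [line for line in log_lines if not is_allowed(line, allowed_errors)]
for line in unexpected:
    print(line)
```

Because the pattern embeds `tenant_id`, orphan layers reported for any other tenant would still count as unexpected, so the exemption stays scoped to the tenant this test deliberately abuses.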

1 comment on commit fcff752

@github-actions

6450 tests run: 6184 passed, 0 failed, 266 skipped (full report)


Flaky tests (9)

Postgres 17

Postgres 16

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Postgres 15

  • test_pgdata_import_smoke[None-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64
  • test_pgdata_import_smoke[8-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64

Postgres 14

  • test_pgdata_import_smoke[None-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64
  • test_lr_with_slow_safekeeper: release-x86-64
  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Test coverage report is not available

The comment gets automatically updated with the latest test results
fcff752 at 2024-12-13T17:23:30.561Z
