fix(pageserver): allow repartition errors during gc-compaction smoke tests (#10164)

## Problem

part of #9114

In #10127 we fixed the race,
but we didn't add the errors to the allowlist.

## Summary of changes

* Allow repartition errors in the gc-compaction smoke test.

I think it might be worth refactoring the code so that multiple threads
can each get a copy of the repartition status (i.e., using Rcu) in the future.

Signed-off-by: Alex Chi Z <[email protected]>
skyzh authored Dec 18, 2024
1 parent 8569629 commit 1d12efc
Showing 2 changed files with 5 additions and 1 deletion.
2 changes: 1 addition & 1 deletion pageserver/src/tenant/timeline/compaction.rs
@@ -1823,7 +1823,7 @@ impl Timeline {
         // by estimating the amount of files read for a compaction job. We should also partition on LSN.
         let ((dense_ks, sparse_ks), _) = {
             let Ok(partition) = self.partitioning.try_lock() else {
-                bail!("failed to acquire partition lock");
+                bail!("failed to acquire partition lock during gc-compaction");
             };
             partition.clone()
         };
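
The hunk above fails fast when the partitioning lock is already held, instead of waiting for it. As a rough illustration of that pattern only, here is a minimal Python sketch. The `Partitioning` class and its `try_snapshot` method are hypothetical stand-ins, not part of the pageserver code, and the real implementation is the Rust shown in the diff.

```python
import threading


class Partitioning:
    """Hypothetical stand-in for the timeline's cached partitioning."""

    def __init__(self, parts):
        self._lock = threading.Lock()
        self._parts = list(parts)

    def try_snapshot(self):
        # Non-blocking acquire: like try_lock(), this returns immediately
        # instead of waiting when another holder has the lock.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError(
                "failed to acquire partition lock during gc-compaction"
            )
        try:
            # Copy the data out so the lock is held only briefly,
            # analogous to partition.clone() in the diff.
            return list(self._parts)
        finally:
            self._lock.release()


if __name__ == "__main__":
    p = Partitioning(["dense", "sparse"])
    print(p.try_snapshot())
```

Because the caller gets an error rather than blocking, a concurrent repartition shows up as a logged failure, which is why the test below needs the message on its allowlist.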
4 changes: 4 additions & 0 deletions test_runner/regress/test_compaction.py
@@ -134,6 +134,10 @@ def test_pageserver_gc_compaction_smoke(neon_env_builder: NeonEnvBuilder):
     }
 
     env = neon_env_builder.init_start(initial_tenant_conf=SMOKE_CONF)
+    env.pageserver.allowed_errors.append(
+        r".*failed to acquire partition lock during gc-compaction.*"
+    )
+    env.pageserver.allowed_errors.append(r".*repartition() called concurrently.*")
 
     tenant_id = env.initial_tenant
     timeline_id = env.initial_timeline
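
The allowlist entries added to the test are regular expressions, so it may be worth checking how the second pattern behaves. In Python's `re` syntax an unescaped `()` is an empty capturing group, not a literal pair of parentheses. The sketch below assumes the fixture matches patterns against log lines with `re.match`, and that the logged message contains a literal `repartition()`. Neither is shown in this diff, and the `LOG_LINE` value and `is_allowed` helper are hypothetical.

```python
import re

# Hypothetical log line; the exact pageserver message is not shown in the diff.
LOG_LINE = "ERROR repartition() called concurrently"

# Pattern as written in the test: "()" is an empty group here.
unescaped = re.compile(r".*repartition() called concurrently.*")
# Variant with the parentheses escaped so they match literally.
escaped = re.compile(r".*repartition\(\) called concurrently.*")


def is_allowed(pattern, line):
    """Return True if the allowlist pattern matches the log line."""
    return pattern.match(line) is not None


if __name__ == "__main__":
    print(is_allowed(unescaped, LOG_LINE))
    print(is_allowed(escaped, LOG_LINE))
```

Under those assumptions, only the escaped variant matches a line containing `repartition()`. The unescaped pattern matches the text `repartition called concurrently` without parentheses.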

1 comment on commit 1d12efc

@github-actions

7095 tests run: 6795 passed, 2 failed, 298 skipped (full report)


Failures on Postgres 17

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_pageserver_small_inmemory_layers[debug-pg17-True] or test_pageserver_small_inmemory_layers[debug-pg17-False]"
Flaky tests (5)

Postgres 17

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Postgres 16

  • test_physical_replication_config_mismatch_max_locks_per_transaction: release-arm64

Postgres 15

  • test_pgdata_import_smoke[None-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64
  • test_physical_replication_config_mismatch_max_locks_per_transaction: release-x86-64
  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Test coverage report is not available

The comment gets automatically updated with the latest test results
1d12efc at 2024-12-18T16:32:42.979Z :recycle:
