tests: use fewer pageservers in test_sharding_split_smoke (#9804)
## Problem

This test uses a gratuitous number of pageservers (16). This works fine
when there are plenty of system resources, but causes issues on test
runners that have limited resources and run many tests concurrently.

Related: #9802

## Summary of changes

- Split from 2 shards to 4, instead of 4 to 8
- Don't give every shard a separate pageserver; instead, let two locations
share each pageserver.

The net result is 4 pageservers instead of 16.
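The 16-to-4 reduction follows from two halvings. The sketch below is a minimal illustration, assuming (as the test's comments describe) that every shard ends up with one attached and one secondary location; the `layout` helper and the variable names are hypothetical and not part of the test suite.

```python
# Hypothetical helper; not part of test_sharding.py.
# Assumption: each shard ends up with one attached and one secondary location.


def layout(split_shard_count: int, num_pageservers: int) -> tuple[int, float]:
    """Return (total locations, average locations per pageserver)."""
    locations = split_shard_count * 2  # attached + secondary for every shard
    return locations, locations / num_pageservers


# Before this commit: split 4 -> 8 shards, num_pageservers = split_shard_count * 2
old_split_shard_count = 8
old_pageservers = old_split_shard_count * 2

# After this commit: split 2 -> 4 shards, num_pageservers = split_shard_count
new_split_shard_count = 4
new_pageservers = new_split_shard_count

print(layout(old_split_shard_count, old_pageservers))  # (16, 1.0)
print(layout(new_split_shard_count, new_pageservers))  # (8, 2.0)
```

Halving the split size (8 to 4 shards) halves the number of locations, and packing two locations per pageserver halves the pageserver count again, giving 4 instead of 16.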
jcsp authored Nov 20, 2024
1 parent bf7d859 commit 593e350
Showing 1 changed file with 14 additions and 25 deletions.
39 changes: 14 additions & 25 deletions test_runner/regress/test_sharding.py
@@ -515,11 +515,12 @@ def test_sharding_split_smoke(
"""

# We will start with 4 shards and split into 8, then migrate all those
# 8 shards onto separate pageservers
shard_count = 4
split_shard_count = 8
neon_env_builder.num_pageservers = split_shard_count * 2
# Shard count we start with
shard_count = 2
# Shard count we split into
split_shard_count = 4
# We will have 2 shards per pageserver once done (including secondaries)
neon_env_builder.num_pageservers = split_shard_count

# 1MiB stripes: enable getting some meaningful data distribution without
# writing large quantities of data in this test. The stripe size is given
@@ -591,7 +592,7 @@ def test_sharding_split_smoke(

workload.validate()

assert len(pre_split_pageserver_ids) == 4
assert len(pre_split_pageserver_ids) == shard_count

def shards_on_disk(shard_ids):
for pageserver in env.pageservers:
@@ -654,9 +655,9 @@ def shards_on_disk(shard_ids):
# - shard_count reconciles for the original setup of the tenant
# - shard_count reconciles for detaching the original secondary locations during split
# - split_shard_count reconciles during shard splitting, for setting up secondaries.
# - shard_count of the child shards will need to fail over to their secondaries
# - shard_count of the child shard secondary locations will get moved to emptier nodes
expect_reconciles = shard_count * 2 + split_shard_count + shard_count * 2
# - split_shard_count/2 of the child shards will need to fail over to their secondaries (since we have 8 shards and 4 pageservers, only 4 will move)
expect_reconciles = shard_count * 2 + split_shard_count + split_shard_count / 2

reconcile_ok = env.storage_controller.get_metric_value(
"storage_controller_reconcile_complete_total", filter={"status": "ok"}
)
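The change to `expect_reconciles` in the hunk above can be checked by plugging in the old and new shard counts. This is a minimal sketch; the two function names are hypothetical wrappers around the formulas shown in the diff.

```python
# Hypothetical wrappers around the two expect_reconciles formulas in the diff.


def expected_reconciles_old(shard_count: int, split_shard_count: int) -> int:
    # setup + detach of original secondaries (shard_count each),
    # secondaries for the children (split_shard_count),
    # then shard_count failovers and shard_count secondary moves.
    return shard_count * 2 + split_shard_count + shard_count * 2


def expected_reconciles_new(shard_count: int, split_shard_count: int) -> float:
    # Same first two terms; only half of the child shards fail over.
    # `/` is true division in Python, so the result is a float.
    return shard_count * 2 + split_shard_count + split_shard_count / 2


print(expected_reconciles_old(4, 8))  # 24
print(expected_reconciles_new(2, 4))  # 10.0
```

Because `split_shard_count / 2` yields a float, the new expectation is `10.0` rather than `10`; since `10.0 == 10` in Python, an equality check against an integer counter still holds, while `//` would keep the value integral.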
@@ -720,22 +721,10 @@ def check_effective_tenant_config():
# dominated by shard count.
log.info(f"total: {total}")
assert total == {
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 1,
13: 1,
14: 1,
15: 1,
16: 1,
1: 2,
2: 2,
3: 2,
4: 2,
}

# The controller is not required to lay out the attached locations in any particular way, but
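The new expected `total` of `{1: 2, 2: 2, 3: 2, 4: 2}` corresponds to two locations on each of the four pageservers, where the old value had one location on each of sixteen. Assuming `total` maps each pageserver's node ID to the number of tenant shard locations it holds (the diff does not show how it is built), a hypothetical round-robin layout reproduces both expectations. This is only an illustration: as the test itself notes, the controller is not required to lay out locations in any particular way, and `place_round_robin` is not its algorithm.

```python
from collections import Counter


def place_round_robin(split_shard_count: int, num_pageservers: int) -> dict[int, int]:
    """Hypothetical layout: give each shard an attached and a secondary location,
    assigned round-robin to pageserver IDs 1..num_pageservers.

    Returns a mapping of pageserver ID -> number of locations on it.
    """
    counts = Counter()
    for shard in range(split_shard_count):
        # Consecutive slots never map to the same node when num_pageservers >= 2,
        # so a shard's attached and secondary locations land on different nodes.
        attached_node = (2 * shard) % num_pageservers + 1
        secondary_node = (2 * shard + 1) % num_pageservers + 1
        counts[attached_node] += 1
        counts[secondary_node] += 1
    return dict(counts)


print(place_round_robin(4, 4))  # {1: 2, 2: 2, 3: 2, 4: 2}
print(place_round_robin(8, 16) == {i: 1 for i in range(1, 17)})  # True
```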

1 comment on commit 593e350

@github-actions

5617 tests run: 5380 passed, 1 failed, 236 skipped (full report)


Failures on Postgres 16

  • test_compaction_l0_memory[github-actions-selfhosted]: release-x86-64
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_compaction_l0_memory[release-pg16-github-actions-selfhosted]"

Code coverage* (full report)

  • functions: 31.4% (7938 of 25300 functions)
  • lines: 49.3% (62993 of 127730 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
593e350 at 2024-11-20T16:53:07.876Z
