test: fix on demand activation test flakyness (#7180)
Warm-up (and the "tenant startup complete" metric update) happens in
a background tokio task. The tenant map is updated eagerly, which can
happen before that task finishes.

The test assumed that if the tenant map was updated, then the metric
should reflect that. That's not the case, so we tweak the test to wait
for the metric.

Fixes #7158
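
The race described above can be reproduced in a few lines. The following is a minimal sketch, not the pageserver's or the test suite's actual code: `FakePageserver`, its `activate` method, and this `wait_until` are hypothetical stand-ins, and a Python thread plays the role of the background tokio task. It shows the state map being updated immediately while the counter is bumped later, so the test polls the counter instead of asserting on it once.

```python
import threading
import time


def wait_until(number_of_iterations, interval, func):
    """Hypothetical stand-in for a polling helper: call func until it stops
    raising, sleeping `interval` seconds between attempts."""
    last_exception = None
    for _ in range(number_of_iterations):
        try:
            return func()
        except Exception as e:  # AssertionError from the condition lands here
            last_exception = e
            time.sleep(interval)
    raise RuntimeError(
        f"condition not met after {number_of_iterations} attempts"
    ) from last_exception


class FakePageserver:
    """Hypothetical model: state is set eagerly, the metric is set later."""

    def __init__(self):
        self.tenant_state = {}
        self.startup_complete_total = 0
        self._lock = threading.Lock()

    def activate(self, tenant_id, warmup_delay):
        # Eager update: visible to callers as soon as activate() returns.
        self.tenant_state[tenant_id] = "Active"

        def warm_up():
            # Background work: the counter is incremented only when it ends.
            time.sleep(warmup_delay)
            with self._lock:
                self.startup_complete_total += 1

        worker = threading.Thread(target=warm_up)
        worker.start()
        return worker


def demo():
    server = FakePageserver()
    worker = server.activate("tenant-a", warmup_delay=0.2)

    # The state is already "Active", but the counter may still be 0 here,
    # so a one-shot assert on the counter would be flaky.
    assert server.tenant_state["tenant-a"] == "Active"

    def condition():
        assert server.startup_complete_total == 1

    # Poll until the background work has published the metric.
    wait_until(50, 0.05, condition)
    worker.join()
    return server.startup_complete_total
```

Calling `demo()` returns once the counter has been observed at its expected value. The iteration count and interval are arbitrary here and only need to cover the simulated warm-up delay.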
VladLazar authored Mar 20, 2024
1 parent a5d5c2a commit 4ba3f35
Showing 1 changed file with 11 additions and 9 deletions.
20 changes: 11 additions & 9 deletions test_runner/regress/test_timeline_size.py
@@ -20,6 +20,7 @@
     VanillaPostgres,
     wait_for_last_flush_lsn,
 )
+from fixtures.pageserver.http import PageserverHttpClient
 from fixtures.pageserver.utils import (
     assert_tenant_state,
     timeline_delete_wait_completed,
@@ -684,6 +685,13 @@ def assert_physical_size_invariants(sizes: TimelinePhysicalSizeValues):
     # XXX would be nice to assert layer file physical storage utilization here as well, but we can only do that for LocalFS


+def wait_for_tenant_startup_completions(client: PageserverHttpClient, count: int):
+    def condition():
+        assert client.get_metric_value("pageserver_tenant_startup_complete_total") == count
+
+    wait_until(5, 1.0, condition)
+
+
 def test_ondemand_activation(neon_env_builder: NeonEnvBuilder):
     """
     Tenants warmuping up opportunistically will wait for one another's logical size calculations to complete
@@ -767,10 +775,7 @@ def at_least_one_active():
     # That one that we successfully accessed is now Active
     expect_activated += 1
     assert pageserver_http.tenant_status(tenant_id=stuck_tenant_id)["state"]["slug"] == "Active"
-    assert (
-        pageserver_http.get_metric_value("pageserver_tenant_startup_complete_total")
-        == expect_activated - 1
-    )
+    wait_for_tenant_startup_completions(pageserver_http, count=expect_activated - 1)

     # The ones we didn't touch are still in Attaching
     assert (
@@ -790,10 +795,7 @@ def at_least_one_active():
         == n_tenants - expect_activated
     )

-    assert (
-        pageserver_http.get_metric_value("pageserver_tenant_startup_complete_total")
-        == expect_activated - 1
-    )
+    wait_for_tenant_startup_completions(pageserver_http, count=expect_activated - 1)

     # When we unblock logical size calculation, all tenants should proceed to active state via
     # the warmup route.
@@ -813,7 +815,7 @@ def all_active():
     assert (
         pageserver_http.get_metric_value("pageserver_tenant_startup_scheduled_total") == n_tenants
     )
-    assert pageserver_http.get_metric_value("pageserver_tenant_startup_complete_total") == n_tenants
+    wait_for_tenant_startup_completions(pageserver_http, count=n_tenants)

     # Check that tenant deletion/detach proactively wakes tenants: this is done separately to the main
     # body of the test because it will disrupt tenant counts

1 comment on commit 4ba3f35

@github-actions

2786 tests run: 2640 passed, 3 failed, 143 skipped (full report)


Failures on Postgres 14

  • test_bulk_insert[neon-github-actions-selfhosted]: release
  • test_basebackup_with_high_slru_count[github-actions-selfhosted-sequential-10-13-30]: release
  • test_basebackup_with_high_slru_count[github-actions-selfhosted-vectored-10-13-30]: release
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_bulk_insert[neon-release-pg14-github-actions-selfhosted] or test_basebackup_with_high_slru_count[release-pg14-github-actions-selfhosted-sequential-10-13-30] or test_basebackup_with_high_slru_count[release-pg14-github-actions-selfhosted-vectored-10-13-30]"

Code coverage* (full report)

  • functions: 28.3% (7131 of 25197 functions)
  • lines: 46.9% (43711 of 93289 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
4ba3f35 at 2024-03-20T11:31:57.897Z :recycle:
