fix(pageserver): run psql in thread to avoid blocking (#10177)
## Problem

ref #10170
ref #9994

The psql command blocks the main thread, causing other async tasks
to time out (e.g., HTTP connect). Therefore, we need to move it to an I/O
executor thread.

## Summary of changes

* run psql connection in a thread
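
To illustrate the problem and the fix, here is a minimal sketch, not the test-suite code. It assumes a hypothetical `blocking_query` that stands in for a synchronous call such as `ep.safe_psql(...)`, and a hypothetical `heartbeat` task that stands in for the other async work (e.g. an HTTP connect) that needs the event loop. The LSN string it returns is a made-up value.

```python
import asyncio
import time


def blocking_query() -> str:
    # Hypothetical stand-in for a synchronous call such as ep.safe_psql(...).
    time.sleep(0.2)
    return "0/16B3748"  # made-up LSN value, for illustration only


async def heartbeat(ticks: list) -> None:
    # Hypothetical concurrent task: records a timestamp every 20 ms,
    # but only while the event loop is free to schedule it.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple:
    ticks: list = []
    loop = asyncio.get_running_loop()
    hb = asyncio.create_task(heartbeat(ticks))
    # Offload the blocking call to the default thread pool executor so the
    # event loop keeps running other tasks while the query is in flight.
    result = await loop.run_in_executor(None, blocking_query)
    ticks_during_query = len(ticks)
    await hb
    return result, ticks_during_query


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_query()` directly inside `main` would instead hold the event loop for the whole 200 ms, and `heartbeat` would not run at all until the call returned.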

---------

Signed-off-by: Alex Chi Z <[email protected]>
Co-authored-by: John Spray <[email protected]>
skyzh and jcsp authored Dec 19, 2024
1 parent 61fcf64 commit cc138b5
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions test_runner/regress/test_pageserver_layer_rolling.py
@@ -22,7 +22,10 @@


 async def run_worker_for_tenant(
-    env: NeonEnv, entries: int, tenant: TenantId, offset: int | None = None
+    env: NeonEnv,
+    entries: int,
+    tenant: TenantId,
+    offset: int | None = None,
 ) -> Lsn:
     if offset is None:
         offset = 0
@@ -37,12 +40,20 @@ async def run_worker_for_tenant(
         finally:
             await conn.close(timeout=10)

-        last_flush_lsn = Lsn(ep.safe_psql("SELECT pg_current_wal_flush_lsn()")[0][0])
+        loop = asyncio.get_running_loop()
+        sql = await loop.run_in_executor(
+            None, lambda ep: ep.safe_psql("SELECT pg_current_wal_flush_lsn()"), ep
+        )
+        last_flush_lsn = Lsn(sql[0][0])
         return last_flush_lsn


 async def run_worker(env: NeonEnv, tenant_conf, entries: int) -> tuple[TenantId, TimelineId, Lsn]:
-    tenant, timeline = env.create_tenant(conf=tenant_conf)
+    loop = asyncio.get_running_loop()
+    # capture tenant_conf by specifying `tenant_conf=tenant_conf`, otherwise it will be evaluated to some random value
+    tenant, timeline = await loop.run_in_executor(
+        None, lambda tenant_conf, env: env.create_tenant(conf=tenant_conf), tenant_conf, env
+    )
     last_flush_lsn = await run_worker_for_tenant(env, entries, tenant)
     return tenant, timeline, last_flush_lsn

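The change passes `ep`, `tenant_conf`, and `env` to `run_in_executor` as positional arguments, which the loop forwards to the callable when it runs in the worker thread. `run_in_executor` does not accept keyword arguments, so a keyword such as `conf=` has to be bound with a wrapper. The sketch below shows two equivalent ways to do that. `create_tenant` here is a hypothetical stand-in for a blocking call like `env.create_tenant(conf=...)`, and the config key is made up for illustration.

```python
import asyncio
import functools


def create_tenant(conf: dict) -> str:
    # Hypothetical stand-in for a blocking call such as env.create_tenant(conf=...).
    return f"tenant(conf={conf})"


async def main() -> list:
    loop = asyncio.get_running_loop()
    conf = {"checkpoint_distance": 1024}  # made-up config, for illustration only

    # Option 1: a lambda taking the value as a parameter; run_in_executor
    # forwards the trailing positional argument to it.
    via_lambda = await loop.run_in_executor(
        None, lambda c: create_tenant(conf=c), conf
    )

    # Option 2: functools.partial binds the keyword argument up front.
    via_partial = await loop.run_in_executor(
        None, functools.partial(create_tenant, conf=conf)
    )
    return [via_lambda, via_partial]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both forms fix the value at the time the call is submitted, so rebinding the variable later does not change what the worker thread sees.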

1 comment on commit cc138b5

@github-actions

7245 tests run: 6936 passed, 1 failed, 308 skipped (full report)


Failures on Postgres 16

  • test_storage_controller_many_tenants[github-actions-selfhosted]: release-x86-64
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_storage_controller_many_tenants[release-pg16-github-actions-selfhosted]"
Flaky tests (4)

Postgres 17

Postgres 16

  • test_pgdata_import_smoke[None-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64

Postgres 14

  • test_pgdata_import_smoke[None-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64

Code coverage* (full report)

  • functions: 31.3% (8397 of 26865 functions)
  • lines: 48.0% (66643 of 138941 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
cc138b5 at 2024-12-19T11:55:41.579Z :recycle:
