pageserver: add disk_compacted_lsn #10113

Closed


erikgrinaker
Contributor

Problem

Currently, there is no backpressure based on L0 buildup and compaction debt. During heavy ingestion, this can lead to unbounded read amplification if compaction can't keep up.

Touches #10095.

Summary of changes

Add a disk_compacted_lsn and expose it via TimelineInfo. This tracks the last LSN known to be compacted to local disk (i.e. the last LSN below any L0 or frozen layers). It does not include S3 upload of compacted layers, which is tracked by remote_consistent_lsn instead.
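As a rough illustration of the definition above, the value can be thought of as the lowest start LSN among all L0 and frozen layers. The sketch below is not the code from this PR: `Lsn`, `UncompactedLayer`, and the `disk_compacted_lsn` function are hypothetical, simplified stand-ins, and falling back to `last_record_lsn` when no such layers exist is an assumption.

```rust
/// Log sequence number, modeled here as a plain u64 (assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

/// Hypothetical, simplified view of a layer that has not been compacted yet:
/// either an L0 layer on local disk or a frozen in-memory layer.
struct UncompactedLayer {
    /// First LSN covered by this layer.
    start_lsn: Lsn,
}

/// Returns the boundary below which everything is compacted to local disk,
/// i.e. the lowest start LSN among L0 and frozen layers.
/// Assumption: with no such layers, `last_record_lsn` is returned.
fn disk_compacted_lsn(
    l0_layers: &[UncompactedLayer],
    frozen_layers: &[UncompactedLayer],
    last_record_lsn: Lsn,
) -> Lsn {
    l0_layers
        .iter()
        .chain(frozen_layers.iter())
        .map(|layer| layer.start_lsn)
        .min()
        .unwrap_or(last_record_lsn)
}
```

Note that this only looks at local layers, which matches the distinction drawn above: upload progress stays with remote_consistent_lsn.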

Integration with Safekeeper and compute backpressure will be done later.


7740 tests run: 7397 passed, 12 failed, 331 skipped (full report)


Failures on Postgres 17

Failures on Postgres 16

Failures on Postgres 15

Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_pageserver_metrics_removed_after_detach[release-pg14] or test_pageserver_metrics_removed_after_detach[release-pg15] or test_pageserver_metrics_removed_after_detach[release-pg16] or test_pageserver_metrics_removed_after_detach[release-pg17] or test_pageserver_metrics_removed_after_detach[debug-pg17]"
Flaky tests (5)

Postgres 17

Postgres 15

  • test_physical_replication_config_mismatch_too_many_known_xids: release-arm64

Postgres 14

  • test_pgdata_import_smoke[8-1024-RelBlockSize.MULTIPLE_RELATION_SEGMENTS]: release-arm64

Test coverage report is not available

The comment gets automatically updated with the latest test results
88f7d05 at 2024-12-12T14:17:22.001Z

Member

@skyzh skyzh left a comment


LGTM. But note that we will also need to revisit the compaction path to ensure that any backpressure built on this doesn't cause a deadlock. For example, does compaction deterministically compact L0 such that the total size of L0 plus frozen layers eventually drops below some threshold in a timely manner? Currently the answer should be no, because as far as I remember compaction_iteration is only called every 10 minutes or so.

Please also check the regress_test, and ensure that disk_compacted_lsn is removed after detaching a timeline.
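To make the stall concern concrete, here is a minimal sketch with illustrative numbers. Both functions are hypothetical and not part of this PR; the 600-second period simply stands in for the "every 10 minutes or so" mentioned above, and the ingest rate and lag limit are made up.

```rust
/// Hypothetical backpressure check: throttle ingestion when the gap between
/// the last ingested LSN and the compacted LSN exceeds a byte limit.
fn should_throttle(last_record_lsn: u64, disk_compacted_lsn: u64, max_lag_bytes: u64) -> bool {
    last_record_lsn.saturating_sub(disk_compacted_lsn) > max_lag_bytes
}

/// Bytes of WAL that can pile up between two compaction runs when compaction
/// is only triggered on a fixed period.
fn lag_between_compactions(ingest_bytes_per_sec: u64, compaction_period_secs: u64) -> u64 {
    ingest_bytes_per_sec.saturating_mul(compaction_period_secs)
}
```

With an assumed 10 MiB/s of ingest and a 600-second period, roughly 6 GB can accumulate between runs, so any limit below that would throttle writers until the next periodic compaction instead of until compaction catches up.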

@erikgrinaker
Contributor Author

Yeah, this prototype is too simplistic. See #10095 (comment) and #8390.

@erikgrinaker erikgrinaker deleted the erik/disk-compacted-lsn branch December 16, 2024 11:14