Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safekeeper: send AppendResponse on segment flush #9692

Merged
merged 1 commit into from
Nov 17, 2024

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Nov 8, 2024

Problem

When processing pipelined AppendRequests, we explicitly flush the WAL every second and return an AppendResponse. However, the WAL is also implicitly flushed on segment bounds, but this does not result in an AppendResponse. Because of this, concurrent transactions may take up to 1 second to commit and writes may take up to 1 second before sending to the pageserver.

Separately, we should consider flushing the WAL on transaction commits -- see #9690.

Resolves #9688.

Summary of changes

Advance flush_lsn when a WAL segment is closed and flushed, and emit an AppendResponse. To accommodate this, track the flush_lsn in addition to the flush_record_lsn.

Note that this will result in more frequent commits during pipelined WAL ingestion, resulting in a control file flush (3 fsyncs) on every segment bound. We should address #9663 first, e.g. by taking control file flushes off of the ingest hot path.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@erikgrinaker erikgrinaker requested a review from arssher November 8, 2024 14:55
@erikgrinaker
Copy link
Contributor Author

@arssher Just submitting this for an early look to see if you agree with the overall approach. This needs a few tweaks and tests/benchmarks.

Copy link

github-actions bot commented Nov 8, 2024

5490 tests run: 5247 passed, 0 failed, 243 skipped (full report)


Flaky tests (2)

Postgres 17

Postgres 15

Code coverage* (full report)

  • functions: 31.5% (7932 of 25165 functions)
  • lines: 49.7% (62856 of 126557 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
e7e2683 at 2024-11-16T15:49:03.309Z :recycle:

@erikgrinaker erikgrinaker force-pushed the erik/respond-segment-flush-lsn branch 2 times, most recently from 10223d1 to 1cf5b69 Compare November 9, 2024 15:40
@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Nov 9, 2024

Here are some benchmark results to illustrate the need for #9698 before merging this.

wal_acceptor_throughput/fsync=true/commit=true/size=1048576 pipelines 1 GB of AppendRequest with 1 MB logical messages, and sets commit_lsn to the previous message. commit_lsn gets clamped to flush_record_lsn by the Safekeeper (this is equivalent to the case where it lags behind the quorum, but the same holds for the quorum).

Before this change, flush_record_lsn was only advanced every second, and the compute only received an AppendResponse every second. Thus the Safekeeper only advanced its commit_lsn at the end of the 1 GB ingestion, which is when the control file is flushed.

With this change, the flush_record_lsn advances every segment flush, allowing the commit_lsn to advance every segment. This means that the control file is also flushed every segment.

Without #9698, this increases the number of control file flushes from 2 to 60 (each with 3 fsyncs on the ingest path), reducing throughput by 30%:

wal_acceptor_throughput/fsync=true/commit=true/size=1048576
                        time:   [2.3644 s 2.3874 s 2.4096 s]
                        thrpt:  [424.97 MiB/s 428.91 MiB/s 433.09 MiB/s]
                 change:
                        time:   [+41.215% +43.684% +45.926%] (p = 0.00 < 0.05)
                        thrpt:  [-31.472% -30.403% -29.186%]

60 control file flushes in 0.755s

With #9698, only 5 control file flushes happen, and more importantly, these happen off of the ingest hot path. Thus throughput remains unchanged:

wal_acceptor_throughput/fsync=true/commit=true/size=1048576
                        time:   [1.6705 s 1.6885 s 1.7071 s]
                        thrpt:  [599.85 MiB/s 606.46 MiB/s 612.98 MiB/s]
                 change:
                        time:   [-0.1404% +1.6188% +3.4530%] (p = 0.11 > 0.05)
                        thrpt:  [-3.3377% -1.5930% +0.1406%]

5 control file flushes in 0.064s

So #9698 is a necessary prerequisite to this PR.

@erikgrinaker erikgrinaker force-pushed the erik/respond-segment-flush-lsn branch from 1cf5b69 to 71f04f9 Compare November 11, 2024 10:28
@erikgrinaker
Copy link
Contributor Author

This should be ready for review now. We don't have any tests for wal_storage afaict, so I haven't added any further tests. We should add a proper test suite for this separately.

The benchmarks in #9692 (comment) confirm that this results in more frequent commits and no performance regression (assuming #9698 merges first). I'll add a separate benchmark measuring commit latency as part of #9690.

@erikgrinaker erikgrinaker marked this pull request as ready for review November 11, 2024 10:35
@erikgrinaker erikgrinaker requested a review from a team as a code owner November 11, 2024 10:35
@arssher
Copy link
Contributor

arssher commented Nov 12, 2024

Approach LGTM.

@erikgrinaker
Copy link
Contributor Author

Thanks! I think this should be good for a final review -- anything you think is missing?

safekeeper/src/wal_storage.rs Show resolved Hide resolved
safekeeper/src/wal_storage.rs Show resolved Hide resolved
safekeeper/src/wal_storage.rs Show resolved Hide resolved
@erikgrinaker erikgrinaker force-pushed the erik/respond-segment-flush-lsn branch from 4bfdf46 to e7e2683 Compare November 16, 2024 14:50
@erikgrinaker erikgrinaker merged commit de7e4a3 into main Nov 17, 2024
80 checks passed
@erikgrinaker erikgrinaker deleted the erik/respond-segment-flush-lsn branch November 17, 2024 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

safekeeper: eagerly return AppendResponse with flush_lsn on segment fsync
2 participants