[WIP] pageserver: use interpreted wal proto by default #9747

VladLazar
Contributor

Problem

Summary of changes

@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 55faef6 to 474c03a Compare November 13, 2024 14:49
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 0024fe2 to 492c656 Compare November 13, 2024 14:50
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 474c03a to 55f5e89 Compare November 13, 2024 16:01
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 492c656 to 6472a73 Compare November 13, 2024 16:02
github-actions bot commented Nov 13, 2024

5490 tests run: 5245 passed, 2 failed, 243 skipped (full report)


Failures on Postgres 17

Failures on Postgres 15

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_lfc_resize[release-pg15] or test_large_records[debug-pg17]"
Flaky tests (1)

Postgres 16

Test coverage report is not available

The comment gets automatically updated with the latest test results
d0d34e1 at 2024-11-18T12:20:24.217Z :recycle:

@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 55f5e89 to a08f5f8 Compare November 14, 2024 10:51
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 6472a73 to d2666e0 Compare November 14, 2024 10:52
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from a08f5f8 to c4ae5f8 Compare November 14, 2024 13:58
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from d2666e0 to 42e223e Compare November 14, 2024 13:59
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from c4ae5f8 to 25db978 Compare November 14, 2024 15:46
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 42e223e to 0b84b29 Compare November 14, 2024 15:47
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 25db978 to c1e1b25 Compare November 14, 2024 18:06
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 0b84b29 to eb58c89 Compare November 14, 2024 18:06
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch 2 times, most recently from dcf30e3 to c8b8b05 Compare November 15, 2024 11:17
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from eb58c89 to 9cdd01e Compare November 15, 2024 11:17
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from c8b8b05 to 7a7031f Compare November 15, 2024 11:35
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 9cdd01e to 9247fd0 Compare November 15, 2024 11:36
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from 7a7031f to f51ff0a Compare November 15, 2024 17:00
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 9247fd0 to 7cfc9b8 Compare November 15, 2024 17:02
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal branch from f51ff0a to 8c4deee Compare November 18, 2024 11:11
@VladLazar VladLazar force-pushed the vlad/safekeeper-interpret-wal-test-enable branch from 7cfc9b8 to d0d34e1 Compare November 18, 2024 11:13
@VladLazar VladLazar closed this Nov 20, 2024
github-merge-queue bot pushed a commit that referenced this pull request Nov 25, 2024
…#9746)

## Problem

For any given tenant shard, the pageserver receives all of the tenant's WAL
from the safekeeper.
This soft-blocks us from using larger shard counts because of the bandwidth
cost and the CPU overhead of filtering out the records that the shard does
not need.

## Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver
into the safekeeper.

A customised PG replication protocol is used: instead of sending raw WAL,
the safekeeper sends filtered, interpreted records. The receiver drives the
protocol selection, so on the pageserver side the new protocol is gated by
a new pageserver config, `wal_receiver_protocol`.
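
As a rough illustration of receiver-driven protocol selection, the sketch below parses a `wal_receiver_protocol` value and derives the extra arguments a receiver might attach when starting replication. Everything except the config name is an assumption made for illustration: the `WalReceiverProtocol` and `ShardIdentity` types, the `"raw"` / `"interpreted"` spellings, and the argument keys are hypothetical and are not taken from this PR's code.

```rust
use std::str::FromStr;

/// Hypothetical protocol choices; variant names and spellings are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WalReceiverProtocol {
    /// Raw WAL, decoded and filtered on the pageserver (the default).
    #[default]
    Raw,
    /// Filtered, interpreted records produced by the safekeeper.
    Interpreted,
}

impl FromStr for WalReceiverProtocol {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "raw" => Ok(Self::Raw),
            "interpreted" => Ok(Self::Interpreted),
            other => Err(format!("unknown wal_receiver_protocol: {other}")),
        }
    }
}

/// Hypothetical shard identity: which shard this is, out of how many.
#[derive(Debug, Clone, Copy)]
pub struct ShardIdentity {
    pub number: u8,
    pub count: u8,
}

/// Extra key/value arguments for starting replication (keys are assumptions).
/// The raw protocol adds nothing, so existing behaviour is unchanged.
pub fn replication_args(
    protocol: WalReceiverProtocol,
    shard: ShardIdentity,
) -> Vec<(String, String)> {
    match protocol {
        WalReceiverProtocol::Raw => Vec::new(),
        WalReceiverProtocol::Interpreted => vec![
            ("protocol".to_string(), "interpreted".to_string()),
            ("shard_number".to_string(), shard.number.to_string()),
            ("shard_count".to_string(), shard.count.to_string()),
        ],
    }
}
```

Defaulting to the raw variant mirrors the note that the new protocol is disabled by default on the pageserver side.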

In more detail, the changes are:
1. Optionally inject the protocol and shard identity into the arguments
used for starting replication
2. On the safekeeper side, implement a new WAL sending primitive which
decodes and interprets records
 before sending them over
3. On the pageserver side, implement the ingestion of this new
replication message type. It's very similar
 to what we already have for raw WAL (minus decoding and interpreting).
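
The split of work in steps 2 and 3 can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: the record types, `interpret_for_shard`, and `ingest` are hypothetical names, and the modulo-based `owns` check is only a stand-in for the real key-to-shard mapping.

```rust
/// Hypothetical shard identity: which shard this is, out of how many.
#[derive(Debug, Clone, Copy)]
pub struct ShardIdentity {
    pub number: u32,
    pub count: u32,
}

impl ShardIdentity {
    /// Stand-in ownership check (assumption): a simple modulo over the key.
    /// An unsharded tenant (count <= 1) owns every key.
    pub fn owns(&self, key: u64) -> bool {
        self.count <= 1 || key % u64::from(self.count) == u64::from(self.number)
    }
}

/// Toy raw WAL record as read by the safekeeper.
pub struct RawRecord {
    pub lsn: u64,
    pub key: u64,
    pub payload: Vec<u8>,
}

/// Toy interpreted record as sent to one pageserver shard.
#[derive(Debug, PartialEq, Eq)]
pub struct InterpretedRecord {
    pub lsn: u64,
    pub key: u64,
    pub payload: Vec<u8>,
}

/// Safekeeper side: keep only the records the shard owns and convert them
/// to the interpreted form before sending.
pub fn interpret_for_shard(raw: &[RawRecord], shard: ShardIdentity) -> Vec<InterpretedRecord> {
    raw.iter()
        .filter(|r| shard.owns(r.key))
        .map(|r| InterpretedRecord {
            lsn: r.lsn,
            key: r.key,
            payload: r.payload.clone(),
        })
        .collect()
}

/// Pageserver side: no decoding or filtering is needed any more. This toy
/// ingest only reports the highest LSN seen in the batch.
pub fn ingest(batch: &[InterpretedRecord]) -> Option<u64> {
    batch.iter().map(|r| r.lsn).max()
}
```

In this model each shard receives only its own slice of the WAL, which is the property the Problem section says is needed for larger shard counts.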
 
## Notes
 
* This PR currently uses my [branch of
rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support),
which includes the deserialization logic for the new replication message
type. The PR for that is open
[here](neondatabase/rust-postgres#32).
* This PR contains changes for both pageservers and safekeepers. It's
safe to merge because the new protocol is disabled by default on the
pageserver side. We can gradually start enabling it in subsequent
releases.
* CI tests are running on #9747
 
## Links

Related: #9336
Epic: #9329