storage controller: enable timeline CRUD operations to run concurrently with reconciliation & make them safer (#8783)

## Problem

- If a reconciler was waiting to be able to notify computes about a change, but the control plane was waiting for the controller to finish a timeline creation/deletion, the overall system could deadlock.
- If a tenant shard was migrated concurrently with a timeline creation/deletion, there was a risk that the timeline operation could be applied to a non-latest-generation location, and thereby not really be persistent. This has never happened in practice, but would eventually happen at scale.

Closes: #8743

## Summary of changes

- Introduce a `Service::tenant_remote_mutation` helper, which looks up shards & generations and passes them into an inner function that may do remote I/O to pageservers. Before returning success, this helper checks that generations haven't incremented, to guarantee that changes are persistent.
- Convert `tenant_timeline_create`, `tenant_timeline_delete`, and `tenant_timeline_detach_ancestor` to use this helper.
- These functions no longer block on `ensure_attached` unless the tenant was never attached at all, so they should make progress even if we can't complete compute notifications.

This increases the database load from timeline CRUD operations, but only with cheap read transactions.
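The generation check described above can be sketched in Rust as a simplified, single-process model. Everything here is an assumption for illustration: the `ShardId`/`Generation` aliases, the `MutationError` enum, the in-memory `Mutex<HashMap>` standing in for the controller's database, and `bump_generation` standing in for a shard migration are hypothetical, not the storage controller's actual types or signatures. The sketch only shows the pattern: snapshot generations, run the remote mutation without holding a lock, then re-read and refuse to report success if any generation moved.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the controller's real shard/generation types.
type ShardId = u32;
type Generation = u32;

#[derive(Debug, PartialEq)]
enum MutationError {
    /// Hypothetical: the tenant has no shards at all.
    NotFound,
    /// A generation changed while remote I/O was in flight, so the change
    /// may have landed on a stale location and is not known to be durable.
    Conflict,
    /// The inner (remote) operation itself failed.
    Remote(String),
}

/// Hypothetical model: an in-memory map replaces the controller's database.
struct Service {
    generations: Mutex<HashMap<ShardId, Generation>>,
}

impl Service {
    /// Cheap "read transaction": copy out the current shard generations.
    /// The lock guard is a temporary, so it is released at the end of the `let`.
    fn snapshot(&self) -> Vec<(ShardId, Generation)> {
        let mut v: Vec<_> = self.generations.lock().unwrap().iter().map(|(s, g)| (*s, *g)).collect();
        v.sort();
        v
    }

    /// Sketch of the generation-check pattern (not the real signature).
    fn tenant_remote_mutation<R, F>(&self, op: F) -> Result<R, MutationError>
    where
        F: FnOnce(&[(ShardId, Generation)]) -> Result<R, String>,
    {
        let before = self.snapshot();
        if before.is_empty() {
            return Err(MutationError::NotFound);
        }
        // Remote I/O runs with no lock held, so it neither blocks nor is
        // blocked by reconciliation.
        let result = op(&before).map_err(MutationError::Remote)?;
        // Re-read: if any generation moved, the op may have hit a stale location.
        if self.snapshot() != before {
            return Err(MutationError::Conflict);
        }
        Ok(result)
    }

    /// Hypothetical: models a migration incrementing a shard's generation.
    fn bump_generation(&self, shard: ShardId) {
        if let Some(g) = self.generations.lock().unwrap().get_mut(&shard) {
            *g += 1;
        }
    }
}

fn main() {
    let svc = Service {
        generations: Mutex::new(HashMap::from([(0, 1), (1, 1)])),
    };
    // No migration in flight: the mutation is reported as successful.
    let ok = svc.tenant_remote_mutation(|shards| Ok(shards.len()));
    println!("{:?}", ok); // prints "Ok(2)"
    // A migration bumps a generation mid-operation: the caller must retry.
    let stale = svc.tenant_remote_mutation(|_| {
        svc.bump_generation(1);
        Ok(())
    });
    println!("{:?}", stale); // prints "Err(Conflict)"
}
```

Returning `Conflict` rather than silently succeeding is the point of the design: the caller retries against the latest generation instead of trusting a write that may have gone to a location that is about to be detached.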
0aa1450
3854 tests run: 3738 passed, 0 failed, 116 skipped

Flaky tests (1):
- Postgres 16: `test_lfc_resize` (debug-x86-64)

Code coverage (collected from Rust tests only):
- functions: 32.4% (7259 of 22431 functions)
- lines: 50.4% (58817 of 116616 lines)

0aa1450 at 2024-08-23T20:12:11.979Z