Skip to content

Commit

Permalink
storage controller: enable timeline CRUD operations to run concurrent…
Browse files Browse the repository at this point in the history
…ly with reconciliation & make them safer (#8783)

## Problem

- If a reconciler was waiting to be able to notify computes about a
change, but the control plane was waiting for the controller to finish a
timeline creation/deletion, the overall system can deadlock.
- If a tenant shard was migrated concurrently with a timeline
creation/deletion, there was a risk that the timeline operation could be
applied to a non-latest-generation location, and thereby not really be
persistent. This has never happened in practice, but would eventually
happen at scale.

Closes: #8743 

## Summary of changes

- Introduce `Service::tenant_remote_mutation` helper, which looks up
shards & generations and passes them into an inner function that may do
remote I/O to pageservers. Before returning success, this helper checks
that generations haven't incremented, to guarantee that changes are
persistent.
- Convert tenant_timeline_create, tenant_timeline_delete, and
tenant_timeline_detach_ancestor to use this helper.
- These functions no longer block on ensure_attached unless the tenant
was never attached at all, so they should make progress even if we can't
complete compute notifications.

This increases the database load from timeline/create operations, but
only with cheap read transactions.
  • Loading branch information
jcsp authored Aug 23, 2024
1 parent b65a95f commit 0aa1450
Show file tree
Hide file tree
Showing 6 changed files with 361 additions and 233 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
-- This file should undo anything in `up.sql`
DROP INDEX tenant_shards_tenant_id;
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
-- Your SQL goes here
CREATE INDEX tenant_shards_tenant_id ON tenant_shards (tenant_id);
38 changes: 38 additions & 0 deletions storage_controller/src/persistence.rs
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ pub(crate) enum DatabaseOperation {
Detach,
ReAttach,
IncrementGeneration,
PeekGenerations,
ListTenantShards,
InsertTenantShards,
UpdateTenantShard,
Expand Down Expand Up @@ -502,6 +503,43 @@ impl Persistence {
Ok(Generation::new(g as u32))
}

/// When we want to call out to the running shards for a tenant, e.g. during timeline CRUD operations,
/// we need to know where the shard is attached, _and_ the generation, so that we can re-check the generation
/// afterwards to confirm that our timeline CRUD operation is truly persistent (it must have happened in the
/// latest generation)
///
/// If the tenant doesn't exist, an empty vector is returned.
///
/// Output is sorted by shard number
pub(crate) async fn peek_generations(
&self,
filter_tenant_id: TenantId,
) -> Result<Vec<(TenantShardId, Option<Generation>, Option<NodeId>)>, DatabaseError> {
use crate::schema::tenant_shards::dsl::*;
let rows = self
.with_measured_conn(DatabaseOperation::PeekGenerations, move |conn| {
let result = tenant_shards
.filter(tenant_id.eq(filter_tenant_id.to_string()))
.select(TenantShardPersistence::as_select())
.order(shard_number)
.load(conn)?;
Ok(result)
})
.await?;

Ok(rows
.into_iter()
.map(|p| {
(
p.get_tenant_shard_id()
.expect("Corrupt tenant shard id in database"),
p.generation.map(|g| Generation::new(g as u32)),
p.generation_pageserver.map(|n| NodeId(n as u64)),
)
})
.collect())
}

#[allow(non_local_definitions)]
/// For use when updating a persistent property of a tenant, such as its config or placement_policy.
///
Expand Down
Loading

1 comment on commit 0aa1450

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3854 tests run: 3738 passed, 0 failed, 116 skipped (full report)


Flaky tests (1)

Postgres 16

Code coverage* (full report)

  • functions: 32.4% (7259 of 22431 functions)
  • lines: 50.4% (58817 of 116616 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
0aa1450 at 2024-08-23T20:12:11.979Z :recycle:

Please sign in to comment.