Update storage design docs to include the publish step #391

Merged
merged 4 commits on Dec 5, 2024
36 changes: 15 additions & 21 deletions storage/aws/README.md
@@ -29,34 +29,28 @@ A table with a single row which is used to keep track of the next assignable seq
This holds batches of entries keyed by the sequence number assigned to the first entry in the batch.

### `IntCoord`
TODO: add the new checkpoint updater logic, and update the docstring in aws.go.

This table is used to coordinate integration of sequenced batches in the `Seq` table.
This table is used to coordinate integration of sequenced batches in the `Seq` table, and keep track of the current tree state.

## Life of a leaf

TODO: add the new checkpoint updater logic.

1. Leaves are submitted by the binary built using Tessera via a call to the storage's `Add` func.
2. [Not implemented yet - Dupe squashing: look for existing `<identity_hash>` object, read assigned sequence number if present and return.]
3. The storage library batches these entries up, and, after a configurable period of time has elapsed
1. The storage library batches these entries up, and, after a configurable period of time has elapsed
or the batch reaches a configurable size threshold, the batch is written to the `Seq` table which effectively
assigns sequence numbers to the entries using the following algorithm:
In a transaction:
1. Selects `next` from `SeqCoord` with `FOR UPDATE` ← this blocks other frontends from writing their pools, but only for a short duration.
2. Inserts batch of entries into `Seq` with key `SeqCoord.next`
3. Update `SeqCoord` with `next+=len(batch)`
4. Integrators periodically integrate new sequenced entries into the tree:
1. Inserts batch of entries into `Seq` with key `SeqCoord.next`
1. Update `SeqCoord` with `next+=len(batch)`
1. Integrators periodically integrate new sequenced entries into the tree:
In a transaction:
1. Select `seq` from `IntCoord` with `FOR UPDATE` ← this blocks other integrators from proceeding.
2. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
3. Write leaf bundles to S3 using batched entries
4. Integrate in Merkle tree and write tiles to S3
5. Update checkpoint in S3
6. Delete consumed batches from `Seq`
7. Update `IntCoord` with `seq+=num_entries_integrated`
8. [Not implemented yet - Dupe detection:
1. Writes out `<identity_hash>` containing the leaf's sequence number]
1. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
1. Write leaf bundles to S3 using batched entries
1. Integrate in Merkle tree and write tiles to S3
1. Update checkpoint in S3
1. Delete consumed batches from `Seq`
1. Update `IntCoord` with `seq+=num_entries_integrated` and the latest `rootHash`
1. Checkpoints representing the latest state of the tree are published at the configured interval.
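
The sequencing transaction in the list above can be sketched in Go roughly as follows. This is a minimal illustration only: it assumes `database/sql` against the MySQL-compatible Aurora instance, and the column names (`id`, `next`, `seq`, `entries`) and JSON encoding are placeholders rather than the actual Tessera implementation.

```go
package storage

import (
	"context"
	"database/sql"
	"encoding/json"
)

// assignBatch sketches the sequencing transaction described above: it reserves
// a contiguous range of sequence numbers for a batch of entries by bumping
// SeqCoord.next, and stores the batch in Seq keyed by the first assigned
// number. Column names (id, next, seq, entries) are illustrative assumptions.
func assignBatch(ctx context.Context, db *sql.DB, batch [][]byte) (uint64, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // No-op once the transaction has committed.

	// 1. Read the next assignable sequence number, locking the row so that
	//    other frontends block only for this short transaction.
	var next uint64
	if err := tx.QueryRowContext(ctx,
		"SELECT next FROM SeqCoord WHERE id = 0 FOR UPDATE").Scan(&next); err != nil {
		return 0, err
	}

	// 2. Store the batch keyed by the first sequence number assigned to it.
	entries, err := json.Marshal(batch) // Illustrative encoding only.
	if err != nil {
		return 0, err
	}
	if _, err := tx.ExecContext(ctx,
		"INSERT INTO Seq (seq, entries) VALUES (?, ?)", next, entries); err != nil {
		return 0, err
	}

	// 3. Advance the next assignable sequence number past this batch.
	if _, err := tx.ExecContext(ctx,
		"UPDATE SeqCoord SET next = ? WHERE id = 0", next+uint64(len(batch))); err != nil {
		return 0, err
	}

	return next, tx.Commit()
}
```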

## Dedup

@@ -75,12 +69,12 @@ operational overhead, code complexity, and so was selected.

The alpha implementation was tested with entries of size 1KB each, at a write
rate of 1500/s. This was done using the smallest possible Aurora instance
availalbe, `db.r5.large`, running `8.0.mysql_aurora.3.05.2`.
available, `db.r5.large`, running `8.0.mysql_aurora.3.05.2`.

Aurora (Serverless v2) worked out well, but seems less cost effective than
provisionned Aurora for sustained traffic. For now, we decided not to explore this option further.
provisioned Aurora for sustained traffic. For now, we decided not to explore this option further.

RDS (MySQL) worked out well, but requires more admistrative overhead than
RDS (MySQL) worked out well, but requires more administrative overhead than
Aurora. For now, we decided not to explore this option further.

DynamoDB worked out to be less cost efficient than Aurora and RDS. It also has
7 changes: 2 additions & 5 deletions storage/gcp/README.md
@@ -34,7 +34,6 @@ This table is used to coordinate integration of sequenced batches in the `Seq` t
## Life of a leaf

1. Leaves are submitted by the binary built using Tessera via a call to the storage's `Add` func.
1. Dupe squashing (TODO): look for existing `<identity_hash>` object, read assigned sequence number if present and return.
1. The storage library batches these entries up, and, after a configurable period of time has elapsed
or the batch reaches a configurable size threshold, the batch is written to the `Seq` table which effectively
assigns sequence numbers to the entries using the following algorithm:
@@ -48,11 +47,9 @@ This table is used to coordinate integration of sequenced batches in the `Seq` t
1. Select one or more consecutive batches from `Seq` for update, starting at `IntCoord.seq`
1. Write leaf bundles to GCS using batched entries
1. Integrate in Merkle tree and write tiles to GCS
1. Update checkpoint in GCS
1. Delete consumed batches from `Seq`
1. Update `IntCoord` with `seq+=num_entries_integrated`
1. Dupe detection (TODO):
1. Writes out `<identity_hash>` containing the leaf's sequence number
1. Update `IntCoord` with `seq+=num_entries_integrated` and the latest `rootHash`
1. Checkpoints representing the latest state of the tree are published at the configured interval.
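
The publish step added above can be sketched as follows: a periodic task reads the latest integrated tree state and writes a checkpoint object to GCS. The `latestTreeState` helper, the `checkpoint` object name, and the omission of note signing are all assumptions made for this sketch, not the real implementation. Because the checkpoint is derived from the recorded tree state rather than written inside the integration transaction, publishing can run on its own cadence without holding the integration lock.

```go
package storage

import (
	"context"
	"encoding/base64"
	"fmt"
	"time"

	gcs "cloud.google.com/go/storage"
)

// publishCheckpoints periodically writes a checkpoint for the latest integrated
// tree state to GCS. latestTreeState is an assumed helper which reads the size
// and root hash recorded by the integrator; signing of the checkpoint note is
// omitted to keep the sketch short.
func publishCheckpoints(ctx context.Context, bkt *gcs.BucketHandle, origin string, interval time.Duration,
	latestTreeState func(context.Context) (uint64, [32]byte, error)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}
		size, root, err := latestTreeState(ctx)
		if err != nil {
			continue // Real code would log and back off.
		}
		// Standard checkpoint body: origin, decimal tree size, base64 root hash.
		body := fmt.Sprintf("%s\n%d\n%s\n", origin, size, base64.StdEncoding.EncodeToString(root[:]))
		w := bkt.Object("checkpoint").NewWriter(ctx)
		if _, err := w.Write([]byte(body)); err != nil {
			_ = w.Close()
			continue
		}
		if err := w.Close(); err != nil {
			continue // GCS write errors surface on Close.
		}
	}
}
```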

## Dedup

13 changes: 9 additions & 4 deletions storage/mysql/DESIGN.md
@@ -17,7 +17,11 @@ The DB layout has been designed such that serving any read request is a point lo

#### `Checkpoint`

A single row that records the current state of the log. Updated after every sequence + integration.
A single row that records the current published checkpoint.

#### `TreeState`

A single row that records the current state of the tree. Updated after every integration.
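
To illustrate the split between these two single-row tables, a hypothetical schema could look like the sketch below; the column names and types are assumptions made for illustration, not necessarily the real migration.

```go
package storage

// Hypothetical single-row schemas illustrating the Checkpoint / TreeState
// split: TreeState tracks the integrated tree, while Checkpoint holds the
// signed checkpoint most recently published from it. Column names and types
// are assumptions for this sketch.
const (
	createTreeState = `CREATE TABLE IF NOT EXISTS TreeState (
		id   INT UNSIGNED NOT NULL PRIMARY KEY, -- Always 0: a single row.
		size BIGINT UNSIGNED NOT NULL,          -- Number of integrated leaves.
		root TINYBLOB NOT NULL                  -- Current Merkle root hash.
	)`

	createCheckpoint = `CREATE TABLE IF NOT EXISTS Checkpoint (
		id   INT UNSIGNED NOT NULL PRIMARY KEY, -- Always 0: a single row.
		note MEDIUMBLOB NOT NULL                -- Most recently published signed checkpoint.
	)`
)
```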

#### `Subtree`

@@ -51,12 +55,13 @@ Sequence pool:
Sequence & integrate (DB integration starts here):

1. Takes a batch of entries to sequence and integrate
1. Starts a transaction, which first takes a write lock on the checkpoint row to ensure that:
1. Starts a transaction, which first takes a write lock on the `TreeState` row to ensure that:
1. No other processes will be competing with this work.
1. That the next index to sequence is known (this is the same as the current checkpoint size)
1. That the next index to sequence is known (this is the same as the current tree size)
1. Update the required TiledLeaves rows
1. Perform an integration operation to update the Merkle tree, updating/adding Subtree rows as needed, and eventually updating the Checkpoint row
1. Perform an integration operation to update the Merkle tree, updating/adding Subtree rows as needed, and eventually updating the `TreeState` row
1. Commit the transaction
1. Checkpoints representing the latest state of the tree are published at the configured interval.
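
A condensed Go sketch of the transaction boundary described above, assuming `database/sql`, the hypothetical `TreeState` columns from the earlier sketch, and placeholder hooks for the tile and Merkle work:

```go
package storage

import (
	"context"
	"database/sql"
)

// Placeholder hooks standing in for the real tile and Merkle integration code.
var (
	writeTiledLeaves func(ctx context.Context, tx *sql.Tx, start uint64, entries [][]byte) error
	integrate        func(ctx context.Context, tx *sql.Tx, size uint64, root []byte, entries [][]byte) (uint64, []byte, error)
)

// sequenceAndIntegrate sketches the single-writer transaction above: it locks
// the TreeState row, appends the batch, updates the Merkle tree, and records
// the new tree size and root hash. It is illustrative only.
func sequenceAndIntegrate(ctx context.Context, db *sql.DB, entries [][]byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // No-op once committed.

	// Lock the single TreeState row: no other writer can compete, and the
	// current size gives the next index to sequence.
	var size uint64
	var root []byte
	if err := tx.QueryRowContext(ctx,
		"SELECT size, root FROM TreeState WHERE id = 0 FOR UPDATE").Scan(&size, &root); err != nil {
		return err
	}

	// Update the affected TiledLeaves rows, then integrate into the Merkle
	// tree, updating or adding Subtree rows as needed.
	if err := writeTiledLeaves(ctx, tx, size, entries); err != nil {
		return err
	}
	newSize, newRoot, err := integrate(ctx, tx, size, root, entries)
	if err != nil {
		return err
	}

	// Record the new tree state; a separate publisher turns this into a signed
	// checkpoint at the configured interval.
	if _, err := tx.ExecContext(ctx,
		"UPDATE TreeState SET size = ?, root = ? WHERE id = 0", newSize, newRoot); err != nil {
		return err
	}
	return tx.Commit()
}
```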

## Costs
