Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isthmus: Header Accumulator #259

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

clabby
Copy link
Member

@clabby clabby commented Jun 25, 2024

Overview

Specifies an addition to block execution in the Isthmus hardfork, which designates "header accumulation" blocks and an addition to the extraData field in "header accumulation" blocks' headers to enable this functionality.

TODO

  • Explicit block validity rules.

Possible Ideas

  • [✅] Rather than the receipts batches, why not store header batches? Could be useful for efficient historical state lookups, i.e. in the fault proof program. No less efficient to store the broader commitment, either. h/t @protolambda

@clabby clabby changed the title granite: Receipts Root Accumulator granite: Header Accumulator Jun 25, 2024
@clabby clabby force-pushed the cl/granite-receipt-accumulator branch 3 times, most recently from ab2c9bf to 9a5198d Compare June 25, 2024 03:54
@clabby clabby requested a review from axelKingsley June 25, 2024 03:54
@clabby clabby force-pushed the cl/granite-receipt-accumulator branch 3 times, most recently from 75612be to d635603 Compare June 25, 2024 04:15
@clabby clabby force-pushed the cl/granite-receipt-accumulator branch 2 times, most recently from 957f4b1 to e633af0 Compare June 25, 2024 04:55
For header accumulation blocks (`block.number % HEADER_BATCH_SIZE == 0`), the accumulator tree root
as well as the merkle stack should be encoded as follows:

```txt
Copy link
Member Author

@clabby clabby Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious reviewers' thoughts. Should we also append the header batch root to this? Small bit extra data, though makes it so that we wouldn't need to reconstruct the batch tree off-chain for finding leaves of the accumulator tree. It should be a quick process (with a reasonable HEADER_BATCH_SIZE), and they'll need the full batch tree for inclusion proofs in the batch, though might be nice to have a quick lookup for batch roots as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the size of the redundant data being attached? And would we do any validation that the batch-root included matches what is calculated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did add this, it would be an extra 32 bytes per HEADER_BATCH_SIZE blocks in the extraData field. We would need to add a validation step, where we verify that the leaf (the header batch tree root) is verified to be included within the tree that the merkle stack represents.

@clabby clabby force-pushed the cl/granite-receipt-accumulator branch from e633af0 to d1b2f63 Compare June 25, 2024 05:27
Copy link
Member Author

clabby commented Jun 25, 2024

@clabby clabby added the F-granite Fork: Granite label Jun 25, 2024
Copy link
Contributor

@axelKingsley axelKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our Design Doc Template has "Alternatives Considered" and "Risks and Uncertainties" sections which I've found really helpful in the past for my own understanding of the problem space.

Since we're in specs, maybe it doesn't apply the same way, but in lieu of that, a couple questions:

Seems like the problem we're trying to solve is the impending spike in compute requirements when there are many interoperating superchains that all require header-reading. It'd be more efficient to compute the headers all as a batch and to only emit/expect them once per batch-window.
Does this header-batch-window size inform the resolution at which interop can run? Seems like while you wait for the next header batch, you can't make use of any of the other blocks beyond doing the work (which is what we're trying to avoid).

My other question is around the selection of the Merkle Tree. Seems a good choice for compact storage, but this doc also points out that it only requires two merkle proofs as well as _n_ merkle patricia trie proofs to verify inclusion. This seems like an important feature, but I don't fully understand how that mechanism is used by nodes to achieve the goal of only a single L2 block at a height that includes _all receipts_ on a single chain that must be validated to resolve its dependents. Would you mind sketching that out a bit? This could be something very obvious to folks more familiar with Merkle Tree than I am.

## Rationale

After the activation of the [interop hardfork](../interop/overview.md), the computational complexity of proving the
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains
OP Stack's state transition will multiply(?). The current plan for proving the inclusion of remote logs within chains

Might be in this doc farther down, but what is the relationship number-of-chains, total gas, and compute required? I'm thinking it's linear with the total gas of all chains, but not sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure either, or at least not sure enough to give it a concrete time-complexity.

From what I do know, it will multiply, with a minima of linear complexity increase (scaling with # of chains in the dependency set). Though, due to us having to reproduce the L2 block for every block that a remote log is included in, the computational complexity also grows with the number of relayed messages in the current frame (note: I'm using "frame" here as a non-official term; What I mean is the "frame" in which we're resolving cross-chain dependencies.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum we have to execute in a single FP VM run is a single L2 block (from a single chain). We can extend the output root bisection game to avoid having to execute a block from every L2 chain in the set within a single VM execution. So the cost to having additional chains is pretty negligible. The issue is actually just how old a log can be referenced by an executing message because we have to walk back one block at a time (by hash) to retrieve that referenced log which gets expensive if it's a long way in the past.

| Term | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `ACCUMULATOR_TREE_DEPTH` | `27` |
| `HEADER_BATCH_TREE_DEPTH` | `5` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes out to a 32 leaf nodes, and I think that means a 32x performance optimization, is that correct? As the header concerns now happen only once per HEADER_BATCH_SIZE?

Do any behaviors change if we find that this needs to be increased further? Like, if we had 64 per batch, would we start to put pressure on other parts of the system in negative ways?

Copy link
Member Author

@clabby clabby Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the optimization. HEADER_BATCH_SIZEx less data stored in extraData (since extraData is only expanded once every HEADER_BATCH_SIZE blocks)

I would advise not making HEADER_BATCH_SIZE > 256, since most execution layer clients drop state > 256 blocks old out of memory. With HEADER_BATCH_SIZE <= 256, the lookups for the header hashes should be very quick, since we can directly consult in-memory state at tip.

For header accumulation blocks (`block.number % HEADER_BATCH_SIZE == 0`), the accumulator tree root
as well as the merkle stack should be encoded as follows:

```txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the size of the redundant data being attached? And would we do any validation that the batch-root included matches what is calculated?

@clabby
Copy link
Member Author

clabby commented Jun 25, 2024

@axelKingsley

My other question is around the selection of the Merkle Tree. Seems a good choice for compact storage, but this doc also points out that it only requires two merkle proofs as well as n merkle patricia trie proofs to verify inclusion. This seems like an important feature, but I don't fully understand how that mechanism is used by nodes to achieve the goal of only a single L2 block at a height that includes all receipts on a single chain that must be validated to resolve its dependents. Would you mind sketching that out a bit? This could be something very obvious to folks more familiar with Merkle Tree than I am.

This choice was made to allow for efficient batching of data, and to reduce historical state expansion. By splitting the incremental tree, and including batch roots as leaves, we have a very nice property where only every HEADER_BATCH_SIZE blocks, we include a new merkle stack. In addition, we also reduce the size of the merkle stack in the extraData field by 5 * 20 bytes for every "accumulation block" (block.number % HEADER_BATCH_SIZE == 0), saving quite a bit of storage costs over time.

Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a bit of a hard time picturing how this winds up being used in practice to verify that a log exists on L2. It would be really useful to have a worked example showing how you get from a L2 block hash, back to a receipt from a historic block where all data is accessed by getting the preimage of a hash. Or I guess we could have a new PreimageOracle data type that verifies a provided merkle tree so you can supply a single proof.

Comment on lines +47 to +48
each chain, which currently could require deriving and executing many L2 blocks in the context of the verifiable
environment that the proving system is executing within.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to derive multiple L2 blocks in a single execution - we break that down as part of the modified output root bisection so that we are only executing a single L2 block at a time. The final consolidation step requires checking if the cross-chain dependencies are met by checking if a referenced log exists for each executing message. "Fetching" the referenced log message uses the same approach as for fetching L1 logs - we can walk backwards from the chain head via parentHash to the block, then down the receiptHash trie and get the log by hash.

So we don't need to execute any L2 blocks as part of this step, but currently we have to fetch each header back to the block the claimed log is from which, if the required block is quite old can be very expensive.

Copy link
Member Author

@clabby clabby Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! There are currently no specifications for fault proofs post interop, but the output bisection modification makes sense to reduce derivation + execution work. Can we please get this written down somewhere in the specs?

"Fetching" the referenced log message uses the same approach as for fetching L1 logs - we can walk backwards from the chain head via parentHash to the block, then down the receiptHash trie and get the log by hash.

The proposal seeks to optimize this walkback on L2 during the consolidation step by making the lookup constant time, rather than scaling linearly with the depth of the state relative to tip. Especially for messages that are relayed far after they are originally sent. Interop, as far as I understand it, places no limitations on how old a message can be on a remote chain in order for it to be relayed. Relay protection is on the consumer of the relayed logs, meaning we still may have to prove relayed logs that are incredibly deep in historical state. Is this right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fault proving interop isn't happening for a while, but I made some notes in https://www.notion.so/oplabs/External-Interop-Fault-Dispute-Game-Notes-1537bf9fad054bcfb2245dea88d48d16?pvs=4 - that links to the current draft PR of the specs but it will need some changes. The key thing though is that applying blocks from multiple chains in a single cannon run is never going to work so we definitely won't be doing that. :)

There is a limit on the log you can reference, currently it's 180 days which is far too long to be fault provable without something like this: https://specs.optimism.io/interop/messaging.html#messaging-invariants

## Rationale

After the activation of the [interop hardfork](../interop/overview.md), the computational complexity of proving the
OP Stack's state transition will increase. The current plan for proving the inclusion of remote logs within chains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum we have to execute in a single FP VM run is a single L2 block (from a single chain). We can extend the output root bisection game to avoid having to execute a block from every L2 chain in the set within a single VM execution. So the cost to having additional chains is pretty negligible. The issue is actually just how old a log can be referenced by an executing message because we have to walk back one block at a time (by hash) to retrieve that referenced log which gets expensive if it's a long way in the past.

Comment on lines +57 to +59
optimization for interop. For example, in the fault proof program, this feature removes the need for a commit-reveal
walkback when retrieving data within the historical chain. Instead, we would only need to provide small inclusion
proofs for a constant-time lookup of any data in the historical state accumulator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of any place where we access historical information for the L2 chain - only for L1 which won't have this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently, but post-interop, we will have to in order to verify log inclusion for relayed logs where there was a significant delay between the log's emission on chain A and the log being relayed on chain B.

We also do have the walkback on L2 today in the program, though it's not very bad due to output bisection. When fetching PayloadByNumber, etc., we must walk back from the starting output root (the current safe head) when the program is initialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this will definitely help with interop, just the way I read this section it sounded like it was saying it would help with other exisiting behaviour for fault proofs today, but I don't believe there is anything that would benefit. The agreed starting block from the output root bisection is the unsafe, safe and finalized head initially so the L2 walkback just exits the loop immediately because it has already reached the finalized block.

```

Both the global accumulator tree as well as the header batch tree use `keccak256` as their hashing function. For all
nodes within the header accumulator tree, all commitments are shortened to `TRUNCATED_COMMITMENT` bytes in length to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to note is that if we only have a truncated commitment, we can't use it to retrieve data from the preimage oracle - we need the full hash for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a new PreimageKeyType for this, yes? Should be the same as the keccak route, just need to truncate the resulting digest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh I think that should work. This is where I think having those details fleshed out would be really helpful - we don't want to implement this spec and then find there's something that's suboptimal for actually using it in fault proofs, so it's worth making sure we're actually thinking through all the details of how it will be used.

specs/granite/header_accumulator.md Outdated Show resolved Hide resolved
Specifies an addition to block execution in the Granite hardfork, which
designates "receipt accumulation" blocks and an addition to the
`extraData` field in "receipt accumulation" blocks' headers to enable
this functionality.
@BlocksOnAChain
Copy link
Contributor

As agreed on the ENG staff meeting, I'm assigning this to @protolambda as lead implementer that will own this side of work for Granite hardfork.

@BlocksOnAChain BlocksOnAChain changed the title granite: Header Accumulator Holocene: Header Accumulator Jul 23, 2024
@sebastianst sebastianst removed the F-granite Fork: Granite label Aug 2, 2024
tynes added a commit that referenced this pull request Aug 7, 2024
Remove the execution diff to the interop specs given it is
being replaced by #259
@alfonso-op alfonso-op changed the title Holocene: Header Accumulator Isthmus: Header Accumulator Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

8 participants