State Size Management #1692
To support verifiability of data that may no longer be in the active state, we can maintain a merkle trie containing a mapping of height -> block hash. Since we only insert monotonically increasing keys into the trie, validators that only need to insert and verify new additions can keep just the rightmost side of the trie. Any node with the full history, or an off-chain indexer, can then supply merkle proofs for any individual block or range of blocks in the history of the chain. If a chain wants to make further historical data verifiable, it can include a commitment in each block header, so that a client can fetch a historical block proof and verify the data committed to by that block. For action developers, we can include a commitment to all action results, so that clients can verify a proof of any action result in the history of the chain based only on the latest block in the chain.
State Size
State growth is a critical bottleneck for blockchain scaling: as state gets larger, both the merkle trie and the underlying database suffer from write amplification, i.e. they get more expensive to use as they get bigger. This means we must limit how quickly the state can grow to prevent runaway state growth.
To increase throughput while maintaining stability, it's important to consider two metrics: the typical block execution time and the worst case block execution time.
If blocks typically take 100ms to execute on your network, but user submitted transactions can trigger a block that takes 10s to execute, then the network will not be very stable when it encounters a long sequence of mega blocks that take 10s apiece to execute. To provide reliable performance, we need a tight upper bound on worst case block execution time. Unfortunately, state growth and write amplification make this a moving target: as blockchain state size increases, storage operations take longer to execute, increasing the worst case block execution time. If this goes unchecked, network reliability will degrade as the state size increases. As a result, we want to limit state growth in order to bound how much the worst case block execution time can increase over a year.
To build a reliable blockchain, you need the tools to handle or limit state growth over time. This post aims to lay out different design directions that may be suited to different applications.
Application Defined State Expiry
Application specific chains may have an application level notion of state expiry. For example, a CLOB (central limit order book) may have orders that always expire after a day. If orders are the only user defined state, then the easiest way to handle state growth may be to simply garbage collect orders as they expire and rate limit how quickly orders can be created, so that the state can never grow unbounded.
This can be accomplished with a simple Before/After block processing hook giving developers the chance to iterate over any potentially expired state at a regular cadence.
State Rent
Storage is consumed over a period of time rather than at a single point in time like CPU/bandwidth consumption. Therefore, we can charge state rent to account for the fact that storage is taken up over a duration D rather than only charging an upfront cost.
A simple way to handle state rent is to maintain a balance for each entry in storage, along with an ordering over entries by which will be evicted next (an entry is evicted when its balance drops to 0 and can no longer pay for the next time interval).
To handle this, we'll maintain an accumulator value `acc` that tracks the total fees per unit of storage over the full history of the chain. We'll keep a sorted list (a heap in the pseudocode linked below) of storage entries ordered from lowest balance to highest, i.e. next to be evicted first. We can then loop over the heap on each block to see if any entries need to be evicted. To add a new storage entry, we insert an item with its deposited balance plus the accumulated fee so far; this normalizes the entry, treating it as if it had already paid fees from the beginning of the network up to the time it enters (so it isn't charged for storage it didn't use). See the original post on this here: https://gist.github.com/aaronbuchwald/15fb4644ba9e6a72690ee285bd79fc59
State Expiry
One Ethereum proposal is to use a chain of verkle trees to manage state (https://verkle.info/#block-cebc1e71fbc44c07b0887fc2b2e967e7). This explanation may get a few details wrong (I'd need to re-read the proposal a few times to get it exactly right), but at a high level, it keeps the state in a chain of verkle trees: the current state N, the previous state N-1, and a chain of historical verkle tree roots covering the periods [0, N-2].
At the start of each new period (N+1), we create a fresh new state. If you access state from N, then it gets moved into N+1. At the same time, the state that is now two periods old in N-1 can be pruned and replaced with a verkle root. To refresh/revive state in N-1 or older, users must submit a verkle proof and pay a higher cost to bring that state back into the current state.
By placing a bound on the amount of new state that can be written, refreshed, or revived, we get a nice upper bound on the total size of the current state validators need to keep around.
Inspired by this design, you could alternatively support a tiered storage design keeping one trie fully in-memory and a second trie fully on-disk. If state is not accessed for a year, then it can be moved to disk. To move state from the disk trie back to the in-memory trie, we can charge a higher access cost and can overcharge by a healthy margin to ensure it doesn't degrade our worst case performance. This would still suffer write amplification in the on-disk storage trie, but by overcharging for it, we can limit the worst case block execution time.
Capped State Size by Construction
We can alternatively create a network where the total state size is capped by construction. If we create a network with a hard-capped token supply X, and each unit of the token purchases one byte of storage, then the total state size is hard-capped at X bytes.
This provides an upper bound and can be adapted for networks without a hard-capped token supply as well, enforcing a growing maximum state size over time. I.e., if the current token supply is Y and the inflation rate is 10%, then the maximum state size a year from now will be 1.1Y.
The storage token model can either re-use the native token or create a separate storage token allowed to float independently. In a single token model, this has two big downsides:
A two token model avoids these issues, but may be prone to manipulation, i.e. someone corners the market for storage by buying a large amount of the token and later selling it back to the market.