More historic state cache optimisations #6485
Labels: blocked, optimization (something to make Lighthouse run more efficiently), tree-states (upcoming state and database overhaul)
Description
With the move to tree-states, we've ensured that performance doesn't regress for sequential state loads using this PR:
However, there are still a few sub-optimal things we could work on over time to get even better performance.
What's still slow?
Profiling and logging show that building the caches on beacon states is slow. In particular, the committee caches (via `shuffle_list`) and the pubkey caches are the slowest to build. The total time to build all caches in my measurements is around 1.8s.

Problem 1: unnecessary cache rebuilding
In #6475 the time to build `BeaconState` caches is amortised slightly, by building them at slot 1 in the epoch rather than slot 0. Slot 0 in the epoch usually takes more time to construct because it involves loading a snapshot or applying diffs, and the state reached has no caches built due to the diff process. Constructing the state at the next slot can be done by replaying 1 block on top of the slot 0 state, but this requires most of the caches to be built (for state processing). Our current approach will lazily initialise the beacon state caches if/when the slot 1 state is requested. This means the load time for states in an epoch goes something like: `[2s (diff application), 2s (cache build), 0.5s (everything already cached), 0.5s, 0.5s, ...]`.

The problem with this approach arises when the caller for the slot 0 state requires some or all of the caches. In this case, the caller will build the caches on the state cloned from the historic state cache, but will not store the updated state with these caches back into the historic state cache. This sub-optimality was left in #6475 in order to keep the code relatively simple, and because it usually only represents a one-off waste of ~2s of cache building.
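A minimal sketch of the wasted work, using hypothetical simplified types standing in for the real `BeaconState` and historic state cache:

```rust
// Hypothetical sketch of the current (suboptimal) flow for a slot 0 state.
// Cloning the state does not share cache storage, so caches built by the
// caller never make it back into the historic state cache's copy.
#[derive(Clone)]
struct State {
    committee_cache_built: bool,
}

fn main() {
    // Copy held by the historic state cache after diff application: no caches.
    let cached_copy = State { committee_cache_built: false };

    // A caller needing the slot 0 state clones it and builds the caches it
    // requires (~2s of work, e.g. `shuffle_list` for the committee caches).
    let mut caller_copy = cached_copy.clone();
    caller_copy.committee_cache_built = true;

    // The warmed state is never stored back, so the cached copy stays cold
    // and the next caller for this slot repeats the ~2s cache build.
    assert!(!cached_copy.committee_cache_built);
}
```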
Problem 2: unnecessary pubkey cache builds
The building of the pubkey cache is pretty much unnecessary, and could be avoided by reusing existing caches. See `BeaconState`/`BeaconChain` #6484.

Implementation strategies
Mutable caches on beacon states (problem 1)
One way to solve problem 1 would be to lazily initialise the caches on a `BeaconState` using interior mutability. In this paradigm, building a cache at slot 0 from caller code would build it for all copies of the state, including the one in the historic state cache. This would mean that when loading the state at slot 1, the caches would already be built.

The complexity of this approach is that it requires putting the caches inside the beacon state behind something like `Arc<RwLock<..>>` or `ArcSwap<..>`
This could be quite an invasive change, especially relative to the benefit.

Per-route optimisation (problem 1)
Another, more bespoke way of handling problem 1 would be to optimise each caller (realistically, each HTTP route) to be smarter about its cache management. For example, the block rewards API could build just the caches it needs, and then update the historic state cache with those built caches. The disadvantage of this approach is that it needs to be done for every route, and it leads to states in the cache with all sorts of combinations of their caches built.
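To make the interior-mutability strategy concrete, here is a sketch using hypothetical, heavily simplified stand-ins for the real `BeaconState` and committee cache types. With the cache behind `Arc<RwLock<..>>`, building it on any clone makes it visible to every other clone, including the copy held by the historic state cache, which would also remove the need for per-route write-back:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical simplified committee cache; the real one holds the shuffling.
#[derive(Debug)]
struct CommitteeCache {
    epoch: u64,
}

// A state whose cache lives behind `Arc<RwLock<..>>`, so all clones share
// one lazily-initialised cache slot.
#[derive(Clone)]
struct BeaconState {
    slot: u64,
    committee_cache: Arc<RwLock<Option<CommitteeCache>>>,
}

impl BeaconState {
    fn new(slot: u64) -> Self {
        BeaconState {
            slot,
            committee_cache: Arc::new(RwLock::new(None)),
        }
    }

    // Lazily initialise the cache; subsequent calls from any clone are cheap.
    fn build_committee_cache(&self) {
        let mut guard = self.committee_cache.write().unwrap();
        if guard.is_none() {
            // The expensive work (e.g. `shuffle_list`) would happen here.
            *guard = Some(CommitteeCache { epoch: self.slot / 32 });
        }
    }

    fn cache_is_built(&self) -> bool {
        self.committee_cache.read().unwrap().is_some()
    }
}

fn main() {
    // The copy kept in the historic state cache.
    let cached_state = BeaconState::new(0);
    // The copy handed to a caller (e.g. an HTTP route handler).
    let caller_state = cached_state.clone();

    assert!(!cached_state.cache_is_built());
    caller_state.build_committee_cache();

    // Because the cache is shared via `Arc<RwLock<..>>`, the historic state
    // cache's copy now also sees the built cache: no rebuild at slot 1.
    assert!(cached_state.cache_is_built());
}
```

This is only a sketch under the assumption that the cache field can tolerate lock acquisition on its hot paths; the real change would touch every cache accessor on `BeaconState`, which is what makes it invasive.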
Blocked on
I think we should block this on: