Add persistent cache to BucketListDB #3696
We'll need more data to prioritize this one: BucketDB beat SQL when replaying historical ledgers, which include a lot of "read only" access (trade attempts, etc.), so OS file caches may already provide the bulk of the gains.
There's a hair to split here that I want to be careful about. I suggested this in conversation with @SirTyson, and I think there's a good idea here that's adjacent to a much-less-productive idea.

I do not think we should replace the existing ltx entry cache with a single unified "database-wide cache". I think that would be just as fragile as the existing ltx cache -- the cache's content would be at risk of getting out of sync with the (mutable) bucketlist, and we'd likely wind up flushing it on every bucketlist-modifying operation, like we do on the ltx cache today. And we might well have another cache-invalidation catastrophe bug. These are super bad when they happen, and we don't want to tempt fate by signing up to debug another fragile cache from scratch.

But I do think per-bucket entry caches make sense, since buckets are immutable. It's logically challenging to imagine a cache-invalidation bug on a single bucket, because a bucket literally never changes after it's written -- there's no conceptual "invalidation" event to perform. So I think adding such per-bucket caches would let us remove the existing fragile ltx cache. Bucketlist lookup would still probe at worst 10 levels * 2 = 20 buckets; it'd just have (like with the index and the bloom filter) a cache to accelerate the read that results from a hit on one of those 20 probes. It would (incidentally) probably perform a bit better, since the cache (and any hot entries in it) for the last-level bucket would remain warm across multiple transactions and ledgers. But my main interest here is actually simplifying the caching we do in the ltx already: moving from "a single fragile cache that can be invalidated" to "a bunch of little robust caches that can't meaningfully be invalidated".
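To make the per-bucket idea concrete, here is a minimal C++ sketch of what such a cache could look like. All names here (`BucketEntryCache`, the stand-in `LedgerKey`/`LedgerEntry` types) are illustrative assumptions, not stellar-core's actual interfaces; the point is just that, because the underlying bucket is immutable, the cache needs no invalidation path at all -- only insertion on read misses and random eviction when full.

```cpp
// Hypothetical sketch (not stellar-core's real API): a per-bucket,
// random-eviction cache of deserialized entries. Buckets are immutable,
// so there is no invalidation hook anywhere in this class.
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct LedgerKey { std::string raw; };   // stand-in for the XDR key type
struct LedgerEntry { std::string raw; }; // stand-in for the XDR entry type

bool operator==(LedgerKey const& a, LedgerKey const& b) { return a.raw == b.raw; }
struct LedgerKeyHash
{
    size_t operator()(LedgerKey const& k) const { return std::hash<std::string>()(k.raw); }
};

class BucketEntryCache
{
    size_t mMaxEntries;
    std::unordered_map<LedgerKey, LedgerEntry, LedgerKeyHash> mEntries;
    std::vector<LedgerKey> mKeys; // parallel key list used to pick a random victim

  public:
    explicit BucketEntryCache(size_t maxEntries) : mMaxEntries(maxEntries) {}

    std::optional<LedgerEntry>
    get(LedgerKey const& k) const
    {
        auto it = mEntries.find(k);
        if (it == mEntries.end())
            return std::nullopt;
        return it->second;
    }

    void
    put(LedgerKey const& k, LedgerEntry const& e)
    {
        if (mEntries.size() >= mMaxEntries && !mKeys.empty())
        {
            // Random eviction: no LRU bookkeeping, no invalidation events.
            size_t victim = std::rand() % mKeys.size();
            mEntries.erase(mKeys[victim]);
            mKeys[victim] = mKeys.back();
            mKeys.pop_back();
        }
        if (mEntries.emplace(k, e).second)
            mKeys.push_back(k);
    }
};
```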
Closing; this no longer makes sense now that we have BucketList snapshotting.
Originally, this was closed because a per-bucket cache was redundant with parallel BucketList snapshots + the LTX-based cache. However, now that overlay and the transaction queue bypass the LTX cache, we are seeing performance degradation due to a lack of BucketList-level caching.

At first, I thought a global cache made the most sense. The BucketList is read-only, with the exception of a single [...]. For this reason, I think we should go back to per-bucket caches. This is much simpler and probably doesn't lose us much performance-wise, if at all. A global cache allows us to bypass the BucketIndex lookup step, while a per-bucket cache does not. However, the BucketIndex is all in-memory, so we probably don't do any additional disk IO if we just have a per-bucket cache (except for the occasional bloom miss). For these reasons, I am back in the per-bucket cache camp.
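For illustration, here is a rough sketch of the lookup path this comment describes, under assumed interface names (not stellar-core's real API): the BucketIndex check stays entirely in memory, and the per-bucket cache only short-circuits the disk read that a confirmed index hit would otherwise trigger, which is why the extra IO versus a global cache is limited to the occasional bloom miss.

```cpp
// Hedged sketch, illustrative names only. Probes buckets newest-first, using
// the in-memory index to reject, the per-bucket cache to serve warm hits, and
// disk only when an index hit is not already cached.
#include <optional>
#include <string>

struct LedgerKey { std::string raw; };   // stand-in for the XDR key type
struct LedgerEntry { std::string raw; }; // stand-in for the XDR entry type

struct Bucket
{
    // All four are assumed operations on an indexed, immutable bucket.
    virtual bool indexMayContain(LedgerKey const& k) const = 0;                   // bloom filter / key index, in memory
    virtual std::optional<LedgerEntry> cacheGet(LedgerKey const& k) const = 0;    // per-bucket entry cache
    virtual std::optional<LedgerEntry> loadFromDisk(LedgerKey const& k) = 0;      // actual file read
    virtual void cachePut(LedgerKey const& k, LedgerEntry const& e) = 0;
    virtual ~Bucket() = default;
};

template <typename BucketRange>
std::optional<LedgerEntry>
loadEntry(BucketRange const& bucketsNewestFirst, LedgerKey const& k)
{
    for (auto const& b : bucketsNewestFirst) // at worst ~10 levels * 2 = 20 buckets
    {
        if (!b->indexMayContain(k))
            continue;                        // pure in-memory rejection, no IO
        if (auto hit = b->cacheGet(k))
            return hit;                      // warm entry, no IO
        if (auto loaded = b->loadFromDisk(k)) // reached on a real hit or a bloom false positive
        {
            b->cachePut(k, *loaded);
            return loaded;
        }
        // bloom false positive: fall through to older buckets
    }
    return std::nullopt;
}
```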
If by "per-bucket cache" you mean a simple random-eviction cache, that sounds fine to me (as we get to control the amount of RAM to allocate for this). I know you mentioned before keeping entire buckets in RAM, which may not be doable for a while as it depends on other hardware requirements, especially in the context of watcher nodes that also have to accommodate other services like Stellar-RPC.
Ya, for now this is a per-bucket random-eviction cache with configurable size parameters. This should get us parity (or better) with the performance we had before switching to BucketList snapshots for the TX queue. We should do this and enable it by default.

In addition, I think there are two other interesting experimental flags to expose. One is to do all reads from an in-memory version of the BucketList. I think this would be interesting for things like parallel catchup and max-TPS test modes for now. Eventually, when we have state archival on classic and require beefier validator SKUs, I think this mode could be the default. In the shorter term, we could also introduce a flag that caches all Soroban entries in memory. We could enable this flag today in the current protocol without issue, since the amount of Soroban state is small and its growth is capped via the target BucketList size config setting, preventing DOS. I think validators currently use < 2 GB, and we require 16 GB SKUs. This might have some impact on reducing OS disk caches, but it would allow us to crank up a lot of the IO-related Soroban resource limits.
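As a sketch only, under hypothetical names (these are not real stellar-core flags or types), the two knobs discussed above could look something like this: a configurable byte budget for the per-bucket cache, plus an experimental policy that pins all Soroban entries so they are never evicted -- bounded in RAM because Soroban state is capped by the target BucketList size setting.

```cpp
// Illustrative-only policy sketch with assumed names; not real stellar-core
// configuration flags or the real LedgerEntryType enum.
#include <cstddef>

enum class EntryClass { Classic, ContractData, ContractCode }; // stand-in for LedgerEntryType

struct CachePolicy
{
    size_t maxCacheBytesPerBucket; // hypothetical configurable size parameter
    bool pinAllSorobanEntries;     // hypothetical experimental flag

    bool
    isEvictable(EntryClass c) const
    {
        if (pinAllSorobanEntries &&
            (c == EntryClass::ContractData || c == EntryClass::ContractCode))
        {
            return false; // pinned: stays resident for the life of the bucket
        }
        return true; // classic entries remain subject to random eviction
    }

    bool
    overBudget(size_t currentBytes, size_t incomingEntryBytes) const
    {
        return currentBytes + incomingEntryBytes > maxCacheBytesPerBucket;
    }
};
```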
Currently, there is no database caching layer that persists between ledgers. While LedgerTxnRoot maintains a cache for a single ledger, this cache is cleared when the ledger is committed. Before BucketListDB, the backend SQL implementation maintained its own cache. However, now that we are transitioning to BucketListDB, it is worth investigating a database-wide cache.

For read/write entries, the BucketList structure provides an implicit cache already. Recently modified entries live in the smallest, top-level buckets. These buckets are already very quick to search because they use an individual key index and are small enough to fit in OS file page caches. However, Soroban will introduce large, read-only data types that are frequently accessed but rarely written, such as Contract Code. These entries do not benefit from the implicit cache design of the BucketList since they are never rewritten to the top-level buckets. We should investigate a persistent cache for these read-only entry types.
Possible implementations to consider include a single BucketList wide cache, a per-bucket cache, or a cache line per entry class (i.e. one cache for ContractCode, one for ContractData, one for Stellar Classic types).
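For the third option, here is a minimal sketch under assumed names (the stand-in types are not the real XDR definitions): one independent cache per entry class, so that large, rarely rewritten read-only types like ContractCode do not compete for space with classic entries. The underlying cache could be a per-bucket random-eviction cache like the one sketched above or any other bounded map.

```cpp
// Minimal sketch of "one cache line per entry class"; names are illustrative.
#include <optional>
#include <string>
#include <unordered_map>

struct LedgerKey { std::string raw; int entryClass; }; // stand-in for the XDR key
struct LedgerEntry { std::string raw; };               // stand-in for the XDR entry

using EntryCache = std::unordered_map<std::string, LedgerEntry>; // placeholder for a bounded cache

struct PerClassCache
{
    EntryCache mClassic;
    EntryCache mContractData;
    EntryCache mContractCode;

    EntryCache&
    cacheFor(LedgerKey const& k)
    {
        switch (k.entryClass) // stand-in for dispatching on the real entry type
        {
        case 1:  return mContractData;
        case 2:  return mContractCode;
        default: return mClassic; // all Stellar Classic types share one cache
        }
    }

    std::optional<LedgerEntry>
    get(LedgerKey const& k)
    {
        auto& c = cacheFor(k);
        auto it = c.find(k.raw);
        if (it == c.end())
            return std::nullopt;
        return it->second;
    }
};
```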