perf: simplify cachekv store with copy-on-write btree #14350
This PR fixes the cachekv deadlock issue while keeping the store thread-safe. One tricky part of the cachekv store is handling modification during iteration in a thread-safe way. Previously it kept separate unsorted and sorted caches and synced them when iteration started, which is complicated and leads to deadlock issues. This PR unifies the cache using tidwall/btree and leverages its copy-on-write feature to iterate on an isolated view, which allows safe modifications during iteration. This PR is consensus-breaking because in the previous version iteration was not done solely on an isolated view; it also checked `store.deleted`.
Just to understand what's going on here, because there isn't a lot of discussion and issues on the So what I'd like more clarity on is the difference between this PR and the original PR?
// item represents a cached key-value pair and the entry of the cache btree.
// If dirty is true, it indicates the cached value is newly set, maybe different from the underlying value.
type item struct {
	key   []byte
	value []byte
	dirty bool
}

// byKeys compares the items by key
func byKeys(a, b item) bool {
	return bytes.Compare(a.key, b.key) == -1
}
nit: move these types to the top
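For context on the `byKeys` comparator quoted above: an ordered container that is driven only by a "less" function usually treats two items as the same entry when neither is less than the other. The sketch below illustrates that rule. The `sameKey` helper is hypothetical and not part of the PR, and the `item` and `byKeys` definitions are restated only to keep the example self-contained.

```go
package main

import "bytes"

// item mirrors the struct quoted in the review (restated for this sketch).
type item struct {
	key   []byte
	value []byte
	dirty bool
}

// byKeys orders items by key only; value and dirty are ignored.
func byKeys(a, b item) bool {
	return bytes.Compare(a.key, b.key) < 0
}

// sameKey is a hypothetical helper: two items are equivalent under byKeys
// when neither sorts before the other.
func sameKey(a, b item) bool {
	return !byKeys(a, b) && !byKeys(b, a)
}
```

Under this rule, an entry with an existing key and a new value or `dirty` flag counts as the same slot, which is what lets a write overwrite a previously cached read.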
Yeah, I should have explained this more clearly. It started with optimizing the cachekv store for nested cases in #13881, then ended up replacing MemDB with tidwall/btree to optimize more general cases. Then deadlock issues appeared, which sparked the thread-safety discussion about the cachekv store. Eventually we found that the current version already has the deadlock issue, so we removed the btree lock in #13881 as a fix for the deadlock, but that means it is no longer thread-safe.
}

// Copy the cache. This is a copy-on-write operation and is very fast because
// it only performs a shadowed copy.
Can you please explain a bit how this works? I understand that the copy is not done until the btree is modified. Does this mean either the original or the copy?
https://github.com/tidwall/btree/blob/master/btreeg.go#L296
I think the writer will check a seq number on nodes and do the copy when needed.
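To make the idea concrete, here is a minimal standard-library sketch of copy-on-write with per-node ownership tags. It is not tidwall/btree's implementation: it uses a plain unbalanced binary search tree, and every name in it (`cowTree`, `node`, `owner`, `writable`) is an assumption made for illustration. It shows one way a writer can check a tag on each node and clone only the nodes it does not own, whether the write goes to the original or to the copy.

```go
package main

// node is a binary-search-tree node tagged with the id of the tree that may
// modify it in place.
type node struct {
	owner       uint64
	key, value  string
	left, right *node
}

// cowTree is a hypothetical copy-on-write ordered map (illustration only).
type cowTree struct {
	owner uint64
	root  *node
}

// nextOwner hands out tree ids. It is not goroutine-safe; a real
// implementation would use an atomic counter.
var nextOwner uint64

func newOwner() uint64 {
	nextOwner++
	return nextOwner
}

func newTree() *cowTree { return &cowTree{owner: newOwner()} }

// Copy is O(1): the two trees share every node. Both sides take a fresh id,
// so neither the original nor the copy will mutate a shared node in place.
func (t *cowTree) Copy() *cowTree {
	t.owner = newOwner()
	return &cowTree{owner: newOwner(), root: t.root}
}

// writable returns n itself if this tree owns it, otherwise a private clone.
func (t *cowTree) writable(n *node) *node {
	if n.owner == t.owner {
		return n
	}
	c := *n
	c.owner = t.owner
	return &c
}

// Set inserts or updates key, cloning only the shared nodes on the path to it.
func (t *cowTree) Set(key, value string) { t.root = t.set(t.root, key, value) }

func (t *cowTree) set(n *node, key, value string) *node {
	if n == nil {
		return &node{owner: t.owner, key: key, value: value}
	}
	n = t.writable(n)
	switch {
	case key < n.key:
		n.left = t.set(n.left, key, value)
	case key > n.key:
		n.right = t.set(n.right, key, value)
	default:
		n.value = value
	}
	return n
}

// Get looks up key without copying anything.
func (t *cowTree) Get(key string) (string, bool) {
	n := t.root
	for n != nil {
		switch {
		case key < n.key:
			n = n.left
		case key > n.key:
			n = n.right
		default:
			return n.value, true
		}
	}
	return "", false
}

// Ascend calls fn for every entry in ascending key order.
func (t *cowTree) Ascend(fn func(key, value string)) { ascend(t.root, fn) }

func ascend(n *node, fn func(key, value string)) {
	if n == nil {
		return
	}
	ascend(n.left, fn)
	fn(n.key, n.value)
	ascend(n.right, fn)
}
```

In this sketch, taking `snap := tree.Copy()` and iterating `snap` with `Ascend` while calling `tree.Set` inside the callback leaves the iteration unaffected, because the writer clones the nodes on its path instead of touching the ones the snapshot can still reach. The cost of a write after a copy is proportional to the path length, not to the size of the whole tree.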
I think this PR addresses too many things:
- Simplifies cachekv a lot, especially iterators.
- Gets rid of locks.
- Allows for nested, concurrent iterators.
- Allows for modifications while iterating.
I would rather do this iteratively. Also, as suggested by @alexanderbez I would rather create issues to discuss the problems and possible solutions before PRs. Things that I would like to have better understanding:
- How important is allowing modifications while iterating? This seems to have a significant impact on performance. First, the cache size is unbounded (and it has to be). Thus, copying the entire cache when modifying while iterating could be really expensive. Second, the solution favors iterator performance over simple get/set operations. Is this justified?
- Are we sure that removing locks is safe?
}

value := store.parent.Get(key)
store.setCacheValue(key, value, false)
What happens if the key is not in the parent store?
It'll cache the nil value.
Would it make sense to avoid caching the nil value by checking the value returned by `store.parent.Get(key)`?
I think the non-existence result is cacheable as well?
Makes sense, thanks!
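The point about caching non-existence can be sketched as follows. This is an illustration only: it uses a plain Go map instead of the btree, and the `kv` interface, `countingParent`, and `cacheStore` are hypothetical stand-ins, not the SDK's types. The idea is that a nil result stored under the key prevents repeated lookups in the parent for keys that do not exist.

```go
package main

// kv is a hypothetical minimal read interface for a parent store.
type kv interface {
	Get(key string) []byte
}

// countingParent is a stand-in parent store that records how often it is read.
type countingParent struct {
	data  map[string][]byte
	reads int
}

func (p *countingParent) Get(key string) []byte {
	p.reads++
	return p.data[key] // nil when the key is absent
}

// cacheStore caches every parent lookup, including misses.
type cacheStore struct {
	parent kv
	// A key that is present with a nil value means "known to be absent".
	cache map[string][]byte
}

func newCacheStore(parent kv) *cacheStore {
	return &cacheStore{parent: parent, cache: make(map[string][]byte)}
}

// Get reads through to the parent at most once per key.
func (s *cacheStore) Get(key string) []byte {
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := s.parent.Get(key)
	s.cache[key] = v
	return v
}
```

With this layout, a second `Get` for a missing key is answered from the cache, which matters when the parent is itself a stack of nested cache stores or a disk-backed store.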
const (
	// The approximate number of items and children per B-tree node. Tuned with benchmarks.
	// copied from memdb.
	bTreeDegree = 32
What is the impact of misconfiguring this?
I believe it only affects performance.
Actually, the lock was removed in the previous PR #13881 to address the deadlock issues in the released version. EDIT: I guess you mean the mutex in the store. Yeah, it seems it should not be removed. Although both the parent and the btree are thread-safe, some operations on the store are supposed to be atomic.
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
I'm not sure. I guess it depends on how much slower it is, and whether we want to pay that small price for simpler logic and thread-safety. We need benchmarks to justify it.
Yeah, I meant the mutex in the store. It may not be needed, but removing it makes cachekv non-thread-safe.
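A small sketch of why a store-level mutex can still matter even when the underlying containers are individually thread-safe: a read-through `Get` is a check-then-set sequence, and without a lock around the whole sequence two goroutines can both miss the cache and both hit the parent. The types below (`slowParent`, `lockedStore`) and the helper `concurrentGets` are hypothetical and use a plain map; they are not the cachekv code.

```go
package main

import "sync"

// slowParent is a stand-in parent store that is itself thread-safe.
type slowParent struct {
	mu    sync.Mutex
	reads int
}

func (p *slowParent) Get(key string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.reads++
	return "value-of-" + key
}

// lockedStore guards the whole read-through sequence with one mutex.
type lockedStore struct {
	mu     sync.Mutex
	cache  map[string]string
	parent *slowParent
}

// Get checks the cache, falls back to the parent, and records the result.
// Holding mu across all three steps makes the composite operation atomic.
func (s *lockedStore) Get(key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := s.parent.Get(key)
	s.cache[key] = v
	return v
}

// concurrentGets issues n concurrent reads of the same key and waits for them.
func concurrentGets(s *lockedStore, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Get(key)
		}()
	}
	wg.Wait()
}
```

Here the parent is read exactly once no matter how many goroutines race on the same key. Whether that guarantee is worth the lock contention is the trade-off discussed in this thread.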
I think I'll make a smaller PR later to fix only the thread-safety issue, without the controversial refactoring part. Closing for now, thanks for reviewing ;)
Description
Benchmarks
TODO
Some performance regression is expected in cases where iteration is not used, because a btree is slower than a Go map (see difference here).
But iteration should be faster, because it doesn't need to sort the cache.
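The sorting cost mentioned above can be illustrated with a short sketch. Go maps have no defined iteration order, so a map-backed cache has to collect and sort its keys before it can serve an ordered iterator, while an ordered tree keeps keys sorted as they are inserted. The helper `sortedKeys` is hypothetical and is not the SDK's code.

```go
package main

import "sort"

// sortedKeys returns the keys of an unsorted map-backed cache in ascending
// order. The sort costs O(n log n) each time an ordered view is needed.
func sortedKeys(cache map[string][]byte) []string {
	keys := make([]string, 0, len(cache))
	for k := range cache {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}
```

Point lookups in the map remain cheap, which is the other half of the trade-off that the pending benchmarks would need to quantify.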
Alternative
One alternative is to keep using the unsorted cache trick; the concurrency/deadlock issues can still be fixed with the copy-on-write btree.