Optimize fenwick tree manipulations #23
Conversation
For the record, this is the benchmark for compression=20, which is just a smaller initial fenwick tree:
Whoa, this is awesome, thanks a lot for the patches and the well-detailed PR! I'll make time to look at it asap (this Saturday, most likely), but from a quick look it looks great already.
👍 Overall my understanding is that disabling the fenwick tree until the t-digest is saturated (the number of centroids is close to the maximum allowed by compression) is going to be beneficial in most workloads. In other cases we still need to rebuild the fenwick tree too often, and the net result is that the code with the fenwick tree is slower than the code that uses a for loop.
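To make that idea concrete, here is a minimal sketch of deferring the Fenwick tree until the digest stops growing. The names (`summary`, `maxCount`, `saturated`) and the structure are illustrative assumptions, not the library's actual code; while the digest is unsaturated, cumulative counts are computed with a plain loop, and the tree is only built once the number of centroids approaches the limit implied by compression:

```go
// Sketch only: defer building the Fenwick tree until the digest is saturated.
package main

import "fmt"

type summary struct {
	counts   []uint32 // per-centroid counts, kept in centroid order
	fenwick  []uint32 // 1-based Fenwick tree over counts; nil until saturated
	maxCount int      // roughly the number of centroids allowed by compression
}

// saturated reports whether the digest has (almost) stopped adding centroids,
// i.e. rebuilds of the Fenwick tree would now be rare.
func (s *summary) saturated() bool {
	return len(s.counts) >= s.maxCount
}

// cumulative returns the sum of counts[0..i].
// (Rebuilds after later insertions are omitted from this sketch.)
func (s *summary) cumulative(i int) uint32 {
	if s.fenwick == nil {
		if !s.saturated() {
			// Cheap path while the digest is still growing: a plain loop
			// beats rebuilding a Fenwick tree on every insertion.
			var sum uint32
			for j := 0; j <= i; j++ {
				sum += s.counts[j]
			}
			return sum
		}
		s.rebuildFenwick()
	}
	// Fenwick prefix-sum query, O(log n).
	var sum uint32
	for j := i + 1; j > 0; j -= j & (-j) {
		sum += s.fenwick[j]
	}
	return sum
}

// rebuildFenwick builds the tree from the current counts.
func (s *summary) rebuildFenwick() {
	s.fenwick = make([]uint32, len(s.counts)+1)
	for i, c := range s.counts {
		for j := i + 1; j <= len(s.counts); j += j & (-j) {
			s.fenwick[j] += c
		}
	}
}

func main() {
	s := &summary{counts: []uint32{3, 1, 4, 1, 5}, maxCount: 100}
	fmt.Println(s.cumulative(2)) // 8, via the loop path (not saturated yet)
}
```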
I see no disadvantages with this patch; it should even help a bit with Merge-heavy workloads (there's a bit of discussion about one at #20) too. Merged, thanks a lot again!!
For the record, I think (haven't verified) that the fenwick tree change to use uint32 only works because our input is always sorted; if it could be unordered, this would likely break:

func (l *List) Append(n uint32) {
	i := len(l.tree)
	l.tree = append(l.tree, 0)
	l.tree[i] = n - l.Get(i)
}
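To illustrate that caveat: with uint32, the subtraction `n - l.Get(i)` silently wraps around whenever an out-of-order (smaller) value is appended, which would corrupt later prefix sums. The snippet below is a standalone illustration of the wraparound only, not the library's actual Get implementation:

```go
// Sketch of why the uint32 delta trick depends on sorted input.
package main

import "fmt"

func main() {
	// Suppose the existing entries make l.Get(i) return 10...
	var prefixSoFar uint32 = 10
	// ...and an out-of-order value smaller than that is appended.
	var n uint32 = 7

	delta := n - prefixSoFar // unsigned underflow: wraps to 4294967293
	fmt.Println(delta)

	// With sorted input, n >= prefixSoFar always holds, so the stored delta
	// is valid and prefix-sum queries reconstruct the appended values.
}
```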
I see. I guess that is fine as long as it improves performance :) I've tried the honeycomb fork and performance is 40% better for my workload (lots of merges) even with this patch. But either its binary serialization is incompatible or I am using it wrong, because I get negative quantiles when I replace caio/go-tdigest with honeycomb/go-tdigest. So I guess I will continue to improve the performance of this lib using ideas from the honeycomb fork...
There are two separate changes:
When the number of adds is small, the benchmark becomes slower because the tree is bigger and the number of rebuilds is not high. But as the number of adds grows, the benchmark improves a lot.
The cumulative result of the two commits is the following:
So it is only slightly slower for a small number of inserts, and if you choose a smaller compression that should not matter at all.
In my projects the speedup is measurable too: from 16 seconds to 6 seconds.
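For anyone wanting to reproduce the "few adds vs. many adds" behaviour described above, here is a minimal, self-contained benchmark sketch. The `digest` interface, the `naiveDigest` stand-in, and the `newDigest` hook are placeholders introduced here, not the library's API; wire `newDigest` to the real constructor (and adjust the `Add` signature if your t-digest version also takes a count) to measure the actual code:

```go
// Benchmark sketch: compare Add cost for small vs. large input sizes.
package tdigest_test

import (
	"math/rand"
	"testing"
)

// digest is a placeholder for the t-digest type under test.
type digest interface {
	Add(value float64) error
}

// naiveDigest is a trivial stand-in so the sketch compiles and runs on its
// own; swap newDigest for the real library constructor to benchmark it.
type naiveDigest struct{ values []float64 }

func (d *naiveDigest) Add(v float64) error {
	d.values = append(d.values, v)
	return nil
}

var newDigest = func() digest { return &naiveDigest{} }

func benchmarkAdds(b *testing.B, numAdds int) {
	rng := rand.New(rand.NewSource(42))
	values := make([]float64, numAdds)
	for i := range values {
		values[i] = rng.Float64()
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		d := newDigest()
		for _, v := range values {
			_ = d.Add(v)
		}
	}
}

// Few adds: the bigger initial tree is not amortized, so this case can be
// slightly slower after the patch.
func BenchmarkAddSmall(b *testing.B) { benchmarkAdds(b, 100) }

// Many adds: rebuilds become rare relative to insertions, which is where the
// patch pays off.
func BenchmarkAddLarge(b *testing.B) { benchmarkAdds(b, 100_000) }
```

Run with `go test -bench=.` after pointing `newDigest` at the real implementation.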