v2.0.0: Major release of go-tdigest
This release contains major API changes and significant performance improvements to the tdigest package. All users are encouraged to upgrade.
Performance Improvements
The critical path of this library (adding samples to the digest) is now much faster: it uses a binary indexed tree, so prefix sums and updates no longer need to scan most of the storage.
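To illustrate why a binary indexed (Fenwick) tree helps here, the following is a minimal, standalone sketch of the data structure. It is not the code used by this library (which relies on the yourbasic/fenwick package listed below); the `Fenwick` type and its method names are made up for this example. Both updating a weight and summing a prefix touch only O(log n) entries instead of scanning the whole slice.

```go
package main

import "fmt"

// Fenwick is an illustrative binary indexed tree over uint64 weights.
// tree[i] (1-based) holds the sum of a block of elements ending at i.
type Fenwick struct {
	tree []uint64
}

// NewFenwick returns a tree for n elements, all initially zero.
func NewFenwick(n int) *Fenwick {
	return &Fenwick{tree: make([]uint64, n+1)}
}

// Add increases element i (0-based) by delta in O(log n) steps.
func (f *Fenwick) Add(i int, delta uint64) {
	for i++; i < len(f.tree); i += i & -i {
		f.tree[i] += delta
	}
}

// PrefixSum returns the sum of the first k elements in O(log n) steps.
func (f *Fenwick) PrefixSum(k int) uint64 {
	var sum uint64
	for ; k > 0; k -= k & -k {
		sum += f.tree[k]
	}
	return sum
}

func main() {
	weights := []uint64{1, 3, 2, 5, 4}
	f := NewFenwick(len(weights))
	for i, w := range weights {
		f.Add(i, w)
	}
	fmt.Println(f.PrefixSum(3)) // sum of weights[0], weights[1], weights[2]

	f.Add(1, 10) // bump one weight; no full rescan is needed
	fmt.Println(f.PrefixSum(3))
}
```

In a digest, the prefix sum of centroid weights is what locates a sample's approximate rank, which is why speeding it up benefits every insertion.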
Results from benchcmp on a late 2013 MacBook Air (running Linux):
benchmark             old ns/op     new ns/op     delta
BenchmarkAdd1-4       187           206           +10.16%
BenchmarkAdd10-4      332           274           -17.47%
BenchmarkAdd100-4     1092          325           -70.24%
Additionally, it is now possible to create a digest that uses a custom random number generator. If you were suffering from lock contention (due to heavy usage of the shared RNG), you can gain more speed by creating your digests with:
digest := tdigest.New(
    tdigest.Compression(200),
    tdigest.LocalRandomNumberGenerator(),
)
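As background on the contention mentioned above: the top-level functions in Go's math/rand have historically drawn from a single process-wide source guarded by a lock, so many goroutines calling them at once can end up waiting on each other. The sketch below uses only the standard library (it does not show how tdigest wires its generator internally) and gives each goroutine a private generator; `worker` is a hypothetical helper for this example.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// worker draws n values from its own generator, so goroutines never
// contend on a shared source. A *rand.Rand is not safe for concurrent
// use, which is why each goroutine creates a private one.
func worker(seed int64, n int) float64 {
	rng := rand.New(rand.NewSource(seed))
	var sum float64
	for i := 0; i < n; i++ {
		sum += rng.Float64()
	}
	return sum
}

func main() {
	const workers, draws = 4, 1000
	sums := make([]float64, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot.
			sums[w] = worker(int64(w+1), draws)
		}(w)
	}
	wg.Wait()

	fmt.Println(len(sums), "workers finished")
}
```

The trade-off is that a local generator must not be shared across goroutines without external synchronization.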
API Changes
The tdigest API has been drastically simplified with the goal of making it more readily usable without requiring people to read up on and understand what, for example, compression means.
Modifications
- The Add(float64,uint32) method has been renamed to AddWeighted
Additions
- Construction is now done via New(), which accepts configuration parameters while providing sane defaults
- There is a new Add(float64) method that works as a shortcut for AddWeighted(float64,1)
- The Count() method has been introduced to allow users to decide what to do when the digest grows too much
- The CDF(float64) method has been added. It stands for cumulative distribution function and it's useful for asking the inverse of the question asked via Quantile(x): it answers at which fraction (quantile) of the data all seen samples are less than or equal to the given x.
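To make the relationship between Quantile and CDF concrete, here is a small sketch that computes both exactly on a sorted slice using only the standard library. It does not use tdigest at all, and `exactCDF` and `exactQuantile` are hypothetical helper names; a digest answers the same two questions approximately while storing far less than the full sample set.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactCDF returns the fraction of samples that are <= x.
// sorted must be non-empty and in ascending order.
func exactCDF(sorted []float64, x float64) float64 {
	// The first index whose value exceeds x equals the count of values <= x.
	n := sort.Search(len(sorted), func(i int) bool { return sorted[i] > x })
	return float64(n) / float64(len(sorted))
}

// exactQuantile returns the smallest sample v such that at least a
// fraction q of the samples are <= v, for 0 < q <= 1.
func exactQuantile(sorted []float64, q float64) float64 {
	idx := int(math.Ceil(q*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	samples := []float64{7, 1, 9, 3, 5, 2, 10, 4, 8, 6}
	sort.Float64s(samples)

	median := exactQuantile(samples, 0.5) // value at the 0.5 quantile
	fraction := exactCDF(samples, median) // maps that value back to a fraction

	fmt.Println("quantile(0.5) =", median)
	fmt.Println("cdf(median)   =", fraction)
}
```

Feeding the result of one function into the other recovers the original input here, which is the sense in which CDF is the inverse of Quantile.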
Removals
- There is no Len() method anymore since it provided no real actionable information
- New(float64) no longer exists; it has been replaced by a simpler New()
External Dependencies
Two dependencies have been introduced (v1.x had zero):
- yourbasic/fenwick, used to speed up prefix sum computations allowing major performance improvements
- (test only) leesper/go_rng, for generating non-uniform distributions to assist with testing
Other changes
- This project now uses dep for dependency management
- A single digest can be used to summarize more than 4B data points
- We now have contribution guidelines :-)