
v2.0.0: Major release of go-tdigest

caio released this on 30 Oct 07:01


This release contains major API changes and significant performance improvements to the tdigest package. All users are encouraged to upgrade.

Performance Improvements

The critical path of this library (adding samples to the digest) has been drastically sped up by using a binary indexed tree (Fenwick tree), so that computing prefix sums and applying updates no longer requires scanning most of the storage.
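
The idea behind a binary indexed tree can be sketched in a few lines of Go. This is a hypothetical illustration of the data structure only, not the code of yourbasic/fenwick or of go-tdigest; the names fenwick, newFenwick, Add and Sum are made up here. Both operations touch O(log n) slots instead of walking the whole slice.

```go
package main

import "fmt"

// fenwick is a minimal binary indexed tree over uint64 counts.
// Hypothetical illustration; not the yourbasic/fenwick API.
type fenwick struct {
	tree []uint64 // 1-based internal storage; tree[0] is unused
}

func newFenwick(n int) *fenwick {
	return &fenwick{tree: make([]uint64, n+1)}
}

// Add increases the value at 0-based index i by delta in O(log n).
func (f *fenwick) Add(i int, delta uint64) {
	for j := i + 1; j < len(f.tree); j += j & -j {
		f.tree[j] += delta
	}
}

// Sum returns the sum of the values at indices [0, i) in O(log n).
func (f *fenwick) Sum(i int) uint64 {
	var s uint64
	for j := i; j > 0; j -= j & -j {
		s += f.tree[j]
	}
	return s
}

func main() {
	// Per-centroid counts, e.g. how many samples each centroid absorbed.
	counts := []uint64{3, 1, 4, 1, 5}
	f := newFenwick(len(counts))
	for i, c := range counts {
		f.Add(i, c)
	}
	fmt.Println(f.Sum(3)) // 3+1+4 = 8
	f.Add(1, 2)           // two more samples land on centroid 1
	fmt.Println(f.Sum(3)) // 3+3+4 = 10
}
```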

Results from benchcmp on a late-2013 MacBook Air running Linux:

benchmark             old ns/op     new ns/op     delta
BenchmarkAdd1-4       187           206           +10.16%
BenchmarkAdd10-4      332           274           -17.47%
BenchmarkAdd100-4     1092          325           -70.24%
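
The delta column is the relative change (new - old) / old * 100, and the figures in the table are consistent with that formula. A small Go check of the arithmetic (the delta helper is ours, not part of benchcmp):

```go
package main

import "fmt"

// delta returns the relative change of newNs versus oldNs, in percent.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	rows := []struct {
		name         string
		oldNs, newNs float64
	}{
		{"BenchmarkAdd1-4", 187, 206},
		{"BenchmarkAdd10-4", 332, 274},
		{"BenchmarkAdd100-4", 1092, 325},
	}
	for _, r := range rows {
		// Prints +10.16%, -17.47% and -70.24% respectively.
		fmt.Printf("%-20s %+.2f%%\n", r.name, delta(r.oldNs, r.newNs))
	}
}
```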

Additionally, it is now possible to create a digest that uses its own random number generator. If you were suffering from lock contention due to heavy use of the shared RNG, you can gain more speed by creating your digests with:

digest := tdigest.New(
    tdigest.Compression(200),
    tdigest.LocalRandomNumberGenerator(),
)
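
Why a per-digest generator avoids contention can be sketched without go-tdigest itself. In the sketch below, digestLike and newLocal are hypothetical stand-ins, and we assume (without having confirmed it) that LocalRandomNumberGenerator works along these lines: each value owns a *rand.Rand built with rand.New(rand.NewSource(seed)), which takes no shared lock, whereas at the time the package-level math/rand functions drew from a single mutex-protected source shared by every caller.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// digestLike is a hypothetical stand-in for a digest that consumes random
// numbers while inserting. It is not go-tdigest's type.
type digestLike struct {
	rng   *rand.Rand // owned by this value only; not safe for concurrent use
	count uint64
}

// newLocal gives every value its own unlocked source, so goroutines that
// each own a digestLike never contend on a shared lock.
func newLocal(seed int64) *digestLike {
	return &digestLike{rng: rand.New(rand.NewSource(seed))}
}

// Add draws from the local generator (standing in for whatever randomness
// a real insert needs) and counts the sample; x itself is unused here.
func (d *digestLike) Add(x float64) {
	_ = d.rng.Float64()
	d.count++
}

func main() {
	const workers = 4
	digests := make([]*digestLike, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		digests[w] = newLocal(int64(w))
		wg.Add(1)
		go func(d *digestLike) { // one goroutine per digest
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				d.Add(float64(i))
			}
		}(digests[w])
	}
	wg.Wait()
	var total uint64
	for _, d := range digests {
		total += d.count
	}
	fmt.Println(total) // 4000
}
```

Because a *rand.Rand is not safe for concurrent use, this only pays off when each digest is fed by one goroutine at a time.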

API Changes

The tdigest API has been drastically simplified so that it can be used without first having to learn what, for example, compression means.

Modifications

  • The Add(float64,uint32) method has been renamed to AddWeighted

Additions

  • Construction is now done via New() which accepts configuration parameters while providing sane defaults
  • There is a new Add(float64) method that works as a shortcut for AddWeighted(float64,1)
  • The Count() method has been introduced so that users can decide what to do when the digest grows too large
  • The CDF(float64) method has been added. It computes the cumulative distribution function and answers the inverse of the question asked via Quantile(q): given a value x, it returns the fraction (quantile) of all seen samples that are less than or equal to x.
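
To make the semantics of Add, AddWeighted, Count, Quantile and CDF concrete, here is a hypothetical exact (uncompressed) reference in Go. It stores every sample, which a t-digest deliberately avoids, so its answers are the exact values a digest approximates from its centroids. The type name exactSummary and the nearest-rank definition of Quantile are our choices for illustration, not go-tdigest's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactSummary is a hypothetical, uncompressed reference: it keeps every
// sample, so its answers are the exact values a t-digest approximates.
type exactSummary struct {
	values []float64
	sorted bool
}

// AddWeighted records x as if it had been seen weight times.
func (s *exactSummary) AddWeighted(x float64, weight uint32) {
	for i := uint32(0); i < weight; i++ {
		s.values = append(s.values, x)
	}
	s.sorted = false
}

// Add is shorthand for AddWeighted(x, 1).
func (s *exactSummary) Add(x float64) { s.AddWeighted(x, 1) }

// Count returns the total weight added so far.
func (s *exactSummary) Count() uint64 { return uint64(len(s.values)) }

func (s *exactSummary) ensureSorted() {
	if !s.sorted {
		sort.Float64s(s.values)
		s.sorted = true
	}
}

// CDF returns the fraction of samples that are less than or equal to x.
func (s *exactSummary) CDF(x float64) float64 {
	if len(s.values) == 0 {
		return 0
	}
	s.ensureSorted()
	// n is the index of the first sample strictly greater than x.
	n := sort.Search(len(s.values), func(i int) bool { return s.values[i] > x })
	return float64(n) / float64(len(s.values))
}

// Quantile returns the nearest-rank q-quantile: the smallest sample v
// with CDF(v) >= q. It returns NaN when no samples have been added.
func (s *exactSummary) Quantile(q float64) float64 {
	n := len(s.values)
	if n == 0 {
		return math.NaN()
	}
	s.ensureSorted()
	idx := int(math.Ceil(q*float64(n))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= n {
		idx = n - 1
	}
	return s.values[idx]
}

func main() {
	s := &exactSummary{}
	for _, x := range []float64{1, 2, 3, 4} {
		s.Add(x)
	}
	s.AddWeighted(5, 4) // five seen four times: 1 2 3 4 5 5 5 5

	fmt.Println(s.Count())              // 8
	fmt.Println(s.Quantile(0.5))        // 4
	fmt.Println(s.CDF(s.Quantile(0.5))) // 0.5
}
```

The last line shows the inverse relationship: feeding a quantile back into CDF returns at least the fraction you asked for.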

Removals

  • The Len() method has been removed, since it provided no actionable information
  • New(float64) no longer exists; it has been replaced by a simpler New()
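
Replacing New(float64) with an options-taking New() appears to follow Go's functional-options pattern, judging by calls such as tdigest.Compression(200). A minimal sketch of that pattern, with hypothetical names (config, option, withCompression, withLocalRNG, newConfig), an assumed validation rule and an assumed default compression of 100 rather than go-tdigest's actual internals:

```go
package main

import (
	"errors"
	"fmt"
)

// config holds hypothetical settings; fields and defaults are assumptions.
type config struct {
	compression float64
	localRNG    bool
}

// option mutates a config and may reject invalid input.
type option func(*config) error

func withCompression(c float64) option {
	return func(cfg *config) error {
		if c < 1 { // assumed validation rule, for illustration only
			return errors.New("compression must be >= 1")
		}
		cfg.compression = c
		return nil
	}
}

func withLocalRNG() option {
	return func(cfg *config) error {
		cfg.localRNG = true
		return nil
	}
}

// newConfig starts from defaults and applies options in order, so calling
// it with no arguments always yields a usable configuration.
func newConfig(opts ...option) (config, error) {
	cfg := config{compression: 100} // assumed default
	for _, opt := range opts {
		if err := opt(&cfg); err != nil {
			return config{}, err
		}
	}
	return cfg, nil
}

func main() {
	def, _ := newConfig()
	custom, _ := newConfig(withCompression(200), withLocalRNG())
	_, err := newConfig(withCompression(0))
	fmt.Println(def.compression, custom.compression, custom.localRNG, err != nil)
	// 100 200 true true
}
```

The design lets new settings be added later without breaking existing callers, which is harder with a positional New(float64).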

External Dependencies

Two dependencies have been introduced (v1.x had zero):

  • yourbasic/fenwick, used to speed up prefix sum computations, which enables the major performance improvements described above
  • (test only) leesper/go_rng, for generating non-uniform distributions to assist with testing

Other changes

  • This project now uses dep for dependency management
  • A single digest can be used to summarize more than 4B data points
  • We now have contribution guidelines :-)