Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow AVLTreeDigest's to be identical to another given the same set of inputs #185

Closed
cedric-hansen opened this issue Jan 4, 2022 · 1 comment

Comments

@cedric-hansen
Copy link
Contributor

In certain cases, it is desirable to recreate the exact same AVLTreeDigest given a set of values to add to the tree.

Imagine we have a scenario where we have 10_000 random integers that will be added to the AVLTreeDigest, and we then wish to calculate the median (using quantile(0.5)). Since quantile() is an estimate, and there is an element of randomness in AVLTreeDigest, which can cause the median to be slightly different from run to run. If the actual median of the 10_000 values is 42, on one run, the estimated median might be 43on one run, and 41 on another. In most cases, this estimate is perfectly acceptable (and within a reasonable margin or error), but in other cases, a consistent approximation is significantly more desirable (ie. consistently predicting 43 for the median)

After some brief investigation, the source of the randomness is coming from gen field in AVLTreeDigest, which is an unseeded random object. On each subsequent creation of the AVLTreeDigest, and adding the same set of numbers, the tree will be slightly different.

One potential solution, is to simple make gen non-final, and add a setter function to allow a specific random object to be used. This way, there are no changes in serialization, current behaviour is maintained for existing users, and anyone who wants to use this deterministic behaviour can do so, simply by setting AVLTreeDigest::setRandom(new Random(42))

@cedric-hansen
Copy link
Contributor Author

#183 resolves this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant