In certain cases, it is desirable to recreate exactly the same `AVLTreeDigest` from the same set of input values.
Imagine a scenario where 10_000 random integers are added to an `AVLTreeDigest`, and we then wish to calculate the median (using `quantile(0.5)`). Because `quantile()` returns an estimate and `AVLTreeDigest` contains an element of randomness, the estimated median can differ slightly from run to run. If the actual median of the 10_000 values is 42, the estimated median might be 43 on one run and 41 on another. In most cases this estimate is perfectly acceptable (and within a reasonable margin of error), but in other cases a consistent approximation is significantly more desirable (i.e. consistently predicting 43 for the median).
After some brief investigation, I found that the randomness comes from the `gen` field in `AVLTreeDigest`, which is an unseeded `Random` object. As a result, each time a new `AVLTreeDigest` is created and the same set of numbers is added, the resulting tree can be slightly different.
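Seeding the generator works because of a documented property of `java.util.Random`: two instances constructed with the same seed produce identical sequences of values. A minimal illustration follows; the `SeedDemo` class and its `draws` helper are hypothetical names used only for this example and are not part of t-digest.

```java
import java.util.Arrays;
import java.util.Random;

class SeedDemo {
    // Draw n doubles in [0, 1) from the given generator.
    static double[] draws(Random rng, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextDouble();
        }
        return out;
    }

    public static void main(String[] args) {
        // Two generators built from the same seed yield the same sequence.
        double[] a = draws(new Random(42), 5);
        double[] b = draws(new Random(42), 5);
        System.out.println(Arrays.equals(a, b)); // prints "true"
    }
}
```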
One potential solution is simply to make `gen` non-final and add a setter that allows a specific `Random` object to be supplied. This way, there are no changes to serialization, current behaviour is maintained for existing users, and anyone who wants deterministic behaviour can opt in by calling the proposed `AVLTreeDigest::setRandom` with a seeded generator, e.g. `setRandom(new Random(42))`.
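A minimal sketch of the proposed pattern, under stated assumptions: `SeedableDigestSketch` is a hypothetical stand-in for `AVLTreeDigest` that models only the random field and the proposed setter. Its `pickAmongTies` method is an assumed tie-break (single-pass reservoir sampling among equally good candidates), included only to show how a seeded generator makes random choices repeatable. It is not the library's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for AVLTreeDigest; only the randomness is modeled.
class SeedableDigestSketch {
    // Non-final so callers can swap in a seeded generator; unseeded by default.
    private Random gen = new Random();

    // Proposed setter (hypothetical; not part of the released API).
    public void setRandom(Random random) {
        if (random == null) {
            throw new IllegalArgumentException("random must not be null");
        }
        this.gen = random;
    }

    // Assumed tie-break: choose uniformly among `candidates` options by keeping
    // option i with probability 1/(i+1) (reservoir sampling with one slot).
    public int pickAmongTies(int candidates) {
        int chosen = -1;
        for (int i = 0; i < candidates; i++) {
            if (gen.nextDouble() < 1.0 / (i + 1)) {
                chosen = i;
            }
        }
        return chosen;
    }

    // Perform a fixed number of tie-breaks and record each choice.
    public List<Integer> simulate(int steps, int candidates) {
        List<Integer> picks = new ArrayList<>();
        for (int s = 0; s < steps; s++) {
            picks.add(pickAmongTies(candidates));
        }
        return picks;
    }

    public static void main(String[] args) {
        SeedableDigestSketch first = new SeedableDigestSketch();
        SeedableDigestSketch second = new SeedableDigestSketch();
        first.setRandom(new Random(42));
        second.setRandom(new Random(42));
        // Same seed and same sequence of operations: identical choices.
        System.out.println(first.simulate(20, 3).equals(second.simulate(20, 3))); // prints "true"
    }
}
```

In this sketch, reproducibility also depends on performing the same operations in the same order, since every call consumes draws from the shared generator.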