Add cardinality support to BloomFilter. #133
base: master
The tests pass for me locally on Java 7, and the build failure in Travis looks to be completely unrelated to this code. I'm not sure what to do about it.
Ultimately, I think it would be nice to make BloomFilter implement ICardinality as well, but that will require a larger refactor, since BloomFilter currently doesn't accept hashes of items directly. I think it is possible to support that functionality with some bigger changes, though.
Looking at the build history, #108 has the same build failure, so it seems highly unlikely that the failure is due to this code.
I rebased onto the latest master, which drops the openjdk7 build, so the build now passes.
Based on my read of the paper cited there, I'd want something that helps check whether A is sufficiently far from N for the estimate to be valid. (Sorry for the unintentional close there.)
I believe when A approaches N, that means The alternative given in the paper is to use equation 5 under these circumstances for an alternative estimate. What threshold do you think is appropriate for "sufficiently far"?
Can you add more tests with larger, random data sets to show this works for non-trivial use cases?
This is based on the formula provided by the Wikipedia article:
https://en.wikipedia.org/wiki/Bloom_filter#Approximating_the_number_of_items_in_a_Bloom_filter
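
For reference, the formula in that Wikipedia section is n* = -(m/k) * ln(1 - X/m), where m is the filter size in bits, k is the number of hash functions, and X is the number of bits set to one. A minimal sketch of that calculation follows. The class and method names are hypothetical and are not the API added by this PR. Returning infinity for a fully saturated filter is an assumption about how to handle that edge case, not something the article prescribes.

```java
import java.util.BitSet;

// Hypothetical helper, not part of stream-lib: illustrates the Wikipedia estimate only.
public class BloomCardinalitySketch {

    /**
     * Estimates the number of distinct items inserted into a Bloom filter.
     *
     * @param m filter size in bits
     * @param k number of hash functions
     * @param x number of bits currently set to one
     */
    public static double estimate(int m, int k, int x) {
        if (m <= 0 || k <= 0 || x < 0 || x > m) {
            throw new IllegalArgumentException("require m > 0, k > 0, 0 <= x <= m");
        }
        if (x == m) {
            // Assumption: a saturated filter carries no usable information,
            // so report an unbounded estimate rather than a finite guess.
            return Double.POSITIVE_INFINITY;
        }
        // log1p(-x/m) computes ln(1 - x/m) accurately when x/m is small.
        return -((double) m / k) * Math.log1p(-(double) x / m);
    }

    /** Convenience overload that counts the set bits of a BitSet-backed filter. */
    public static double estimate(BitSet bits, int m, int k) {
        return estimate(m, k, bits.cardinality());
    }
}
```

The estimate grows without bound as X approaches m, which is why a check that the filter is not close to saturation matters before trusting the result.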