-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash function cuasing false postives. (Nasty) #17
Comments
Hey, now that the holidays are over I'm looking into this. Note that false positives are expected with bloom filters --- false negatives are the true nasty behavior. If you say 0.0001 for the error rate and you are below the capacity (which you are), you should expect less than 0.0001 error rate. I am looking into why it's broken. |
Just to confirm, I've had serious problems with both the default SHA512 hash (a large memory leak) and a really bad false positive rate using the (i think) djb hash. I've just tried the superfast hash though and that works very well. Thanks. (python 2.6 on ubuntu 64bit) |
I just submitted a fix for the memory leaks referred to in this issue |
hi axiak, i want to know how is this issue going on? i think "using the hash() function prior to passing the data to pyboomfiler" may increase the false positive rate when the number of input items is large, and the test also proved it. |
I think this is a duplicate of Issue #46 |
Seems fixed in latest release 0.3.14:
What really matters is the first result:
IMHO this ticket should be closed |
I noticed some bad behavior with the pybloomfiler to I wrote a simple set of tests. A ~22% failure rate is really bad. As a quick fix i am using the hash() function prior to passing the data to pyboomfiler. Usually the false positive rate with this function is 0, sometimes its really close to 0.
My simple test:
http://pastebin.com/C8KwWFxR
The output:
('false positive', 0.2251722)
('false negitive', 0.0)
with hash()
('false positive', 1e-07)
('false negitive', 0.0)
with md5()
('false positive', 0.0)
('false negitive', 0.0)
with md5() hexdigigest
('false positive', 0.0)
('false negitive', 0.0)
The text was updated successfully, but these errors were encountered: