Saving/Loading a filter of 'bytes' objects doesn't work #31
Comments
@DavidJBianco which version of the lib are you running?
Sorry if it's a dumb question, but I'm just looking at the snippet at the moment to help triage this (I will check the source later): did you close that f2 in the first example before starting the second interpreter? Also, a capacity of 10 and an error rate of 0.1 aren't the best parameters to choose. That is "too small" for Bloom filters; a standard Python set should perform better and more reliably at that size ("small" depends on your use case, but this capacity might have a 1/9 chance of error). I'd try with a bigger-capacity filter, and also try to isolate whether this is due to mmap access (slightly related: #38), edge cases in persistence, or a Python implementation detail. Otherwise we're breaking the "no false negatives" guarantee, and that needs to be fixed!
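To put the "too small" remark in numbers, here is a minimal sketch of the textbook Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The helper name `bloom_parameters` is hypothetical, and the library may round or size its filters differently, so treat the figures as a rough guide only.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Return (bits, hash_count) from the textbook sizing formulas.

    Hypothetical helper for illustration; pybloomfilter's internal
    sizing is not shown in this thread and may differ.
    """
    if capacity <= 0 or not 0 < error_rate < 1:
        raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, with at least one hash function
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


if __name__ == "__main__":
    # The parameters from the original report versus the repro script
    for n, p in [(10, 0.1), (1000, 0.001)]:
        bits, k = bloom_parameters(n, p)
        print(f"capacity={n}, error_rate={p}: {bits} bits, {k} hashes")
```

With a capacity of 10 and an error rate of 0.1, the whole filter fits in a few dozen bits, which is why a larger filter is a better basis for isolating the persistence problem.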
I can confirm this bug. Try the following script (run it twice):
from pybloomfilter import BloomFilter
import os
FILTER = 'test.bloom'
# binary test data
VALUES = [ a.encode() for a in """Lorem ipsum dolor sit amet,
consectetur adipiscing elit. Fusce
imperdiet augue nec erat finibus,
sit amet gravida ipsum dictum. Ut
pellentesque, tellus at tempor
fermentum, nulla sem accumsan enim,
ac malesuada leo dui non elit. Sed
vestibulum euismod tortor, vel
pellentesque lacus dignissim ac.
Proin eleifend cursus maximus.
Praesent ornare ex non tempus luctus.
""".split('\n')]
# create filter if it doesn't already exist
if not os.path.exists(FILTER):
    bf = BloomFilter(1000, 0.001, FILTER)
    for v in VALUES:
        bf.add(v)
    print("created filter")
    # make sure it is properly synced
    bf.sync()
    bf.close()

# reopen the filter from disk and test every value
bf = BloomFilter.open(FILTER)
for value in VALUES:
    print(f"{value.decode():<45} in bf {value in bf}")
I get the following output:
As you can clearly see, the filter is unusable the second time; the only value that is still "recognised" is an empty string. I am running version 0.5.3.
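One possible explanation, which this thread does not confirm, is Python's per-process hash randomization: on CPython the built-in `hash()` of a non-empty `bytes` object changes between interpreter runs unless `PYTHONHASHSEED` is fixed, while `hash(b"")` is always 0. If a filter derived bit positions from `hash()`, that would match the symptoms reported here. The sketch below only demonstrates the interpreter behaviour; it says nothing about how pybloomfilter actually hashes its keys, and the helper name is hypothetical.

```python
import os
import subprocess
import sys


def bytes_hash_in_fresh_interpreter(value, seed):
    """Return hash(value) as computed by a new Python process.

    Hypothetical helper: it starts a child interpreter with the given
    PYTHONHASHSEED so that two "runs" can be compared side by side.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    for value in (b"Lorem ipsum dolor sit amet,", b""):
        first = bytes_hash_in_fresh_interpreter(value, seed=1)
        second = bytes_hash_in_fresh_interpreter(value, seed=2)
        print(f"{value!r}: same hash across runs? {first == second}")
```

Any value that must be found again after a restart needs a hash that is stable across processes, which the randomized built-in `hash()` does not provide for `bytes` or `str`.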
I just checked out this repo to debug the issue and noticed that this bug seems to be fixed in the latest (unreleased) version, while checking out 8542533 fails this test. It seems that #41 fixed this. Would it be possible to get a new release out with this fix? It is currently blocking us and poses a potential trap for others.
So we can definitely make a new release with the #41 patch; I will check and see if that fixes it. Thanks for a working example! I was able to reproduce that v0.5.3 does make the filter unusable, while HEAD on master works as intended. I'll push a release tonight or tomorrow.
Alright, I've released a new version (v0.5.5) on PyPI! It's a patch release, so hopefully you won't need to do anything in your build as long as the version isn't pinned. I messed up the v0.5.4 release and accidentally uploaded the wrong source, but thankfully noticed it soon enough to make another patch! That version is yanked on PyPI. I hope this fixes your issue; please open another issue if it does not.
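For builds that pin the dependency, a small check like the following can confirm that the installed release is at least v0.5.5. This is a sketch under assumptions: the distribution name `pybloomfiltermmap3` is not stated in this thread and is assumed here, and both helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: the PyPI distribution name is not given in this thread.
DIST_NAME = "pybloomfiltermmap3"
FIXED_IN = (0, 5, 5)  # release announced above as containing the fix


def has_fix(version_text):
    """Return True if a version string is at least 0.5.5."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognised version: {version_text!r}")
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


def installed_has_fix(dist_name=DIST_NAME):
    """Return True/False for the installed package, or None if absent."""
    try:
        return has_fix(version(dist_name))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"{DIST_NAME} has the fix: {installed_has_fix()}")
```

Because the comparison is a simple lower bound, both v0.5.3 and the yanked v0.5.4 are reported as lacking the fix.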
I have code that creates a filter containing a bunch of bytes objects (basically, MD5 hashes of things). While that Python interpreter is still running, I can load the saved filter into a different filter object and it works:
However, if I quit the interpreter and load the filter again, it no longer works:
As you can see, I tested the existence of the same hash value, but got a different result. However, if I create the filter using strings instead of bytes objects, the save/reload test works:
Then starting a new interpreter: