
Multiple processes on same mmap file #38

Open
leopd opened this issue Jul 17, 2020 · 2 comments

Comments

@leopd
Contributor

leopd commented Jul 17, 2020

Apologies, since I think this is really just a Linux system question, but what happens if multiple processes use the same mmap'd bloom filter? If they're only reading, everything should be wonderful and efficient, right? But what if one or more of them writes changes? I'm guessing/hoping that the kernel will make every process share the same memory pages, so that any change made by one process instantly appears for the others. One catch is that the writes can't be atomic, so while one process is setting the bits for the different hash values, the other processes will see partial results, which for a bloom filter is probably fine in practice.
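
To make the question concrete, here is a minimal sketch of the scenario using Python's standard `mmap` module rather than this library. It assumes Linux and a `MAP_SHARED` mapping; the helper names (`set_bit`, `get_bit`, `demo`), the one-page size, and the bit index are hypothetical and chosen only for illustration. Whether the library maps its file the same way is an assumption, not something confirmed in this thread.

```python
import mmap
import os
import tempfile

SIZE = 4096  # hypothetical filter size: a single page


def set_bit(buf, index):
    """Set one bit with a plain (non-atomic) read-modify-write."""
    buf[index // 8] |= 1 << (index % 8)


def get_bit(buf, index):
    """Return True if the bit at `index` is set."""
    return bool(buf[index // 8] & (1 << (index % 8)))


def demo(bit=1234):
    """Map a file in the parent, set a bit from a child, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, SIZE)
        # Assumption: the file is mapped MAP_SHARED, so writes go to the
        # shared page cache instead of a private copy.
        parent_map = mmap.mmap(fd, SIZE, flags=mmap.MAP_SHARED)
        before = get_bit(parent_map, bit)

        pid = os.fork()
        if pid == 0:
            # Child: open the file again and create its own mapping.
            status = 1
            try:
                child_fd = os.open(path, os.O_RDWR)
                child_map = mmap.mmap(child_fd, SIZE, flags=mmap.MAP_SHARED)
                set_bit(child_map, bit)
                child_map.close()
                os.close(child_fd)
                status = 0
            finally:
                os._exit(status)

        os.waitpid(pid, 0)
        # The parent never remapped the file, yet it reads the child's write.
        after = get_bit(parent_map, bit)
        parent_map.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

With a private mapping (`MAP_PRIVATE`) the child's write would not reach the parent, so the sharing mode is the detail to check.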

@prashnts
Owner

Sounds like a very interesting question still!

First off, I can't point to anything odd in your description without digging into it further; it sounds like how I imagine Linux systems behave, where everything works as intended and everyone is happy. I'd appreciate links to further info on the specifics of what actually happens. Or maybe it's a very well-defined area and I just don't know it (also, it's getting late...).

On another note: when I'm dealing with multiple processes that "may want to write simultaneously", I usually go for some sort of write locking with a back-off. If Redis is available and relevant, I quickly reach for it and add this as "one more thing we use it for". In practice this has kept me from having to find out about your scenario!
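
As a rough illustration of the "write locking and back-off" idea without Redis, the sketch below uses an advisory file lock from Python's standard `fcntl` module and retries with exponential back-off. The `write_lock` helper, its parameters, and the separate lock file are hypothetical; this is one possible approach on Unix, not something the library provides.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def write_lock(lock_path, attempts=8, base_delay=0.01):
    """Hold an exclusive advisory lock on `lock_path` for the `with` body.

    Tries a non-blocking flock() and sleeps base_delay * 2**attempt
    between tries; raises TimeoutError if every attempt fails.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for attempt in range(attempts):
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                # Another writer holds the lock: back off and retry.
                time.sleep(base_delay * 2 ** attempt)
        else:
            raise TimeoutError(f"could not lock {lock_path}")
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A writer would wrap its updates in `with write_lock("/path/to/filter.lock"):` (a hypothetical path). The lock is advisory, so it only helps if every writer takes it; readers can skip it if seeing partially written items is acceptable.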

@karolinepauls

A couple of years ago we used the predecessor to this library (pybloomfiltermmap with Python 2) in a highly concurrent way, with multiple reader processes, and the filters themselves synchronised by an external tool.

It worked.

You are right about the lack of atomicity not being a problem.
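
A toy model may help show why a half-finished insert is harmless to readers. The sketch below is not this library's implementation: it is a simplified bloom filter over a `bytearray`, with hypothetical names and sizes, that sets one bit per step and queries the filter between steps, as a concurrent reader might. It models a single writer with readers, matching the setup described above; several writers updating the same byte at once is a separate case that it does not cover.

```python
import hashlib

NUM_BITS = 1024   # hypothetical filter size
NUM_HASHES = 4    # hypothetical number of hash functions


def bit_positions(item):
    """Derive NUM_HASHES bit positions from one SHA-256 digest."""
    digest = hashlib.sha256(item.encode()).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]


def contains(bits, item):
    """Report membership: True only if every one of the item's bits is set."""
    return all(bits[p // 8] & (1 << (p % 8)) for p in bit_positions(item))


def add_stepwise(bits, item):
    """Set the item's bits one at a time, yielding after each write."""
    for p in bit_positions(item):
        bits[p // 8] |= 1 << (p % 8)
        yield


def observe_partial_add():
    """Query the filter after each individual bit write of a new item."""
    bits = bytearray(NUM_BITS // 8)
    for _ in add_stepwise(bits, "old"):
        pass  # "old" is fully inserted before the experiment starts
    seen = []
    for _ in add_stepwise(bits, "new"):
        seen.append((contains(bits, "old"), contains(bits, "new")))
    return seen
```

Because an insert only ever sets bits, items that were already present stay present at every intermediate step, and the item being added is simply reported as absent until its last bit lands.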


3 participants