Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xxHash hashing algorithm #15

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

mizvyt
Copy link
Contributor

@mizvyt mizvyt commented Nov 21, 2019

Requested here a while ago:

Currently the fastest hashing algorithm.
Added as default, but can be changed if needed.

@prashnts

@mizvyt
Copy link
Contributor Author

mizvyt commented Nov 21, 2019

Test issue re: #11 (comment)

@prashnts
Copy link
Owner

I'd avoid changing defaults without bumping to a major version, since it can break persistent filters for existing users.

Curiously, xxhash is licensed under BSD 2-Clause. I don't know if it lets us release under current MIT license. Do you have any idea?

Regarding new xxhash, I noticed that it is also available as a on pypi -- and I'm wondering if it's possible to add it as a dependency instead. What do you think?

Also wondering if there's any real performance benefit with it -- did you see any benchmark?

@mizvyt
Copy link
Contributor Author

mizvyt commented Nov 22, 2019

Curiously, xxhash is licensed under BSD 2-Clause. I don't know if it lets us release under current MIT license. Do you have any idea?

Shouldn't be an issue. It's similar to MIT license, so it doesn't implicate anything to this library, only prerequisite is to contain a copyright notice for xxHash, which is already customized and prepended to each file.

Also wondering if there's any real performance benefit with it -- did you see any benchmark?
Benchmarks for xxHash are here for some reference systems:
https://github.com/Cyan4973/xxHash/blob/dev/README.md

I'm assuming since this library uses the same interface and procedure for each hashing algorithm, there shouldn't be an issue to test it and we should see an improvement, albeit I haven't created any benchmarks for it...

I'd avoid changing defaults without bumping to a major version, since it can break persistent filters for existing users.

You're right. I think a better way would be to use it as a Python package extension (along with the other two hashing algorithms), such that when we want to install pybloomfilter with xxHash, we do pip install pybloomfilter[xxhash].

Regarding new xxhash, I noticed that it is also available as a on pypi -- and I'm wondering if it's possible to add it as a dependency instead. What do you think?

I think it's just a Python wrapper for the library, rather than the actual library? In the repo, they use the xxHash repo as a submodule:
https://github.com/ifduyue/python-xxhash/tree/master/deps

I pulled the xxHash code for the latest release (0.7.2), but I agree it would be better to have some dependency management. Of course, same goes for other hashing algorithms, but let's keep it out of scope for now.

I'll stash this PR for now? Will keep it in my branch and rebase with other changes, and give a thought on a more elegant solution.

@prashnts
Copy link
Owner

I haven't created any benchmarks for it...

Been there, done that! I think it's a nice hashing method from what I saw in the code (Once upon a time I ported city-hash in Python (link) so it'd be nice to have "pluggable" hash methods).

Regarding submodule: It usually complicates stuff (eg. if you forget --recursive flag while cloning). Given that it's not particularly huge code, I think copying it here is better, since we can also adapt them if required.

Let's keep it open and once you're sure it doesn't break things from your branch, we can work to get new release out. Sounds good?

@mizvyt
Copy link
Contributor Author

mizvyt commented Nov 24, 2019

Been there, done that! I think it's a nice hashing method from what I saw in the code (Once upon a time I ported city-hash in Python (link) so it'd be nice to have "pluggable" hash methods).

Let's keep it open and once you're sure it doesn't break things from your branch, we can work to get new release out. Sounds good?

Definitely!

@prashnts
Copy link
Owner

How confident are we that this is correct ? xD

@prashnts prashnts self-assigned this Sep 25, 2020
@prashnts prashnts added this to the 0.6.0 milestone Sep 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants