-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix misaligned pointer use in hash code #229
base: master
Are you sure you want to change the base?
Conversation
My biggest concern is performance. MurmurHash was considered for its speed. This patch:
Sounds like we should analyze the feasibility of feeding aligned-ptrs-only instead:
I tried using inline bool isAligned( const uint8_t *x )
{
return (reinterpret_cast<size_t>(x) & 0xFu) == 0u;
} and it did not trigger; but a simple "works on my machine" (Linux x64) isn't a scientific measurement. If it's unfeasible to feed aligned ptrs, then we have no other choice but to bite the bullet. Alternative:Brainstorming here: Having 2 versions of the routine and dispatch based on ptr alignment may be an interesting approach; ideally only used on platforms where unaligned reads are actually unsupported (though that's UB) |
To give a specific example, I'm getting odd addresses passed to the hash function by |
3904aa1
to
da6dbf9
Compare
76d820e
to
03fbfb6
Compare
2ee1c83
to
8b3ddcc
Compare
49c451d
to
84c07a0
Compare
a3c7671
to
67f9fdd
Compare
That is a real bug in MurmurHash3 code. It is hidden by the fact that x86/x64/arm64 allows unaligned reads, but should explode on arm32 and maybe other platforms. Other problem is with aliasing - casting (uint8*) to (uint32*) is already UB, as it allows compiler to assume things that are not true and then optimize basing on those assumptions. Moreover, similar problem was found with storing results in (uint32_t)out = h1; Others found this issue too, https://discourse.julialang.org/t/problem-with-issue-murmurhash3-has-undefined-behavior/9807 and I highly recommend to read that thread with justification of using memcpy compiler intrinsic to solve both issues - alignment and aliasing. Here is their fix https://github.com/JuliaLang/julia/pull/30220/files Unfortunately, problems are not fixed in upstream https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp |
Using a different hash algorithm is not out of the question. There have been newer hashing algos since Murmur came out. |
The MurmurHash code was using misaligned pointers, which is undefined behavior according to the C++ standard and probably unsafe on multiple platforms. See Is it well-defined to hold a misaligned pointer, as long as you don't ever dereference it? on Stack Overflow.