[MISC] Improve bulk_contains performance #45
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #45 +/- ##
=======================================
Coverage 91.10% 91.11%
=======================================
Files 37 37
Lines 1226 1227 +1
=======================================
+ Hits 1117 1118 +1
Misses 109 109
Depending on number of bins, around 2x to 6x
…building with SSE/AVX
#ifndef NDEBUG
assert(bin_words != 0u);
assert(hash_funs != 0u);
#else
Is this also because of the auto-vectorization? Asserts should be compiled out in release mode anyway, shouldn't they?
I want to actually assert in Debug mode, and not just declare it as UB. No idea what happens (guaranteed) when I have both __builtin_unreachable() and an assert :)
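The pattern discussed in this thread could be sketched roughly as below. This is a minimal illustration, not the code from the PR: the macro name SKETCH_ASSUME and the function count_nonzero_words are hypothetical, and the sketch assumes GCC or Clang for __builtin_unreachable(). In Debug builds the condition is checked with assert; in Release builds (NDEBUG defined) a violated condition is declared unreachable, which lets the optimizer assume that it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper macro for illustration; not taken from the PR.
#ifndef NDEBUG
    // Debug build: actually check the condition.
    #define SKETCH_ASSUME(cond) assert(cond)
#elif defined(__GNUC__) || defined(__clang__)
    // Release build: a violated condition is undefined behaviour,
    // so the optimizer may assume that the condition is true.
    #define SKETCH_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (false)
#else
    // Other compilers: no hint, no check.
    #define SKETCH_ASSUME(cond) ((void)0)
#endif

// Hypothetical function: counts the non-zero words among the first bin_words words.
inline std::size_t count_nonzero_words(std::uint64_t const * data, std::size_t bin_words)
{
    SKETCH_ASSUME(bin_words != 0u); // The loop below runs at least once.

    std::size_t count = 0u;
    for (std::size_t i = 0u; i < bin_words; ++i)
        count += (data[i] != 0u);
    return count;
}
```

Keeping the two branches mutually exclusive avoids the question raised above of what happens when an assert and __builtin_unreachable() apply to the same condition.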
Things that will also help:
Things that might help:
bulk_contains, especially in conjunction with bulk_count
The changes rely on auto-vectorization and might be done more elegantly by hand.
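As a rough illustration of what relying on auto-vectorization can look like, the sketch below ANDs several rows of 64-bit words using a compile-time word count. Everything here is an assumption made for illustration: the function and_rows, its parameters, and the data layout are hypothetical and not taken from the PR. The idea is only that a fixed, small trip count gives the compiler a chance to unroll the inner loop and use SIMD instructions when built with SSE/AVX enabled; whether it does so depends on the compiler and flags.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: AND together `hash_funs` rows, each `words` 64-bit words wide.
// `words` is a template parameter so the inner loop has a fixed trip count,
// which the compiler may unroll and vectorize.
template <std::size_t words>
std::array<std::uint64_t, words> and_rows(std::uint64_t const * const * rows, std::size_t hash_funs)
{
    std::array<std::uint64_t, words> result;
    result.fill(~std::uint64_t{0}); // Start with all bits set.

    for (std::size_t h = 0u; h < hash_funs; ++h)
        for (std::size_t w = 0u; w < words; ++w)
            result[w] &= rows[h][w];

    return result;
}
```

A hand-written version would replace the inner loop with explicit intrinsics, trading portability for control over the generated instructions.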
192 is faster than before, but relatively slow. 128 and 256 can be done in one step with AVX2, but 192 requires two. 192 is around 30% slower than 128 or 256.
bulk_contains benchmark with 1GiB data, throughput in hashes/sec: