Vectorize basic_string::rfind (the single character overload)
#5087
Merged
StephanTLavavej approved these changes on Nov 19, 2024
5950X results:
Looks good; the only slowdowns are where the character is found immediately, and the call is super fast anyway.
I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed.
Towards #5036
⏱️ Benchmark results
bm<uint8_t, not_highly_aligned_allocator, Op::FindSized>/8021/3056
bm<uint8_t, not_highly_aligned_allocator, Op::FindSized>/63/62
bm<uint8_t, not_highly_aligned_allocator, Op::FindSized>/31/30
bm<uint8_t, not_highly_aligned_allocator, Op::FindSized>/15/14
bm<uint8_t, not_highly_aligned_allocator, Op::FindSized>/7/6
bm<uint8_t, highly_aligned_allocator, Op::FindSized>/8021/3056
bm<uint8_t, highly_aligned_allocator, Op::FindSized>/63/62
bm<uint8_t, highly_aligned_allocator, Op::FindSized>/31/30
bm<uint8_t, highly_aligned_allocator, Op::FindSized>/15/14
bm<uint8_t, highly_aligned_allocator, Op::FindSized>/7/6
bm<uint8_t, not_highly_aligned_allocator, Op::FindUnsized>/8021/3056
bm<uint8_t, not_highly_aligned_allocator, Op::FindUnsized>/63/62
bm<uint8_t, not_highly_aligned_allocator, Op::FindUnsized>/31/30
bm<uint8_t, not_highly_aligned_allocator, Op::FindUnsized>/15/14
bm<uint8_t, not_highly_aligned_allocator, Op::FindUnsized>/7/6
bm<uint8_t, highly_aligned_allocator, Op::FindUnsized>/8021/3056
bm<uint8_t, highly_aligned_allocator, Op::FindUnsized>/63/62
bm<uint8_t, highly_aligned_allocator, Op::FindUnsized>/31/30
bm<uint8_t, highly_aligned_allocator, Op::FindUnsized>/15/14
bm<uint8_t, highly_aligned_allocator, Op::FindUnsized>/7/6
bm<uint8_t, not_highly_aligned_allocator, Op::Count>/8021/3056
bm<uint8_t, not_highly_aligned_allocator, Op::Count>/63/62
bm<uint8_t, not_highly_aligned_allocator, Op::Count>/31/30
bm<uint8_t, not_highly_aligned_allocator, Op::Count>/15/14
bm<uint8_t, not_highly_aligned_allocator, Op::Count>/7/6
bm<uint8_t, highly_aligned_allocator, Op::Count>/8021/3056
bm<uint8_t, highly_aligned_allocator, Op::Count>/63/62
bm<uint8_t, highly_aligned_allocator, Op::Count>/31/30
bm<uint8_t, highly_aligned_allocator, Op::Count>/15/14
bm<uint8_t, highly_aligned_allocator, Op::Count>/7/6
bm<char, not_highly_aligned_allocator, Op::StringFind>/8021/3056
bm<char, not_highly_aligned_allocator, Op::StringFind>/63/62
bm<char, not_highly_aligned_allocator, Op::StringFind>/31/30
bm<char, not_highly_aligned_allocator, Op::StringFind>/15/14
bm<char, not_highly_aligned_allocator, Op::StringFind>/7/6
bm<char, highly_aligned_allocator, Op::StringFind>/8021/3056
bm<char, highly_aligned_allocator, Op::StringFind>/63/62
bm<char, highly_aligned_allocator, Op::StringFind>/31/30
bm<char, highly_aligned_allocator, Op::StringFind>/15/14
bm<char, highly_aligned_allocator, Op::StringFind>/7/6
bm<char, not_highly_aligned_allocator, Op::StringRFind>/8021/3056
bm<char, not_highly_aligned_allocator, Op::StringRFind>/63/62
bm<char, not_highly_aligned_allocator, Op::StringRFind>/31/30
bm<char, not_highly_aligned_allocator, Op::StringRFind>/15/14
bm<char, not_highly_aligned_allocator, Op::StringRFind>/7/6
bm<char, highly_aligned_allocator, Op::StringRFind>/8021/3056
bm<char, highly_aligned_allocator, Op::StringRFind>/63/62
bm<char, highly_aligned_allocator, Op::StringRFind>/31/30
bm<char, highly_aligned_allocator, Op::StringRFind>/15/14
bm<char, highly_aligned_allocator, Op::StringRFind>/7/6
bm<uint16_t, not_highly_aligned_allocator, Op::FindSized>/8021/3056
bm<uint16_t, not_highly_aligned_allocator, Op::FindSized>/63/62
bm<uint16_t, not_highly_aligned_allocator, Op::FindSized>/31/30
bm<uint16_t, not_highly_aligned_allocator, Op::FindSized>/15/14
bm<uint16_t, not_highly_aligned_allocator, Op::FindSized>/7/6
bm<uint16_t, not_highly_aligned_allocator, Op::Count>/8021/3056
bm<uint16_t, not_highly_aligned_allocator, Op::Count>/63/62
bm<uint16_t, not_highly_aligned_allocator, Op::Count>/31/30
bm<uint16_t, not_highly_aligned_allocator, Op::Count>/15/14
bm<uint16_t, not_highly_aligned_allocator, Op::Count>/7/6
bm<wchar_t, not_highly_aligned_allocator, Op::StringFind>/8021/3056
bm<wchar_t, not_highly_aligned_allocator, Op::StringFind>/63/62
bm<wchar_t, not_highly_aligned_allocator, Op::StringFind>/31/30
bm<wchar_t, not_highly_aligned_allocator, Op::StringFind>/15/14
bm<wchar_t, not_highly_aligned_allocator, Op::StringFind>/7/6
bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/8021/3056
bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/63/62
bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/31/30
bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/15/14
bm<wchar_t, not_highly_aligned_allocator, Op::StringRFind>/7/6
bm<uint32_t, not_highly_aligned_allocator, Op::FindSized>/8021/3056
bm<uint32_t, not_highly_aligned_allocator, Op::FindSized>/63/62
bm<uint32_t, not_highly_aligned_allocator, Op::FindSized>/31/30
bm<uint32_t, not_highly_aligned_allocator, Op::FindSized>/15/14
bm<uint32_t, not_highly_aligned_allocator, Op::FindSized>/7/6
bm<uint32_t, not_highly_aligned_allocator, Op::Count>/8021/3056
bm<uint32_t, not_highly_aligned_allocator, Op::Count>/63/62
bm<uint32_t, not_highly_aligned_allocator, Op::Count>/31/30
bm<uint32_t, not_highly_aligned_allocator, Op::Count>/15/14
bm<uint32_t, not_highly_aligned_allocator, Op::Count>/7/6
bm<char32_t, not_highly_aligned_allocator, Op::StringFind>/8021/3056
bm<char32_t, not_highly_aligned_allocator, Op::StringFind>/63/62
bm<char32_t, not_highly_aligned_allocator, Op::StringFind>/31/30
bm<char32_t, not_highly_aligned_allocator, Op::StringFind>/15/14
bm<char32_t, not_highly_aligned_allocator, Op::StringFind>/7/6
bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/8021/3056
bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/63/62
bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/31/30
bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/15/14
bm<char32_t, not_highly_aligned_allocator, Op::StringRFind>/7/6
bm<uint64_t, not_highly_aligned_allocator, Op::FindSized>/8021/3056
bm<uint64_t, not_highly_aligned_allocator, Op::FindSized>/63/62
bm<uint64_t, not_highly_aligned_allocator, Op::FindSized>/31/30
bm<uint64_t, not_highly_aligned_allocator, Op::FindSized>/15/14
bm<uint64_t, not_highly_aligned_allocator, Op::FindSized>/7/6
bm<uint64_t, not_highly_aligned_allocator, Op::Count>/8021/3056
bm<uint64_t, not_highly_aligned_allocator, Op::Count>/63/62
bm<uint64_t, not_highly_aligned_allocator, Op::Count>/31/30
bm<uint64_t, not_highly_aligned_allocator, Op::Count>/15/14
bm<uint64_t, not_highly_aligned_allocator, Op::Count>/7/6
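The StringRFind rows above benchmark the single-character basic_string::rfind that this PR vectorizes. As an illustration of the general technique only (not the PR's actual code), here is a minimal SSE2 sketch: it compares 16 bytes at a time while walking backwards from the end, and the highest set bit of the comparison mask gives the last match in a block. It assumes an x86-64 target with <emmintrin.h>; the name rfind_char_sse2 is hypothetical, and unlike basic_string::rfind(ch, pos) it ignores pos and searches the whole range.

```cpp
#include <cstddef>
#include <emmintrin.h> // SSE2 intrinsics (baseline on x86-64)
#include <string>

// Index of the highest set bit in a non-zero 16-bit movemask.
inline int highest_set_bit(unsigned mask) {
    int i = 15;
    while ((mask & (1u << i)) == 0) --i;
    return i;
}

// Hypothetical sketch: last occurrence of ch in [data, data + size),
// or std::string::npos if there is none.
inline std::size_t rfind_char_sse2(const char* data, std::size_t size, char ch) {
    const __m128i needle = _mm_set1_epi8(ch); // ch broadcast to all 16 lanes
    std::size_t end = size;
    // Compare whole 16-byte blocks, walking backwards from the end.
    while (end >= 16) {
        const std::size_t block = end - 16;
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + block));
        const unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)));
        if (mask != 0) {
            // The highest set bit is the last matching byte in this block.
            return block + static_cast<std::size_t>(highest_set_bit(mask));
        }
        end = block;
    }
    // Scalar loop for the remaining (fewer than 16) leading bytes.
    while (end > 0) {
        --end;
        if (data[end] == ch) return end;
    }
    return std::string::npos;
}
```

Because the scan runs backwards, the first block with a non-zero mask already contains the answer, so the search stops there, mirroring how a forward find uses the lowest set bit.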
🥇 Results observation
- StringRFind cases are improved greatly
- StringFind cases may need improvement
- FindUnsized with aligned allocator

♾️ FindUnsized results explanation
TL;DR: There are interesting results, but unfortunately they are not useful.
The FindUnsized benchmark with highly_aligned_allocator shows surprisingly small timings. Apparently memchr has an optimization, similar to the reverted unsized find vectorization, that reads beyond the valid range. It looks like it only reads past the end of the valid range and does not do an aligning read before its start, so the range needs some alignment for the optimization to fully engage. The required alignment is 16 bytes, which implies SSE is used inside, but not AVX. This should work well with the default 16-byte alignment of malloc. It does not seem to work as well for small ranges.
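The alignment argument above can be made concrete with a little address arithmetic. This sketch assumes 4 KiB pages and 16-byte SSE loads; the helper names are hypothetical and are not taken from memchr or the CRT. Since 4096 is a multiple of 16, a 16-byte load from an aligned address never crosses into the next page, so if every block is aligned, the over-read in the final block stays in a page that also holds valid bytes. An unaligned start gives no such guarantee, which is consistent with the optimization only engaging when the range begins aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed page size for illustration; 4 KiB is typical on x86-64.
constexpr std::uintptr_t page_size = 4096;

// True if the 16 bytes starting at addr all lie in a single page.
constexpr bool load16_within_one_page(std::uintptr_t addr) {
    return addr / page_size == (addr + 15) / page_size;
}

// For a hypothetical scan that reads whole 16-byte blocks from the start of a
// range until len bytes are covered: how many bytes past the end it reads.
constexpr std::size_t overread_bytes(std::size_t len) {
    return (len + 15) / 16 * 16 - len;
}

static_assert(load16_within_one_page(0x1000 - 16), "aligned load stays in one page");
static_assert(!load16_within_one_page(0x1000 - 8), "unaligned load can straddle pages");
static_assert(overread_bytes(20) == 12, "2 blocks (32 bytes) are read to cover 20");
```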
StringFind currently uses memchr, and its results are not as good.

🔜 Further steps
I want to raise the question of whether we also want to vectorize basic_string::find. Here are some points:
- FindSized is twice as fast as StringFind. It looks like memchr doesn't use AVX, so we could stop calling memchr and use our vectorization. The counterpoint could be that the C runtime should be optimized instead.
- wmemchr is slow, but it is expected to improve in a new Windows Kit.
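To illustrate the idea of using our own vectorization instead of memchr, here is a hedged sketch of a forward single-character find that uses 32-byte AVX2 compares when the compiler targets AVX2 (the __AVX2__ macro, defined for example by /arch:AVX2 on MSVC or -mavx2 on GCC and Clang) and otherwise falls back to std::memchr. The name find_char is hypothetical; this is not the STL's implementation, and a real one would need runtime CPU dispatch rather than a compile-time switch.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical sketch: first occurrence of ch in [data, data + size),
// or std::string::npos if there is none.
inline std::size_t find_char(const char* data, std::size_t size, char ch) {
    std::size_t i = 0;
#if defined(__AVX2__)
    const __m256i needle = _mm256_set1_epi8(ch); // ch broadcast to 32 lanes
    for (; i + 32 <= size; i += 32) {
        const __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const unsigned mask = static_cast<unsigned>(
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)));
        if (mask != 0) {
            // The lowest set bit marks the first matching byte in this block.
            unsigned bit = 0;
            while ((mask & (1u << bit)) == 0) ++bit;
            return i + bit;
        }
    }
#endif
    if (i == size) return std::string::npos; // nothing left to scan
    // Remaining tail bytes, or the whole range when AVX2 is not enabled.
    const void* hit = std::memchr(data + i, static_cast<unsigned char>(ch), size - i);
    return hit ? static_cast<std::size_t>(static_cast<const char*>(hit) - data)
               : std::string::npos;
}
```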