Implement Memcmp SIMD for arm64 NEON and SVE #8764
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #8764 +/- ##
==========================================
+ Coverage 82.12% 82.14% +0.02%
==========================================
Files 975 975
Lines 271724 271736 +12
==========================================
+ Hits 223151 223222 +71
+ Misses 48573 48514 -59
Flags with carried forward coverage won't be shown.
Could you also update the E.g.
Did you test this with
Formatting will need to be fixed.
Additionally, I would love to see benchmarks showing the effect of using SIMD here.
Thanks for looking @victorjulien
Done
A76 has NEON, but doesn't have SVE
It's obviously input dependent. If the strings always fail on the first character, the scalar implementation is great, while the SIMD implementation has done extra work for 15 bytes. Since I don't have a good set of data to throw at the problem, I'm partly assuming that this was a worthwhile optimization for SSE/AVX, so it should be for NEON too. I have created a little micro-benchmark of 20 different strings: some fail early, some fail late, and some match. This is part of the reason you don't see a SCMemcmpLowercase SVE implementation: I can't make it faster than the NEON one, even though the code is far more elegant. In my tests the NEON implementation of SCMemcmpLowercase is about 40% faster. The NEON implementation of SCMemcmp is just memcmp(), since that is what we'd do anyway, and the SVE SCMemcmp implementation is around 10% faster.
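The trade-off described above can be illustrated with a minimal sketch. This is not the code from this PR: the function name `memcmp_lowercase_sketch`, the `SKETCH_` constants, and the assumption that only the second buffer needs lowercasing are all illustrative. The vector loop has to load and lowercase a full 16 bytes before it can report a mismatch, while the scalar loop can stop at the first differing byte. The NEON path is only compiled on AArch64; elsewhere the scalar loop handles everything.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

/* Hypothetical constants for this sketch: inclusive ASCII range 'A'..'Z'. */
#define SKETCH_UPPER_LOW   0x41 /* 'A' */
#define SKETCH_UPPER_HIGH  0x5A /* 'Z' */
#define SKETCH_UPPER_DELTA 0x20 /* 'a' - 'A' */

/* Returns 0 if s1 equals the lowercased s2 over len bytes, 1 otherwise.
 * Assumption of this sketch: s1 is already lowercase. */
static int memcmp_lowercase_sketch(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = (const uint8_t *)s1;
    const uint8_t *b = (const uint8_t *)s2;
    size_t i = 0;

#ifdef SKETCH_HAVE_NEON
    const uint8x16_t lo = vdupq_n_u8(SKETCH_UPPER_LOW);
    const uint8x16_t hi = vdupq_n_u8(SKETCH_UPPER_HIGH);
    const uint8x16_t delta = vdupq_n_u8(SKETCH_UPPER_DELTA);

    /* Process whole 16-byte blocks; a mismatch in byte 0 still costs
     * the work for the other 15 bytes of the block. */
    for (; i + 16 <= len; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        /* 0xFF in every lane where vb holds 'A'..'Z'. */
        uint8x16_t is_upper = vandq_u8(vcgeq_u8(vb, lo), vcleq_u8(vb, hi));
        /* Add 0x20 only to the uppercase lanes. */
        vb = vaddq_u8(vb, vandq_u8(is_upper, delta));
        /* Every lane of the equality mask must be 0xFF for a match. */
        if (vminvq_u8(vceqq_u8(va, vb)) != 0xFF)
            return 1;
    }
#endif

    /* Scalar tail (and the whole buffer when NEON is unavailable). */
    for (; i < len; i++) {
        uint8_t c = b[i];
        if (c >= SKETCH_UPPER_LOW && c <= SKETCH_UPPER_HIGH)
            c = (uint8_t)(c + SKETCH_UPPER_DELTA);
        if (a[i] != c)
            return 1;
    }
    return 0;
}
```

Calling `memcmp_lowercase_sketch("abc", "ABC", 3)` returns 0 under these assumptions, and any input shorter than 16 bytes never enters the vector loop.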
@victorjulien, anything else you’d like to see?
see comments inline about tests
Thanks for the ping. I'll improve them after the holidays and resubmit.
f2dcbee to b9badbf
NOTE: This PR may contain new authors. |
Test multiple lengths in each test. Many of the inputs are too short to take SIMD code paths.
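One way to exercise both the vector loop and the scalar tail is to sweep every length across several 16-byte block boundaries and place a mismatch at every position. The sketch below is not the test code from this PR; `length_sweep_failures`, `ref_memcmp_lowercase`, and the `cmp_fn` type are hypothetical names, and the reference assumes the first buffer is already lowercase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature shared by the function under test and the reference. */
typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Scalar reference: 0 if s1 equals the lowercased s2, 1 otherwise. */
static int ref_memcmp_lowercase(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = (const uint8_t *)s1;
    const uint8_t *b = (const uint8_t *)s2;
    for (size_t i = 0; i < len; i++) {
        uint8_t c = b[i];
        if (c >= 'A' && c <= 'Z')
            c = (uint8_t)(c + 0x20);
        if (a[i] != c)
            return 1;
    }
    return 0;
}

/* Returns how many cases fn disagrees with the reference on. */
static int length_sweep_failures(cmp_fn fn)
{
    enum { MAX_LEN = 70 }; /* covers four 16-byte blocks plus a tail */
    uint8_t lower[MAX_LEN], mixed[MAX_LEN];
    int failures = 0;

    for (size_t i = 0; i < MAX_LEN; i++) {
        lower[i] = (uint8_t)('a' + i % 26);
        /* Uppercase every even position so the conversion is exercised. */
        mixed[i] = (i % 2) ? lower[i] : (uint8_t)(lower[i] - 0x20);
    }

    for (size_t len = 0; len <= MAX_LEN; len++) {
        /* Matching input of this length. */
        if ((fn(lower, mixed, len) != 0) !=
                (ref_memcmp_lowercase(lower, mixed, len) != 0))
            failures++;

        /* One mismatching byte at each position within this length. */
        for (size_t pos = 0; pos < len; pos++) {
            uint8_t saved = mixed[pos];
            mixed[pos] = '#';
            if ((fn(lower, mixed, len) != 0) !=
                    (ref_memcmp_lowercase(lower, mixed, len) != 0))
                failures++;
            mixed[pos] = saved;
        }
    }
    return failures;
}
```

Passing a SIMD implementation as `fn` checks it against the scalar reference for lengths 0 through 70, so inputs both shorter and longer than one vector are covered.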
Could you please resubmit as a new rebased PR? This is Suricata's GitHub workflow, see https://docs.suricata.io/en/latest/devguide/contributing/code-submission-process.html#pull-requests
Closing as stale. Please feel free to open a new rebased PR with the suggestions.
#define UPPER_HIGH 0x5A /* "Z" */
#define UPPER_DELTA 0x20

static inline int SCMemcmpLowercase(const void *s1, const void *s2, size_t len)
It looks like this only uses NEON instructions, so it wouldn't use SVE?
The other branch of the #if uses SVE; this is pure NEON.
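A compile-time split of this kind can be sketched with the ACLE feature macros `__ARM_FEATURE_SVE` and `__ARM_NEON`, which the compiler defines according to the target flags. Whether the PR tests exactly these macros is an assumption; `SKETCH_SIMD_PATH` and `sketch_simd_path` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Pick one implementation at compile time. The SVE macro is checked
 * first because an SVE-capable AArch64 target also defines __ARM_NEON. */
#if defined(__ARM_FEATURE_SVE)
#define SKETCH_SIMD_PATH "sve"
#elif defined(__ARM_NEON)
#define SKETCH_SIMD_PATH "neon"
#else
#define SKETCH_SIMD_PATH "scalar"
#endif

/* Reports which branch of the #if this translation unit was built with. */
static const char *sketch_simd_path(void)
{
    return SKETCH_SIMD_PATH;
}
```

With this layout, a build targeting a core that has NEON but no SVE takes the NEON branch, and the SVE branch is only compiled when the target flags enable SVE.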
Make sure these boxes are signed before submitting your Pull Request -- thank you.
A request has been sent to add me to our corporate contribution agreement
NA
Link to redmine ticket:
Describe changes:
Provide values to any of the below to override the defaults.
To use a pull request use a branch name like pr/N where N is the pull request number.