Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Memcmp SIMD for arm64 NEON and SVE #8764

Closed
wants to merge 3 commits into from
Closed

Conversation

AGSaidi
Copy link

@AGSaidi AGSaidi commented Apr 25, 2023

Make sure these boxes are signed before submitting your Pull Request -- thank you.

Link to redmine ticket:

Describe changes:

  • lengthen memcmp test input length to exercise simd instructions
  • add timing assembly for arm64
  • implement Memcmp SIMD for arm64 NEON and SVE

Provide values to any of the below to override the defaults.

To use a pull request use a branch name like pr/N where N is the pull request number.

SV_REPO=
SV_BRANCH=
SU_REPO=
SU_BRANCH=
LIBHTP_REPO=
LIBHTP_BRANCH=

@AGSaidi AGSaidi requested a review from victorjulien as a code owner April 25, 2023 20:38
@codecov
Copy link

codecov bot commented Apr 26, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6896a93) 82.12% compared to head (4158a9b) 82.14%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8764      +/-   ##
==========================================
+ Coverage   82.12%   82.14%   +0.02%     
==========================================
  Files         975      975              
  Lines      271724   271736      +12     
==========================================
+ Hits       223151   223222      +71     
+ Misses      48573    48514      -59     
Flag Coverage Δ
fuzzcorpus 62.79% <ø> (+0.07%) ⬆️
suricata-verify 61.41% <ø> (-0.01%) ⬇️
unittests 62.84% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@victorjulien
Copy link
Member

victorjulien commented Apr 26, 2023

Could you also update the SIMD support line in --build-info?

E.g.

./src/suricata --build-info|grep SSE
SIMD support: SSE_4_2 SSE_4_1 SSE_3

Did you test this with suricata-verify? We have no arm64 running in our public CI, and I think my personal arm64 devices have no NEON/SVE, although I'm not quite sure. My most modern runner (rock 5b) has A76 cores. Not sure if that is compatible.

Formatting will need to be fixed.

@victorjulien
Copy link
Member

Additionally, I would love to see to benchmarks on what the effect of using the SIMD is here.

@AGSaidi
Copy link
Author

AGSaidi commented Apr 26, 2023

Thanks for looking @victorjulien

Did you test this with suricata-verify?

PASSED:  1200
FAILED:  0
SKIPPED: 68

Formatting will need to be fixed.

Done

I think my personal arm64 devices have no NEON/SVE

A76 has NEON, but doesn't have SVE

Could you also update the SIMD support line in --build-info?

./src/suricata --build-info | grep SIMD
SIMD support: NEON SVE

Performance

It's obviously input dependent. If the strings are always going to fail on the first character the scalar implementation is great while the SIMD implementation has done extra work for 15 bytes. I'm partly making the assumption that this was a worthwhile optimization for SSE/AVX so it should be for NEON since i don't have a good set of data to throw at the problem. I have created a little micro-benchmark of 20 different strings some that fail early, some late, and some matches. This is part of the reason you don't see a SCMemcmpLowercase SVE implementation, as i can't make that faster than the NEON one even the the code is far more elegant. On my tests the NEON implementation of SCMemcmpLowercase s about 40% faster. The NEON implementation of SCMemcmp is just memcmp() since that is what we'd do anyway and the SVE SCMemcmp impl is around 10% faster.

@AGSaidi
Copy link
Author

AGSaidi commented May 5, 2023

victorjulien@ anything else you’d like to see?

@victorjulien victorjulien added this to the 8.0 milestone Jul 11, 2023
src/util-memcmp.c Outdated Show resolved Hide resolved
Copy link
Contributor

@catenacyber catenacyber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments inline about tests

@AGSaidi
Copy link
Author

AGSaidi commented Dec 21, 2023

thanks for the ping. I'll improve them after the holidays and resubmit.

@AGSaidi AGSaidi force-pushed the neon branch 3 times, most recently from f2dcbee to b9badbf Compare January 15, 2024 18:56
Copy link

NOTE: This PR may contain new authors.

Copy link

NOTE: This PR may contain new authors.

@catenacyber
Copy link
Contributor

Could you please resubmit as a new rebased PR ? This is Suricata's Github workflow cf https://docs.suricata.io/en/latest/devguide/contributing/code-submission-process.html#pull-requests

@catenacyber
Copy link
Contributor

Closing as stale, please feel free to reopen a new rebased PR with the suggestions.

#define UPPER_HIGH 0x5A /* "Z" */
#define UPPER_DELTA 0x20

static inline int SCMemcmpLowercase(const void *s1, const void *s2, size_t len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this only uses NEON instructions, so it wouldn't see SVE?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other branch of the #if uses SVE, this is pure NEON.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants