Implement Memcmp SIMD for arm64 NEON and SVE #8764

AGSaidi · 2023-04-25T20:38:16Z

Make sure these boxes are signed before submitting your Pull Request -- thank you.

I have read the contributing guide lines at https://suricata.readthedocs.io/en/latest/devguide/codebase/contributing/contribution-process.html
I have signed the Open Information Security Foundation contribution agreement at https://suricata.io/about/contribution-agreement/
A request has been sent to add me to our corporate contribution agreement
I have updated the user guide (in doc/userguide/) to reflect the changes made (if applicable)
NA

Link to redmine ticket:

Describe changes:

lengthen memcmp test input length to exercise simd instructions
add timing assembly for arm64
implement Memcmp SIMD for arm64 NEON and SVE
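The SIMD versions must preserve the scalar behaviour, which a plain loop pins down. This sketch is an illustration, not Suricata's code. The name `ref_memcmp_lowercase` is hypothetical. It assumes, as the existing scalar fallback is understood to do, three things: `s1` is an already-lowercased pattern, only `s2` is case-folded, and the result is 0 for equal and 1 for different rather than an ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* ASCII-only case fold: 'A'..'Z' -> 'a'..'z', everything else unchanged. */
static inline uint8_t ref_tolower(uint8_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + 0x20) : c;
}

/* Hypothetical scalar reference. Assumption: s1 is already lowercase,
 * only s2 is folded, and the return value is 0 (equal) or 1 (different). */
static int ref_memcmp_lowercase(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = s1;
    const uint8_t *b = s2;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != ref_tolower(b[i]))
            return 1; /* stop at the first mismatch */
    }
    return 0;
}
```

The comparison is asymmetric. Swapping the arguments can change the result, so a SIMD version must fold only the second buffer.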

Provide values to any of the below to override the defaults.

To use a pull request use a branch name like pr/N where N is the pull request number.

SV_REPO=
SV_BRANCH=
SU_REPO=
SU_BRANCH=
LIBHTP_REPO=
LIBHTP_BRANCH=

codecov · 2023-04-26T05:01:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6896a93) 82.12% compared to head (4158a9b) 82.14%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8764      +/-   ##
==========================================
+ Coverage   82.12%   82.14%   +0.02%     
==========================================
  Files         975      975              
  Lines      271724   271736      +12     
==========================================
+ Hits       223151   223222      +71     
+ Misses      48573    48514      -59

Flag	Coverage Δ
fuzzcorpus	`62.79% <ø> (+0.07%)`	⬆️
suricata-verify	`61.41% <ø> (-0.01%)`	⬇️
unittests	`62.84% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

victorjulien · 2023-04-26T05:58:14Z

Could you also update the SIMD support line in --build-info?

E.g.

./src/suricata --build-info|grep SSE
SIMD support: SSE_4_2 SSE_4_1 SSE_3

Did you test this with suricata-verify? We have no arm64 running in our public CI, and I think my personal arm64 devices have no NEON/SVE, although I'm not quite sure. My most modern runner (rock 5b) has A76 cores. Not sure if that is compatible.

Formatting will need to be fixed.

victorjulien · 2023-04-26T06:01:28Z

Additionally, I would love to see benchmarks on what the effect of using the SIMD is here.

AGSaidi · 2023-04-26T15:48:35Z

Thanks for looking, @victorjulien.

> Did you test this with suricata-verify?

PASSED:  1200
FAILED:  0
SKIPPED: 68

> Formatting will need to be fixed.

Done

> I think my personal arm64 devices have no NEON/SVE

A76 has NEON, but doesn't have SVE
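On Linux, you can check at run time whether a board exposes NEON and SVE by reading the auxiliary vector. This is a minimal sketch for Linux/AArch64 only, and the helper name `arm64_simd_caps` is hypothetical. It uses glibc's `getauxval(AT_HWCAP)` with the `HWCAP_ASIMD` and `HWCAP_SVE` bits from `<asm/hwcap.h>`, and returns a fixed string on other platforms. It is only a diagnostic aid. The PR itself selects code paths at compile time with `#if`.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMD, HWCAP_SVE */
#endif

/* Hypothetical helper: reports which of NEON (ASIMD) and SVE the running
 * CPU advertises through the kernel's hwcaps. Linux/AArch64 only. */
static const char *arm64_simd_caps(void)
{
#if defined(__aarch64__) && defined(__linux__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    int neon = (hwcap & HWCAP_ASIMD) != 0;
#ifdef HWCAP_SVE
    int sve = (hwcap & HWCAP_SVE) != 0;
#else
    int sve = 0; /* kernel headers too old to define the SVE bit */
#endif
    if (neon && sve)
        return "NEON SVE";
    if (neon)
        return "NEON";
    return "none";
#else
    return "not Linux/AArch64";
#endif
}
```

On an A76-based board such as the one mentioned above, this should report NEON without SVE.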

> Could you also update the SIMD support line in --build-info?

./src/suricata --build-info | grep SIMD
SIMD support: NEON SVE
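The `SIMD support:` line reflects what the binary was compiled for. The sketch below shows one way to assemble such a string from compiler-predefined macros:

- `__ARM_NEON` and `__ARM_FEATURE_SVE` are the Arm C Language Extensions (ACLE) feature macros.
- `__SSE4_2__`, `__SSE4_1__` and `__SSE3__` are the GCC/Clang x86 macros.

The function name and buffer handling are illustrative assumptions, not Suricata's actual `--build-info` code.

```c
#include <string.h>

/* Hypothetical helper mirroring the "SIMD support:" line of --build-info.
 * Every feature here is detected at compile time, not probed at run time. */
static const char *simd_support_string(void)
{
    static char buf[64];
    buf[0] = '\0';
#if defined(__ARM_NEON)
    strcat(buf, "NEON ");
#endif
#if defined(__ARM_FEATURE_SVE)
    strcat(buf, "SVE ");
#endif
#if defined(__SSE4_2__)
    strcat(buf, "SSE_4_2 ");
#endif
#if defined(__SSE4_1__)
    strcat(buf, "SSE_4_1 ");
#endif
#if defined(__SSE3__)
    strcat(buf, "SSE_3 ");
#endif
    if (buf[0] == '\0')
        strcat(buf, "none");
    return buf;
}
```

These macros describe compile flags. A binary built with SVE enabled therefore still needs an SVE-capable CPU to run.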

Performance

It's obviously input dependent. If the strings always fail on the first character, the scalar implementation is great, while the SIMD implementation has done extra work for 15 bytes. I'm partly assuming that, because this was a worthwhile optimization for SSE/AVX, it should be for NEON too, since I don't have a good set of data to throw at the problem. I created a small micro-benchmark of 20 different strings: some that fail early, some that fail late, and some that match. This is part of the reason you don't see an SVE implementation of SCMemcmpLowercase: I can't make it faster than the NEON one, even though the code is far more elegant. In my tests, the NEON implementation of SCMemcmpLowercase is about 40% faster. The NEON implementation of SCMemcmp is just memcmp(), since that is what we'd do anyway, and the SVE SCMemcmp implementation is around 10% faster.
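A portable emulation of a 16-byte-at-a-time lowercase compare shows the extra-work trade-off. This is a scalar sketch of the idea, not the PR's NEON code, and the names are hypothetical. It works as follows:

- Each 16-byte block is case-folded and compared branch-free, with no early exit inside the block, as one 128-bit NEON register would be.
- The last `len % 16` bytes fall back to a scalar loop.

It assumes `s1` is already lowercase and that the result is 0 for equal and 1 for different.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* bytes per 128-bit NEON register */

/* One emulated "vector" step: fold all 16 bytes of b to lowercase and
 * OR together the differences, with no early exit inside the block. */
static int block_differs(const uint8_t *a, const uint8_t *b)
{
    uint8_t diff = 0;
    for (int i = 0; i < BLOCK; i++) {
        uint8_t c = b[i];
        /* 1 if c is in 'A'..'Z' (unsigned range check), else 0 */
        uint8_t upper = (uint8_t)((uint8_t)(c - 'A') <= ('Z' - 'A'));
        c = (uint8_t)(c + (upper << 5)); /* add 0x20 only to uppercase */
        diff |= (uint8_t)(a[i] ^ c);
    }
    return diff != 0;
}

/* Hypothetical blocked compare: full blocks first, then a scalar tail. */
static int blocked_memcmp_lowercase(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = s1;
    const uint8_t *b = s2;
    size_t i = 0;

    for (; i + BLOCK <= len; i += BLOCK) {
        if (block_differs(a + i, b + i))
            return 1;
    }
    for (; i < len; i++) { /* remaining len % 16 bytes */
        uint8_t c = b[i];
        if (c >= 'A' && c <= 'Z')
            c = (uint8_t)(c + 0x20);
        if (a[i] != c)
            return 1;
    }
    return 0;
}
```

A mismatch in byte 0 still costs a full block of work. That is why inputs that fail immediately favour the scalar loop, while long matches favour the blocked version.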

AGSaidi · 2023-05-05T00:12:16Z

@victorjulien, anything else you'd like to see?

src/util-memcmp.c

catenacyber

see comments inline about tests

AGSaidi · 2023-12-21T20:27:02Z

thanks for the ping. I'll improve them after the holidays and resubmit.

github-actions · 2024-01-18T10:23:37Z

NOTE: This PR may contain new authors.


github-actions · 2024-01-18T20:56:44Z

NOTE: This PR may contain new authors.

catenacyber · 2024-03-01T21:24:44Z

Could you please resubmit as a new, rebased PR? This is Suricata's GitHub workflow, cf. https://docs.suricata.io/en/latest/devguide/contributing/code-submission-process.html#pull-requests

catenacyber · 2024-05-14T12:53:52Z

Closing as stale, please feel free to reopen a new rebased PR with the suggestions.

victorjulien · 2024-09-02T16:17:12Z

src/util-memcmp.h

+#define UPPER_HIGH  0x5A /* "Z" */
+#define UPPER_DELTA 0x20
+
+static inline int SCMemcmpLowercase(const void *s1, const void *s2, size_t len)


it looks like this only uses NEON instructions, so it wouldn't see SVE?

the other branch of the #if uses SVE, this is pure NEON.
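The point under review is how the `#if` branches are keyed. The sketch below shows one arrangement consistent with the comment above: `SCMemcmp` prefers SVE, while `SCMemcmpLowercase` is keyed on NEON alone. The `*_IMPL` macro names and `selected_impls` are hypothetical. `__ARM_FEATURE_SVE` and `__ARM_NEON` are the ACLE feature macros. On common AArch64 toolchains, `__ARM_NEON` stays defined when SVE is enabled. An SVE build would therefore use SVE for the plain compare and NEON for the lowercase one.

```c
#include <stdio.h>
#include <string.h>

/* SCMemcmp: prefer SVE if compiled in, else NEON, else scalar. */
#if defined(__ARM_FEATURE_SVE)
#define MEMCMP_IMPL "SVE"
#elif defined(__ARM_NEON)
#define MEMCMP_IMPL "NEON"
#else
#define MEMCMP_IMPL "scalar"
#endif

/* SCMemcmpLowercase: keyed on NEON only, so SVE builds still get NEON. */
#if defined(__ARM_NEON)
#define MEMCMP_LOWERCASE_IMPL "NEON"
#else
#define MEMCMP_LOWERCASE_IMPL "scalar"
#endif

/* Hypothetical helper: writes which implementation each function uses. */
static void selected_impls(char *buf, size_t size)
{
    snprintf(buf, size, "SCMemcmp=%s SCMemcmpLowercase=%s",
             MEMCMP_IMPL, MEMCMP_LOWERCASE_IMPL);
}
```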

AGSaidi requested a review from victorjulien as a code owner April 25, 2023 20:38

AGSaidi force-pushed the neon branch from 789b794 to cf1bd88 April 26, 2023 15:48

catenacyber assigned victorjulien May 18, 2023

victorjulien added this to the 8.0 milestone Jul 11, 2023

catenacyber reviewed Aug 30, 2023


src/util-memcmp.c (outdated)

catenacyber requested changes Dec 21, 2023


AGSaidi force-pushed the neon branch 3 times, most recently from f2dcbee to b9badbf January 15, 2024 18:56

AGSaidi added 3 commits January 18, 2024 09:05

tests: lengthen memcmp test input length

b9abeeb

Test multiple lengths in each test. Many of the inputs are too short to take SIMD code paths.

util: add timing assembly for arm64

b621b11

util: implement Memcmp SIMD for arm64 NEON and SVE

4158a9b
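Inputs shorter than one vector never reach the SIMD loop. A length sweep is a cheap way to exercise the block path, the scalar tail, and the boundary between them. Below is a minimal harness, assuming an equality-style contract (0 equal, 1 different). The names `ref_eq`, `chunked_eq` and `sweep_lengths` are hypothetical, and `chunked_eq` only stands in for a SIMD implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Scalar reference: 0 if equal, 1 otherwise. */
static int ref_eq(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = s1;
    const uint8_t *b = s2;
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return 1;
    return 0;
}

/* Stand-in for a SIMD implementation: 16-byte chunks plus a scalar tail. */
static int chunked_eq(const void *s1, const void *s2, size_t len)
{
    const uint8_t *a = s1;
    const uint8_t *b = s2;
    size_t i = 0;
    for (; i + 16 <= len; i += 16)
        if (memcmp(a + i, b + i, 16) != 0)
            return 1;
    return ref_eq(a + i, b + i, len - i);
}

#define MAX_LEN 64

/* For every length 0..max_len, check an identical pair and a pair that
 * differs at each single position; return the number of disagreements. */
static int sweep_lengths(cmp_fn impl, size_t max_len)
{
    uint8_t a[MAX_LEN], b[MAX_LEN];
    int failures = 0;
    if (max_len > MAX_LEN)
        max_len = MAX_LEN;
    for (size_t len = 0; len <= max_len; len++) {
        for (size_t k = 0; k < len; k++)
            a[k] = b[k] = (uint8_t)('a' + k % 26);
        if (impl(a, b, len) != ref_eq(a, b, len))
            failures++;
        for (size_t pos = 0; pos < len; pos++) {
            b[pos] ^= 0x01; /* introduce one mismatch */
            if (impl(a, b, len) != ref_eq(a, b, len))
                failures++;
            b[pos] ^= 0x01; /* restore */
        }
    }
    return failures;
}
```

Sweeping up to 64 bytes covers empty input, sub-vector lengths, exact multiples of 16, and multi-block inputs with tails.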

AGSaidi force-pushed the neon branch from b9badbf to 4158a9b January 18, 2024 15:05

catenacyber closed this May 14, 2024

victorjulien reviewed Sep 2, 2024


AGSaidi mentioned this pull request Sep 6, 2024

Implement Memcmp SIMD for arm64 NEON and SVE #11725 (Open)
