Detect all Apple M* CPUs and enable the wide multiplier assembly implementations (#1901)

### Description of changes: 
Looking at the benchmarks on https://github.com/ctz/graviola/ I noticed
AWS-LC was slower than expected. Those benchmarks were performed on an
M2 Mac, and previously AWS-LC was only checking for M1 CPUs. This change
opts all Apple M* CPUs into the alt/wide multiplier implementations. A
future Apple M CPU might benefit from the non-alt implementation, but
for now it seems like all M CPUs benefit from the alt implementation, so
this is a sane default.
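
To illustrate why shortening the brand string from "Apple M1" to "Apple M" picks up later chips, here is a minimal sketch that assumes the brand check is a prefix comparison. `brand_has_prefix` is a hypothetical stand-in, not the `is_brand` helper in AWS-LC, and the sample brand strings in the note below are illustrative only.

```c
#include <string.h>

// Hypothetical helper (not part of AWS-LC): returns 1 when `brand` starts
// with `prefix`, 0 otherwise. The real code reads the brand string from the
// OS; here it is passed in so the comparison can be looked at in isolation.
static int brand_has_prefix(const char *brand, const char *prefix) {
  return strncmp(brand, prefix, strlen(prefix)) == 0;
}
```

Under that assumption, a brand string such as "Apple M3 Pro" matches the prefix "Apple M" but not "Apple M1", which is the behavior this change relies on.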

### Call-outs:
I think this is safe because both implementations use the same
instructions; it's just a reordering. So if a future M CPU doesn't
have a wide multiplier, the alt implementation will still work. It will
just be slower than the non-alt implementation would have been.
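
A rough sketch of the dispatch this reasoning depends on: a capability bit selects between two routines that compute the same result. The bit values are the ones in `include/openssl/arm_arch.h` from this diff; `wide_multiplier_capable`, `select_mul`, `mul_base` and `mul_alt` are hypothetical names for illustration, and the two multiply routines are trivial stand-ins for the s2n-bignum variants.

```c
#include <stdint.h>

// Bit positions as they appear in this change to include/openssl/arm_arch.h.
#define ARMV8_NEOVERSE_V1 (1 << 12)
#define ARMV8_APPLE_M (1 << 13)
#define ARMV8_NEOVERSE_V2 (1 << 14)

// Hypothetical: same shape as CRYPTO_is_ARMv8_wide_multiplier_capable, but
// the capability word is a parameter instead of the global OPENSSL_armcap_P.
static int wide_multiplier_capable(uint32_t armcap) {
  return (armcap & ARMV8_NEOVERSE_V1) != 0 ||
         (armcap & ARMV8_NEOVERSE_V2) != 0 ||
         (armcap & ARMV8_APPLE_M) != 0;
}

typedef uint64_t (*mul_fn)(uint64_t, uint64_t);

// Stand-ins for the two variants: same result, different internal ordering.
static uint64_t mul_base(uint64_t a, uint64_t b) { return a * b; }
static uint64_t mul_alt(uint64_t a, uint64_t b) { return b * a; }

// Picks the alt variant on CPUs flagged as having a wide multiplier.
static mul_fn select_mul(uint32_t armcap) {
  return wide_multiplier_capable(armcap) ? mul_alt : mul_base;
}
```

Because both paths return the same value, a wrong guess about the CPU only costs speed, not correctness.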

### Testing:
Tested on an M1 and an M3: the alt implementation is picked and is
faster. M3 before:
```
Did 29631 EVP ECDH P-224 operations in 1018316us (29098.0 ops/sec)
Did 30000 EVP ECDH P-256 operations in 1020237us (29404.9 ops/sec)
Did 5840 EVP ECDH P-384 operations in 1000133us (5839.2 ops/sec)
Did 3630 EVP ECDH P-521 operations in 1096572us (3310.3 ops/sec)
Did 5430 EVP ECDH secp256k1 operations in 1000506us (5427.3 ops/sec)
Did 41000 EVP ECDH X25519 operations in 1009621us (40609.3 ops/sec)
```
M3 after:
```
Did 30233 EVP ECDH P-224 operations in 1013635us (29826.3 ops/sec)
Did 31000 EVP ECDH P-256 operations in 1022186us (30327.2 ops/sec)
Did 7227 EVP ECDH P-384 operations in 1076336us (6714.4 ops/sec)
Did 4690 EVP ECDH P-521 operations in 1042402us (4499.2 ops/sec)
Did 5710 EVP ECDH secp256k1 operations in 1051837us (5428.6 ops/sec)
Did 52000 EVP ECDH X25519 operations in 1008149us (51579.7 ops/sec)
```

As expected, P-224 and P-256 are essentially unchanged, while P-384,
P-521 and X25519, which do use s2n-bignum, are much faster (roughly 15%,
36% and 27% more ops/sec respectively).
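
For anyone who wants to check the relative changes, here is a small sketch that computes them from the ops/sec figures in the two M3 runs above. The names `bench_row`, `kRows`, `percent_change` and `print_changes` are made up for this example.

```c
#include <stdio.h>

// One row per curve, using the ops/sec figures quoted in the M3 runs above.
struct bench_row {
  const char *name;
  double before;
  double after;
};

static const struct bench_row kRows[] = {
    {"P-224", 29098.0, 29826.3},   {"P-256", 29404.9, 30327.2},
    {"P-384", 5839.2, 6714.4},     {"P-521", 3310.3, 4499.2},
    {"secp256k1", 5427.3, 5428.6}, {"X25519", 40609.3, 51579.7},
};

// Relative change in throughput, in percent.
static double percent_change(double before, double after) {
  return (after - before) / before * 100.0;
}

// Prints one line per curve with its relative change.
static void print_changes(void) {
  for (size_t i = 0; i < sizeof(kRows) / sizeof(kRows[0]); i++) {
    printf("%-10s %+6.1f%%\n", kRows[i].name,
           percent_change(kRows[i].before, kRows[i].after));
  }
}
```

Calling `print_changes()` from a `main` lists every curve. These are single runs, so small differences should be read as noise.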

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
andrewhop authored Oct 2, 2024
1 parent 4bb4eab commit ca5f197
Showing 3 changed files with 5 additions and 5 deletions.
4 changes: 2 additions & 2 deletions crypto/fipsmodule/cpucap/cpu_aarch64_apple.c
@@ -95,8 +95,8 @@ void OPENSSL_cpuid_setup(void) {
     OPENSSL_armcap_P |= ARMV8_SHA3;
   }

-  if (is_brand("Apple M1")) {
-    OPENSSL_armcap_P |= ARMV8_APPLE_M1;
+  if (is_brand("Apple M")) {
+    OPENSSL_armcap_P |= ARMV8_APPLE_M;
   }

   if (has_hw_feature("hw.optional.arm.FEAT_DIT")) {
4 changes: 2 additions & 2 deletions crypto/fipsmodule/cpucap/internal.h
@@ -243,13 +243,13 @@ OPENSSL_INLINE int CRYPTO_is_ARMv8_GCM_8x_capable(void) {
   return ((OPENSSL_armcap_P & ARMV8_SHA3) != 0 &&
           ((OPENSSL_armcap_P & ARMV8_NEOVERSE_V1) != 0 ||
            (OPENSSL_armcap_P & ARMV8_NEOVERSE_V2) != 0 ||
-           (OPENSSL_armcap_P & ARMV8_APPLE_M1) != 0));
+           (OPENSSL_armcap_P & ARMV8_APPLE_M) != 0));
 }

 OPENSSL_INLINE int CRYPTO_is_ARMv8_wide_multiplier_capable(void) {
   return (OPENSSL_armcap_P & ARMV8_NEOVERSE_V1) != 0 ||
          (OPENSSL_armcap_P & ARMV8_NEOVERSE_V2) != 0 ||
-         (OPENSSL_armcap_P & ARMV8_APPLE_M1) != 0;
+         (OPENSSL_armcap_P & ARMV8_APPLE_M) != 0;
 }

 OPENSSL_INLINE int CRYPTO_is_ARMv8_DIT_capable(void) {
2 changes: 1 addition & 1 deletion include/openssl/arm_arch.h
@@ -86,7 +86,7 @@
 // high unrolling factor of AES-GCM and other algorithms that leverage a
 // wide crypto pipeline and fast multiplier.
 #define ARMV8_NEOVERSE_V1 (1 << 12)
-#define ARMV8_APPLE_M1 (1 << 13)
+#define ARMV8_APPLE_M (1 << 13)
 #define ARMV8_NEOVERSE_V2 (1 << 14)

 // ARMV8_DIT indicates support for the Data-Independent Timing (DIT) flag.
