Faster FMA on Haswell #216

Merged 5 commits on Oct 27, 2024


ashvardanian
Owner

Handling loads and stores with SIMD is tricky. The problem isn't the up-casting on load, but the down-casting at the end of the loop, which is a drag in AVX2! We leave that for another day and use AVX2 only for the actual math and value clipping. The current variant runs at 15-19 GB/s, as opposed to under 500 MB/s for the serial code.
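For readers who want to see the shape of the approach, here is a minimal sketch, not the code from this PR. It assumes the kernel computes `alpha * a[i] * b[i] + beta * c[i]` clipped to `[0, 255]`, and every function name in it is hypothetical. The up-cast, the multiply-add, and the clipping run in AVX2, while the down-cast back to `u8` goes through a small scalar loop, since AVX2 has no direct 32-bit to 8-bit narrowing store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86_AVX2 1
#endif

/* Scalar reference (assumed semantics): clip(alpha * a * b + beta * c) into u8. */
static void fma_u8_serial_sketch(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                                 size_t n, float alpha, float beta, uint8_t *result) {
    for (size_t i = 0; i < n; ++i) {
        float x = alpha * (float)a[i] * (float)b[i] + beta * (float)c[i];
        if (x < 0.0f) x = 0.0f;
        if (x > 255.0f) x = 255.0f;
        result[i] = (uint8_t)(x + 0.5f); /* round half up; x is non-negative here */
    }
}

#ifdef SKETCH_X86_AVX2
/* Load 8 bytes and up-cast them: 8 x u8 -> 8 x i32 -> 8 x f32. */
__attribute__((target("avx2,fma")))
static __m256 load_u8x8_as_f32(const uint8_t *p) {
    __m128i bytes = _mm_loadl_epi64((const __m128i *)p);
    return _mm256_cvtepi32_ps(_mm256_cvtepu8_epi32(bytes));
}

__attribute__((target("avx2,fma")))
static void fma_u8_avx2_sketch(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                               size_t n, float alpha, float beta, uint8_t *result) {
    const __m256 alpha_vec = _mm256_set1_ps(alpha);
    const __m256 beta_vec = _mm256_set1_ps(beta);
    const __m256 lo = _mm256_setzero_ps();
    const __m256 hi = _mm256_set1_ps(255.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 ab = _mm256_mul_ps(load_u8x8_as_f32(a + i), load_u8x8_as_f32(b + i));
        __m256 bc = _mm256_mul_ps(beta_vec, load_u8x8_as_f32(c + i));
        __m256 x = _mm256_fmadd_ps(alpha_vec, ab, bc);  /* alpha * (a * b) + beta * c */
        x = _mm256_min_ps(_mm256_max_ps(x, lo), hi);    /* clip to [0, 255] in SIMD */
        /* Down-cast: the awkward part in AVX2, so spill to i32 and narrow serially.
           Note: cvtps rounds ties to even, the scalar path rounds ties up. */
        int32_t lanes[8];
        _mm256_storeu_si256((__m256i *)lanes, _mm256_cvtps_epi32(x));
        for (int j = 0; j < 8; ++j) result[i + j] = (uint8_t)lanes[j];
    }
    /* Fewer than 8 elements left: finish with the scalar reference. */
    fma_u8_serial_sketch(a + i, b + i, c + i, n - i, alpha, beta, result + i);
}
#endif

/* Picks the AVX2 path when the CPU supports it, otherwise the scalar one. */
void fma_u8_sketch(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                   size_t n, float alpha, float beta, uint8_t *result) {
    assert(a && b && c && result);
#ifdef SKETCH_X86_AVX2
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        fma_u8_avx2_sketch(a, b, c, n, alpha, beta, result);
        return;
    }
#endif
    fma_u8_serial_sketch(a, b, c, n, alpha, beta, result);
}
```

The `target` attribute and `__builtin_cpu_supports` are GCC/Clang features, used here so the sketch builds without `-mavx2 -mfma`. The two paths can differ by one on exact `.5` ties and on results where fused and unfused rounding disagree, so compare them with a tolerance of 1 unless the inputs produce exact integers.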

------------------------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------
fma_u8_haswell<1536d>/min_time:10.000/threads:1          248 ns          248 ns     56523758 abs_delta=8.20566 bytes=18.6111G/s pairs=4.03886M/s relative_error=2.16737m
wsum_u8_haswell<1536d>/min_time:10.000/threads:1         197 ns          197 ns     71164289 abs_delta=7.76442 bytes=15.5983G/s pairs=5.07757M/s relative_error=2.86599m
fma_u8_sapphire<1536d>/min_time:10.000/threads:1        70.9 ns         70.9 ns    197581878 abs_delta=9.2812 bytes=64.9908G/s pairs=14.1039M/s relative_error=2.45142m
wsum_u8_sapphire<1536d>/min_time:10.000/threads:1       51.2 ns         51.2 ns    275604255 abs_delta=8.89144 bytes=60.0323G/s pairs=19.5418M/s relative_error=3.28203m
fma_u8_serial<1536d>/min_time:10.000/threads:1          9749 ns         9748 ns      1428411 abs_delta=1.66854 bytes=472.69M/s pairs=102.58k/s relative_error=440.882u
wsum_u8_serial<1536d>/min_time:10.000/threads:1         9455 ns         9455 ns      1488320 abs_delta=2.32787 bytes=324.901M/s pairs=105.762k/s relative_error=859.403u
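
To read the counters above, the helper below is a rough sketch of how `bytes` appears to relate to the reported time. This is inferred from the numbers, not taken from the benchmark source: the figures line up if `bytes` counts only the input vectors (3 for `fma`, 2 for `wsum`) at 1536 one-byte dimensions each, with `G` meaning 10^9. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Input bytes processed per second for one kernel call.
   Assumption: only input vectors are counted, not the output. */
double input_bytes_per_second(size_t dims, size_t input_vectors,
                              size_t bytes_per_scalar, double seconds_per_call) {
    assert(seconds_per_call > 0.0);
    return (double)(dims * input_vectors * bytes_per_scalar) / seconds_per_call;
}

/* How many times faster the optimized kernel is than the baseline. */
double speedup(double baseline_seconds, double optimized_seconds) {
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

Plugging in the rounded times from the table, `input_bytes_per_second(1536, 3, 1, 248e-9)` comes out near 18.6e9 for `fma_u8_haswell`, and `input_bytes_per_second(1536, 2, 1, 197e-9)` near 15.6e9 for `wsum_u8_haswell`. Likewise `speedup(9749, 248)` is about 39x and `speedup(9455, 197)` about 48x over the serial kernels.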

ashvardanian merged commit ebd0537 into main on Oct 27, 2024
35 checks passed