Replies: 1 comment 20 replies
-
That's indeed very interesting! 128 bytes is the size at which the algorithm starts to proceed with an "high ILP construction" for greater throughput. It seems to underperform on this hardware compared to the <128 bytes processing, I'd be curious to understand why but it might be difficult to put my hands on such hardware. |
Beta Was this translation helpful? Give feedback.
20 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Just thought I'd share...
Ran the tests multiple times to make sure this was not an accident.
Beta Was this translation helpful? Give feedback.
All reactions