We wanted to get an idea of what might be different between the DGEMM from gh-146, which is brute-force "for loops," and the faster-performing version in OpenBLAS.
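For concreteness, here is a minimal sketch of the brute-force approach; this is only illustrative of the triple-loop structure, not the actual gh-146 code:

```python
import numpy as np

def dgemm_naive(A, B):
    """Brute-force double-precision matrix multiply: C = A @ B.

    Three nested loops, no blocking, no vectorization; the kind of
    baseline that OpenBLAS is being compared against here.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.float64)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            C[i, j] = acc
    return C
```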
From a quick scan of the OpenBLAS code base at the time of writing, there are a few relevant things to note:
- OpenBLAS uses hand-crafted assembly for many architectures and algorithms; for example, `kernel/x86_64/dgemm_kernel_4x8_haswell.S` has 5000 lines of assembly for DGEMM, so that is clearly optimization at a different level of "tuning" than where we are at the moment. I'm not sure how easy it will be for me to read the assembly and check for specific types of optimizations like pipelining and so on (the blocked-multiply sketch after this list gives a flavor of the idea, without the assembly).
- They also seem to be able to leverage CUDA kernels (e.g., `cuda_dgemm_kernel`), so it may make sense to compare with them on the GPU using a specific compilation of OpenBLAS for that scenario. That is less convenient for the current benchmarks, because SciPy is not GPU-swappable off the shelf; CuPy's cuBLAS bindings or some other Python interface could serve for a comparison (see the CuPy sketch after this list).
- `git grep -E -i "strassen"` returns no results in OpenBLAS, so the use of an algorithm with fundamentally different asymptotic behavior seems unlikely; see also this related discussion agreeing with that: https://stackoverflow.com/a/11421344/2942522
- In short, it seems like Strassen may have substantial constant factors, caching issues, and perhaps even numerical-stability issues that prevent it from being the primary choice despite its asymptotic advantage (this may also be why I don't think it is even mentioned in the IEEE paper we were looking at). The operation-count sketch after this list gives a feel for where the crossover might sit.
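To give a flavor of what those hand-tuned kernels optimize for, here is a minimal cache-blocking sketch. The block size of 64 is an arbitrary assumption for illustration; real kernels choose tile sizes to fit registers and L1/L2 cache and implement the inner tile in vectorized assembly rather than interpreted loops:

```python
import numpy as np

def dgemm_blocked(A, B, block=64):
    """Tiled matrix multiply: the same arithmetic as the naive version,
    but iterating over blocks so each tile of A, B, and C stays hot in
    cache while it is reused.  block=64 is a placeholder choice.
    """
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n), dtype=np.float64)
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Accumulate the contribution of one pair of tiles;
                # NumPy slicing clips safely at the matrix edges.
                C[i0:i0 + block, j0:j0 + block] += (
                    A[i0:i0 + block, p0:p0 + block]
                    @ B[p0:p0 + block, j0:j0 + block]
                )
    return C
```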
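If we want a GPU data point without recompiling OpenBLAS, something like the following CuPy comparison could work. This is a rough sketch assuming a CUDA-capable machine with CuPy installed; for dense float64 arrays, CuPy's `matmul` is backed by cuBLAS:

```python
import time

import numpy as np
import cupy as cp

n = 2048
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# CPU reference (whatever BLAS NumPy is linked against).
t0 = time.perf_counter()
C_cpu = A @ B
t_cpu = time.perf_counter() - t0

# GPU via CuPy (cuBLAS under the hood for dense float64).
A_gpu = cp.asarray(A)
B_gpu = cp.asarray(B)
cp.matmul(A_gpu, B_gpu)  # warm-up call to exclude one-time init costs
cp.cuda.Stream.null.synchronize()
t0 = time.perf_counter()
C_gpu = cp.matmul(A_gpu, B_gpu)
cp.cuda.Stream.null.synchronize()  # wait for the kernel to finish
t_gpu = time.perf_counter() - t0

print(f"CPU: {t_cpu:.4f} s, GPU: {t_gpu:.4f} s")
print("max abs diff:", float(cp.max(cp.abs(cp.asarray(C_cpu) - C_gpu))))
```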
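For intuition on the Strassen trade-off: it replaces 8 recursive half-size multiplies with 7 at the cost of 18 extra half-size matrix additions, giving O(n^log2(7)) ≈ O(n^2.807) versus O(n^3), but the added additions and memory traffic push the practical crossover out to fairly large n. A back-of-the-envelope operation count (the recursion cutoff of 64 is an arbitrary choice here, and this ignores the caching and stability issues mentioned above):

```python
def classical_ops(n):
    # Classical triple loop: n^3 multiplies plus n^2 * (n - 1) additions.
    return n**3 + n**2 * (n - 1)

def strassen_ops(n, cutoff=64):
    # Strassen on an n x n problem (n a power of two): 7 recursive
    # half-size multiplies plus 18 half-size matrix additions, falling
    # back to the classical count below the (arbitrary) cutoff.
    if n <= cutoff:
        return classical_ops(n)
    half = n // 2
    return 7 * strassen_ops(half, cutoff) + 18 * half**2

for n in [128, 512, 2048, 8192]:
    ratio = classical_ops(n) / strassen_ops(n)
    print(f"n={n}: classical/strassen op ratio = {ratio:.2f}")
```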
Not sure how helpful all of this is, but my initial impression is that low-level, architecture-specific optimizations in assembly drive a lot of the improvement, rather than fancier, asymptotically superior algorithms that are far more complex.