Currently, if $n$ is odd, Strassen multiplication recurses with a block size of $(n-1)/2$ and then applies rank-1 corrections. When we recurse many times, these rank-1 updates may have to be applied at every level for unfavorable $n$ (i.e. $n$ with many trailing ones in its binary representation).
An alternative would be to write $n = b 2^d + r$ at the top level, where $b$ is a chosen basecase block size, so that all Strassen blocks down from $n' = b 2^d$ split exactly; one then computes a single rank-$r$ correction at the top level.
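For concreteness, here is the block form I have in mind (notation is mine, and I assume the border products are done with schoolbook multiplication): with $n' = b 2^d$ and $r = n - n'$,

$$
\begin{pmatrix} A' & U \\ V & S \end{pmatrix}
\begin{pmatrix} B' & X \\ Y & T \end{pmatrix}
=
\begin{pmatrix} A'B' + UY & A'X + UT \\ VB' + SY & VX + ST \end{pmatrix},
$$

where $A'$ and $B'$ are $n' \times n'$ and the border blocks have thickness $r$. Only $A'B'$ goes through Strassen; the remaining products form the rank-$r$ correction, costing $3rn'^2 + 3r^2 n' + r^3$ scalar multiplications. Taking $r = 1$ recovers the per-level rank-1 correction of the current strategy.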
Empirically, this always uses at least as many arithmetic operations as the current strategy (is there a nice proof of this?), though usually only by a very small amount, e.g. 1-3%. In practice it could still perform better, since doing fewer, higher-rank corrections generally reduces overhead and improves memory locality.
For reference, the following table shows the number of arithmetic operations for classical multiplication (C), Waksman multiplication (W), the current Strassen S[m] with the best basecase size m in brackets, and the alternative Strassen S2[m]. The final column is the ratio S2/S:
Note that there are also $n$ for which zero-padding should be more efficient than peeling, for example when $n$ is one less than a power of two for $n \ge 63$.
If we only count multiplications, and use Waksman instead of classical multiplication as the basecase for Strassen, then the alternative peeling strategy appears to be equal or better. Here are the brute-forced multiplication counts:
For example, at $n = 102, 103$ this should give nearly a 10% speedup when working over a multiplication-constrained ring (zero-padding would be very slightly better still for $n = 103$, but not for $n = 102$).
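For reference, a minimal sketch of the multiplication-only cost model described above (assuming schoolbook border corrections and a classical $n^3$ basecase; Waksman's multiplication count can be plugged in as `basecase` instead; the function names are illustrative, not from the actual code):

```python
from functools import lru_cache

def classical(n):
    # schoolbook n x n multiplication: n^3 scalar products
    return n ** 3

def border(nprime, r):
    # schoolbook rank-r border correction around an nprime x nprime block:
    # (nprime + r)^3 - nprime^3 = 3*r*nprime^2 + 3*r^2*nprime + r^3 products
    return 3 * r * nprime ** 2 + 3 * r ** 2 * nprime + r ** 3

@lru_cache(maxsize=None)
def strassen_current(n, m, basecase=classical):
    """Current strategy: Strassen with a rank-1 peel at every odd level,
    switching to the basecase at size <= m."""
    if n <= m:
        return basecase(n)
    cost = 7 * strassen_current(n // 2, m, basecase)
    if n % 2:
        cost += border(n - 1, 1)  # per-level rank-1 correction
    return cost

def strassen_alternative(n, m, basecase=classical):
    """Alternative strategy: pick d with b = floor(n / 2^d) <= m, run Strassen
    with exact splits down to the b x b basecase, and finish with a single
    rank-r correction (r = n - b*2^d) at the top level."""
    best = basecase(n) if n <= m else None
    d, b = 0, n
    while b > 1:
        d += 1
        b = n >> d
        if 1 <= b <= m:
            r = n - (b << d)
            cost = 7 ** d * basecase(b) + border(b << d, r)
            best = cost if best is None else min(best, cost)
    return best

# example: compare the two counts for a few sizes with basecase cutoff 8
for n in (102, 103, 127):
    print(n, strassen_current(n, 8), strassen_alternative(n, 8))
```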