Complicated lazy broadcasting slower than equivalent single broadcast #56629

Open
mcabbott opened this issue Nov 21, 2024 · 2 comments
Labels: broadcast (Applying a function over a collection), performance (Must go faster)

Comments

@mcabbott
Contributor

This example from Discourse shows a slowdown when broadcasting a moderately complicated expression, instead of broadcasting a function containing the same expression:

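using BenchmarkTools  # provides @btime

# Dotted expression: a nested lazy broadcast
arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B
# Same expression as a scalar function, broadcast with a single dot call
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B - A * B - A / B + A * B + A / B - A * B - A / B

let N = 151
    A, B, C1, C2 = (rand(N,N,N).+1 for _ in 1:4)
    @btime arrayfun!($C1, $A, $B)
    @btime $C2 .= scalarfun.($A, $B)
    C1 ≈ C2  # both forms give the same result
end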
#  17.306 ms (11 allocations: 352 bytes)
#   5.900 ms (0 allocations: 0 bytes)
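
For context, here is a minimal sketch of the two shapes being compared, on a shorter expression and small arrays. A dotted expression is lowered to a tree of lazy `Broadcasted` objects with one node per dotted call, while `f.(A, B)` builds a single node. The function `f`, the helper `count_nodes`, and the array sizes are illustrative assumptions, not taken from the benchmark above.

```julia
using Base.Broadcast: broadcasted, materialize!, Broadcasted

A = rand(4, 4) .+ 1
B = rand(4, 4) .+ 1

# Roughly what `A .* B .+ A ./ B` builds: one lazy node per dotted call.
nested = broadcasted(+, broadcasted(*, A, B), broadcasted(/, A, B))

# What `f.(A, B)` builds: a single node whose function holds the whole expression.
f(a, b) = a * b + a / b
single = broadcasted(f, A, B)

# Hypothetical helper for illustration: count the Broadcasted nodes in a tree.
count_nodes(x) = 0
count_nodes(bc::Broadcasted) = 1 + sum(count_nodes, bc.args; init = 0)

# Both trees are evaluated in one pass over the destination.
C1 = similar(A)
C2 = similar(A)
materialize!(C1, nested)
materialize!(C2, single)
```

Here `count_nodes(nested)` is 3 and `count_nodes(single)` is 1, and `C1 ≈ C2` holds. The sketch only shows the structural difference; it does not explain where the extra time goes.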

The effect seems fairly robust: it is not particular to 3D arrays, nor to A^2.
Replacing @. with explicit .+ etc. helps a bit: the allocations disappear, although the time does not improve. (According to #29120, this avoids n-ary +; here n<=4.)

arrayfun!(C, A, B) = C .= A.^2 .+ B.^2 .+ A .* B .+ A ./ B .- A .* B .- A ./ B .+ A .* B .+ A ./ B .- A .* B .- A ./ B
#  17.345 ms (0 allocations: 0 bytes)
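
The n-ary point can be checked at the parser level. This is a small sketch with placeholder symbols `a`, `b`, `c`, `d`: a chain of `+` parses as one call with all operands, and `@.` turns it into a single dot call with the same number of arguments, while a chain of `.+` parses as nested two-argument calls.

```julia
# A chain of `+` parses as a single n-ary call.
nary = :(a + b + c + d)

# `@.` keeps that shape: one dot call `(+).(a, b, c, d)` with four arguments.
dotted = @macroexpand @. a + b + c + d

# A chain of `.+` parses as nested binary calls: ((a .+ b) .+ c) .+ d.
binary = :(a .+ b .+ c .+ d)

n_nary   = length(nary.args) - 1         # operands of the outermost `+`
n_dotted = length(dotted.args[2].args)   # arguments of the dot call
n_binary = length(binary.args) - 1       # operands of the outermost `.+`
```

With these definitions `n_nary` and `n_dotted` are both 4, and `n_binary` is 2.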

Simpler expressions also have the slowdown but no allocation:

arrayfun!(C, A, B) = @. C = A^2 + B^2 + A * B + A / B
scalarfun(A::Real, B::Real) = A^2 + B^2 + A * B + A / B
#  3.148 ms (0 allocations: 0 bytes)
#  971.000 μs (0 allocations: 0 bytes)

Even simpler expressions like arrayfun!(C, A, B) = @. C = A^2 + B^2 show no slowdown at all.

@roflmaostc
Contributor

This is quite a severe penalty, isn't it?

A lot of my code uses broadcast expressions with more than 5 dots. Replacing all of them with function calls is very impractical.

@mcabbott
Contributor Author

A macro could in principle produce the scalarfun form for you. Building this into @. would be a bit scary (seems likely to expose all kinds of special assumptions in code). But the question before contemplating that is: Why can't the complicated form compile down to the same code?
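
As a rough illustration of what such a macro might look like, here is a sketch under strong assumptions. The name `@fused` is hypothetical and not part of Base, and the caller has to write the scalar expression as an anonymous function whose parameter names match the arrays, so the macro never has to guess which symbols are arrays. It simply rewrites `dest = (args...) -> body` into `dest .= ((args...) -> body).(args...)`.

```julia
# Hypothetical macro, not part of Base.
macro fused(ex)
    Meta.isexpr(ex, :(=)) || error("expected `dest = (args...) -> body`")
    dest, lam = ex.args
    Meta.isexpr(lam, :->) || error("right-hand side must be an anonymous function")
    params = lam.args[1]
    # A single parameter is a bare symbol, several parameters form a tuple.
    names = Meta.isexpr(params, :tuple) ? params.args : Any[params]
    # Build `lam.(names...)`, then `dest .= lam.(names...)`.
    call = Expr(:., lam, Expr(:tuple, names...))
    esc(Expr(:.=, dest, call))
end

A = rand(8, 8) .+ 1
B = rand(8, 8) .+ 1
C1 = similar(A)
C2 = similar(A)

@. C1 = A^2 + B^2 + A * B + A / B                 # nested lazy broadcast
@fused C2 = (A, B) -> A^2 + B^2 + A * B + A / B   # single broadcast of one function
```

Both calls fill their destination with the same values (`C1 ≈ C2`). Whether the second form matches the speed of the named `scalarfun` version has not been measured here.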
