How to handle long compile times? #153

Open
DrChainsaw opened this issue Aug 2, 2023 · 4 comments

DrChainsaw commented Aug 2, 2023

Is there some known remedy for explosive compile times when updating deeply nested models?

```julia
julia> versioninfo()
Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores

(@v1.9) pkg> activate --temp

(jl_pfjmCB) pkg> add Flux, Optimisers
```

```julia
using Flux, Optimisers

# Each fold step wraps the previous model twice inside a Parallel, so the nested
# model type grows exponentially with i; the loop times optimiser setup, gradient,
# and parameter update at each size.
for i in 1:2:20
    ts = @timed c = foldl((m, _) -> Parallel(+, Chain(m, Dense(1=>1)), m), 1:i; init=Dense(1=>1))
    @info "$i Create chain: $(ts.time)"
    ts = @timed os = Optimisers.setup(Optimisers.Adam(0.1f0), c)
    @info " Setup opt: $(ts.time)"
    ts = @timed gs = gradient((f, x) -> sum(f(x)), c, ones(Float32, 1, 1))
    @info " Calc grad: $(ts.time)"
    ts = @timed Optimisers.update(os, c, gs[1])
    @info " Update pars: $(ts.time)"
end
```

```
[ Info: 1 Create chain: 0.3122138
[ Info:  Setup opt: 0.4341211
[ Info:  Calc grad: 8.0373625
[ Info:  Update pars: 2.2199014
[ Info: 3 Create chain: 0.1156983
[ Info:  Setup opt: 0.2145368
[ Info:  Calc grad: 5.0142625
[ Info:  Update pars: 1.6858239
[ Info: 5 Create chain: 0.1646994
[ Info:  Setup opt: 0.3382421
[ Info:  Calc grad: 22.8807554
[ Info:  Update pars: 14.1957323
[ Info: 7 Create chain: 0.8384293
[ Info:  Setup opt: 1.7405321
[ Info:  Calc grad: 33.0993626
[ Info:  Update pars: 1518.808826
[ Info: 9 Create chain: 4.0898057
[ Info:  Setup opt: 8.6561113
[ Info:  Calc grad: 121.4887817
## This one is still not finished 19 hours later :/
```

I did this to prevent spurious stalls with NaiveGAflux, but maybe there is a better way.

mcabbott (Member) commented Aug 2, 2023

One thing we could try is adding `@nospecialize` to some `update!` methods? Or even to a whole block of its code.
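
For concreteness, a minimal sketch of that idea (the name `my_update!` is hypothetical, not the actual Optimisers.jl `update!` code):

```julia
# Sketch only: mark the tree-shaped arguments with @nospecialize so Julia compiles
# one generic method body instead of a fresh specialization per nested model type.
function my_update!(state, model, grad)
    @nospecialize state model grad   # compiler hint: don't specialize on these argument types
    if grad isa AbstractArray        # leaf: apply the rule directly
        model .-= 0.01f0 .* grad     # stand-in for the real optimiser rule
        return state, model
    end
    # otherwise recurse over the matching children of state/model/grad (omitted)
    return state, model
end
```

Whether this actually helps depends on where the time goes, which is the question the following comments address.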

DrChainsaw (Author) commented

Yeah, I guess that is the simplest option. Won't it hurt the SciML μs-hunting cases?

CarloLucibello (Member) commented

maybe @ChrisRackauckas has some advice here

ChrisRackauckas (Member) commented

Make a flamegraph of the compile times on a standard case to see what actually matters. See for example SciML/DifferentialEquations.jl#786 (comment). That will tell you what is taking all of the compile time and where effort should be focused.

Removing specialization doesn't necessarily make compilation faster. Sometimes removing inference at one level can actually make things worse, so check whether the time is spent in inference or in LLVM.
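
For reference, a minimal sketch of that inference-flamegraph workflow, using the `c`, `os`, `gs` from the reproduction above and assuming SnoopCompile 2.x plus ProfileView are installed (on SnoopCompile 2.x the macro is `@snoopi_deep`; newer releases rename it `@snoop_inference`):

```julia
# Sketch only: capture inference timings for one update in a fresh session,
# then view them as a flamegraph.
using SnoopCompileCore
tinf = @snoopi_deep Optimisers.update(os, c, gs[1])   # record where inference spends its time

using SnoopCompile, ProfileView
fg = flamegraph(tinf)   # convert the recorded inference timings to a flamegraph
ProfileView.view(fg)    # interactively inspect which calls dominate inference
```

Note that this only measures the inference half; if the flamegraph accounts for little of the wall-clock compile time, the remainder is likely LLVM/codegen.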

> Won't it hurt the SciML μs-hunting cases?

That's fine. If we need something separate we can do SimpleOptimization.jl. We already have SimpleDiffEq.jl and SimpleNonlinearSolve.jl, which are specialized implementations for things like GPU kernels and ultra-small problems on the same interface. If we need a SimpleOptimization.jl with a SimpleGradientDescent and SimpleAdam, I wouldn't be too surprised. So if you go and make Optimisers the best thing for neural networks, we won't mind, and we can just make sure Optimization.jl has multiple options with the right trade-offs.
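
As a purely hypothetical illustration of the ultra-small end of that trade-off (none of these names are an existing package API):

```julia
# Sketch only: a gradient-descent step on a flat parameter vector, with no setup
# tree and nothing to specialize on beyond plain array types.
struct SimpleGradientDescent{T}
    eta::T   # learning rate
end

step!(o::SimpleGradientDescent, x::AbstractVector, g::AbstractVector) = (x .-= o.eta .* g; x)

# usage
x = randn(Float32, 4)
step!(SimpleGradientDescent(0.1f0), x, randn(Float32, 4))
```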
