Add timer for outer RK loop for SSPRK33 #1686

DanielDoehring · 2023-10-23T06:57:23Z

To be consistent with the other RK implementations we have, i.e.,

Trixi.jl/src/time_integration/methods_2N.jl

Line 142 in c2f0095

@trixi_timeit timer() "main loop" while !integrator.finalstep

Trixi.jl/src/time_integration/methods_3Sstar.jl

Line 212 in c2f0095

@trixi_timeit timer() "main loop" while !integrator.finalstep

github-actions · 2023-10-23T07:04:22Z

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

The PR has a single goal that is clear from the PR title and/or description.
All code changes represent a single set of modifications that logically belong together.
No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

The code can be understood easily.
Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
There are no redundancies that can be removed by simple modularization/refactoring.
There are no leftover debug statements or commented code sections.
The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

New functions and types are documented with a docstring or top-level comment.
Relevant publications are referenced in docstrings (see example for formatting).
Inline comments are used to document longer or unusual code sections.
Comments describe intent ("why?") and not just functionality ("what?").
If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

The PR passes all tests.
New or modified lines of code are covered by tests.
New or modified tests run in less then 10 seconds.

Performance

There are no type instabilities or memory allocations in performance-critical parts.
If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

The correctness of the code was verified using appropriate tests.
If new equations/methods are added, a convergence test has been run and the results
are posted in the PR.

Created with ❤️ by the Trixi.jl community.

ranocha

Thanks

DanielDoehring · 2023-10-23T07:42:34Z

Tests will hopefully pass if #1684 is merged.

codecov · 2023-10-23T08:20:36Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (c2f0095) 82.63% compared to head (dc20d6d) 83.37%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1686      +/-   ##
==========================================
+ Coverage   82.63%   83.37%   +0.73%     
==========================================
  Files         425      425              
  Lines       34376    34395      +19     
==========================================
+ Hits        28406    28674     +268     
+ Misses       5970     5721     -249

Flag	Coverage Δ
unittests	`83.37% <0.00%> (+0.73%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
src/time_integration/methods_SSP.jl	`0.00% <0.00%> (ø)`

... and 147 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…o Time_SSPRK33

bennibolm · 2023-10-24T09:14:10Z

First, thanks for the work @DanielDoehring. I have an open question about these high allocations for the subcell limiting simulations. If I reproduce this issue correctly on my machine, "normal" elixirs also seem to have high allocation in the first run of a julia session. But for the next runs and tests of allocations they are on a normal level again.

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
5446200
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
816

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
816

For the subcell limiting simulations, after every simulation two allocation tests are needed to decrease the allocations.

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing_subcell.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
5485708
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
240

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing_subcell.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t) # --> Failing tests with allocation upper bound of 1000
10544
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
240

Can this be a strange behavior of this test, or are the allocations of subcell limiting simulations just higher? With the latter, I would not understand why so much lower allocations are shown on the second attempt.

In general, I understand that my simulation has more allocations since I need memory for the antidiffusive flux and all coefficients (and in this case the simulation needs about 4 times the time steps). But does this justify so much higher allocations? @ranocha

And are we all fine with just increasing the maximum allowed number of allocations for subcell limiting simulations from 1,000 to 15,000?

DanielDoehring · 2023-10-24T09:34:43Z

The additional allocations are due to the usage of the custom integrator. There seems still to be some type instability there. Compare for instance https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_1d_dgsem/elixir_hypdiff_nonperiodic.jl which uses CarpenterKennedy2N54 from OrdinaryDiffEq with summary cb

 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              763ms /  44.8%           45.0MiB /  24.5%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 I/O                           14    234ms   68.5%  16.7ms   7.00MiB   63.6%   512KiB
   ~I/O~                       14    148ms   43.2%  10.5ms   0.98MiB    8.9%  71.6KiB
   save solution               13   86.5ms   25.3%  6.65ms   6.03MiB   54.7%   475KiB
   save mesh                   13   2.07μs    0.0%   159ns     0.00B    0.0%    0.00B
   get element variables       13   1.44μs    0.0%   111ns     0.00B    0.0%    0.00B
   get node variables          13    242ns    0.0%  18.6ns     0.00B    0.0%    0.00B
 analyze solution              13    101ms   29.7%  7.80ms   4.01MiB   36.4%   316KiB
 rhs!                       5.56k   6.20ms    1.8%  1.12μs   6.61KiB    0.1%    1.22B
   ~rhs!~                   5.56k   2.26ms    0.7%   407ns   6.61KiB    0.1%    1.22B
   source terms             5.56k   1.59ms    0.5%   285ns     0.00B    0.0%    0.00B
   volume integral          5.56k    758μs    0.2%   136ns     0.00B    0.0%    0.00B
   boundary flux            5.56k    368μs    0.1%  66.1ns     0.00B    0.0%    0.00B
   interface flux           5.56k    270μs    0.1%  48.6ns     0.00B    0.0%    0.00B
   prolong2interfaces       5.56k    266μs    0.1%  47.8ns     0.00B    0.0%    0.00B
   surface integral         5.56k    232μs    0.1%  41.7ns     0.00B    0.0%    0.00B
   prolong2boundaries       5.56k    170μs    0.0%  30.5ns     0.00B    0.0%    0.00B
   reset ∂u/∂t              5.56k    149μs    0.0%  26.7ns     0.00B    0.0%    0.00B
   Jacobian                 5.56k    137μs    0.0%  24.7ns     0.00B    0.0%    0.00B
 calculate dt               1.11k   42.2μs    0.0%  37.9ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

without any measurement of the overall outer main loop. This is also used in elixir_euler_shockcapturing.jl.
In contrast, for our own custom integrators we have this additional measurement, see also
https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_1d_dgsem/elixir_hypdiff_harmonic_nonperiodic.jl
with summary cb

 ──────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                       Time                    Allocations      
                                     ───────────────────────   ────────────────────────
          Tot / % measured:               430ms /  37.4%           14.7MiB /  59.7%    

 Section                     ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
 I/O                              2    155ms   96.5%  77.5ms   8.43MiB   96.4%  4.21MiB
   ~I/O~                          2    146ms   91.1%  73.1ms   8.42MiB   96.3%  4.21MiB
   save solution                  1   8.71ms    5.4%  8.71ms   7.30KiB    0.1%  7.30KiB
   get element variables          1    680ns    0.0%   680ns     0.00B    0.0%    0.00B
   save mesh                      1   30.0ns    0.0%  30.0ns     0.00B    0.0%    0.00B
   get node variables             1   21.0ns    0.0%  21.0ns     0.00B    0.0%    0.00B
 main loop                        1   5.50ms    3.4%  5.50ms    304KiB    3.4%   304KiB
   I/O                            6   2.07ms    1.3%   346μs   54.3KiB    0.6%  9.04KiB
     save solution                6   2.07ms    1.3%   344μs   43.8KiB    0.5%  7.30KiB
     ~I/O~                        6   8.39μs    0.0%  1.40μs   10.4KiB    0.1%  1.74KiB
     save mesh                    6    308ns    0.0%  51.3ns     0.00B    0.0%    0.00B
     get element variables        6    268ns    0.0%  44.7ns     0.00B    0.0%    0.00B
     get node variables           6    110ns    0.0%  18.3ns     0.00B    0.0%    0.00B
   rhs!                       2.72k   1.74ms    1.1%   641ns   6.61KiB    0.1%    2.49B
     ~rhs!~                   2.72k    979μs    0.6%   360ns   6.61KiB    0.1%    2.49B
     volume integral          2.72k    152μs    0.1%  55.8ns     0.00B    0.0%    0.00B
     boundary flux            2.72k    105μs    0.1%  38.5ns     0.00B    0.0%    0.00B
     source terms             2.72k   87.7μs    0.1%  32.2ns     0.00B    0.0%    0.00B
     interface flux           2.72k   80.1μs    0.0%  29.5ns     0.00B    0.0%    0.00B
     prolong2interfaces       2.72k   78.1μs    0.0%  28.7ns     0.00B    0.0%    0.00B
     surface integral         2.72k   72.2μs    0.0%  26.5ns     0.00B    0.0%    0.00B
     prolong2boundaries       2.72k   67.6μs    0.0%  24.8ns     0.00B    0.0%    0.00B
     reset ∂u/∂t              2.72k   65.1μs    0.0%  23.9ns     0.00B    0.0%    0.00B
     Jacobian                 2.72k   56.4μs    0.0%  20.7ns     0.00B    0.0%    0.00B
   ~main loop~                    1    971μs    0.6%   971μs    149KiB    1.7%   149KiB
   analyze solution               6    541μs    0.3%  90.2μs   94.2KiB    1.1%  15.7KiB
   Runge-Kutta step           2.72k    159μs    0.1%  58.5ns     0.00B    0.0%    0.00B
   calculate dt                 544   15.5μs    0.0%  28.5ns     0.00B    0.0%    0.00B
 analyze solution                 1    145μs    0.1%   145μs   15.8KiB    0.2%  15.8KiB
 calculate dt                     1    479ns    0.0%   479ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────

thus we measure actually also the callback business among others.
This is in contrast to the integrators from OrdinaryDiffEq which are somewhat blind - this is also the reason why I had to increase the allowed allocations in this PR after adding the measurement.
Most likely, those allocations also occur for the methods from OrdinaryDiffEq, they are simply not reported.

bennibolm · 2023-10-24T09:46:25Z

[...] which uses CarpenterKennedy2N54 from OrdinaryDiffEq with summary cb [...] without any measurement of the overall outer main loop.

I think that was actually the reason why I deleted the timer main loop at that time:sweat_smile:

Most likely, those allocations also occur for the methods from OrdinaryDiffEq, they are simply not reported.

Okay, thanks:+1:

DanielDoehring · 2023-10-24T09:55:52Z

I guess for comparison this makes sense (especially if you want to compare your SSPRK33 to the one of OrdinaryDiffEq) but I found it quite helpful in reporting this main loop timer since it would also detect e.g. allocations in the RK scheme, e.g. if you would have e.g. abstract float parameters of the method which would get promoted to float all the time (a completely made-up example that certainly has not and would not happen to me ... )

Add timer for outer RK loop

0d12203

DanielDoehring requested a review from bennibolm October 23, 2023 06:57

Merge branch 'main' into Time_SSPRK33

dc20d6d

ranocha previously approved these changes Oct 23, 2023

View reviewed changes

ranocha enabled auto-merge (squash) October 23, 2023 07:30

bennibolm approved these changes Oct 23, 2023

View reviewed changes

DanielDoehring added the consistency Make Michael happy label Oct 23, 2023

Merge branch 'main' into Time_SSPRK33

0f68ec0

ranocha disabled auto-merge October 23, 2023 15:00

ranocha enabled auto-merge (squash) October 23, 2023 15:00

DanielDoehring added 2 commits October 23, 2023 17:46

Increase alloc check

f3fadd4

Merge branch 'Time_SSPRK33' of github.com:DanielDoehring/Trixi.jl int…

3a1912f

…o Time_SSPRK33

auto-merge was automatically disabled October 23, 2023 15:46
Head branch was pushed to by a user without write access

DanielDoehring dismissed ranocha’s stale review via 3a1912f October 23, 2023 15:46

ranocha merged commit e900262 into trixi-framework:main Oct 24, 2023
31 checks passed

DanielDoehring deleted the Time_SSPRK33 branch October 24, 2023 09:37

bennibolm mentioned this pull request Oct 24, 2023

Implement subcell limiting for non-conservative systems #1670

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add timer for outer RK loop for SSPRK33 #1686

Add timer for outer RK loop for SSPRK33 #1686

DanielDoehring commented Oct 23, 2023

github-actions bot commented Oct 23, 2023

ranocha left a comment

DanielDoehring commented Oct 23, 2023

codecov bot commented Oct 23, 2023 •

edited

Loading

bennibolm commented Oct 24, 2023

DanielDoehring commented Oct 24, 2023

bennibolm commented Oct 24, 2023

DanielDoehring commented Oct 24, 2023 •

edited

Loading

Add timer for outer RK loop for SSPRK33 #1686

Add timer for outer RK loop for SSPRK33 #1686

Conversation

DanielDoehring commented Oct 23, 2023

github-actions bot commented Oct 23, 2023

Review checklist

Purpose and scope

Code quality

Documentation

Testing

Performance

Verification

ranocha left a comment

Choose a reason for hiding this comment

DanielDoehring commented Oct 23, 2023

codecov bot commented Oct 23, 2023 • edited Loading

Codecov Report

bennibolm commented Oct 24, 2023

DanielDoehring commented Oct 24, 2023

bennibolm commented Oct 24, 2023

DanielDoehring commented Oct 24, 2023 • edited Loading

codecov bot commented Oct 23, 2023 •

edited

Loading

DanielDoehring commented Oct 24, 2023 •

edited

Loading