Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timer for outer RK loop for SSPRK33 #1686

Merged
merged 5 commits into from
Oct 24, 2023

Conversation

DanielDoehring
Copy link
Contributor

To be consistent with the other RK implementations we have, i.e.,

@trixi_timeit timer() "main loop" while !integrator.finalstep

@trixi_timeit timer() "main loop" while !integrator.finalstep

@github-actions
Copy link
Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less then 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

ranocha
ranocha previously approved these changes Oct 23, 2023
Copy link
Member

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@ranocha ranocha enabled auto-merge (squash) October 23, 2023 07:30
@DanielDoehring
Copy link
Contributor Author

Tests will hopefully pass if #1684 is merged.

@DanielDoehring DanielDoehring added the consistency Make Michael happy label Oct 23, 2023
@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (c2f0095) 82.63% compared to head (dc20d6d) 83.37%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1686      +/-   ##
==========================================
+ Coverage   82.63%   83.37%   +0.73%     
==========================================
  Files         425      425              
  Lines       34376    34395      +19     
==========================================
+ Hits        28406    28674     +268     
+ Misses       5970     5721     -249     
Flag Coverage Δ
unittests 83.37% <0.00%> (+0.73%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/time_integration/methods_SSP.jl 0.00% <0.00%> (ø)

... and 147 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ranocha ranocha disabled auto-merge October 23, 2023 15:00
@ranocha ranocha enabled auto-merge (squash) October 23, 2023 15:00
auto-merge was automatically disabled October 23, 2023 15:46

Head branch was pushed to by a user without write access

@ranocha ranocha merged commit e900262 into trixi-framework:main Oct 24, 2023
31 checks passed
@bennibolm
Copy link
Contributor

First, thanks for the work @DanielDoehring. I have an open question about these high allocations for the subcell limiting simulations. If I reproduce this issue correctly on my machine, "normal" elixirs also seem to have high allocation in the first run of a julia session. But for the next runs and tests of allocations they are on a normal level again.

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
5446200
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
816

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
816

For the subcell limiting simulations, after every simulation two allocation tests are needed to decrease the allocations.

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing_subcell.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
5485708
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
240

julia> trixi_include("../examples/tree_2d_dgsem/elixir_euler_shockcapturing_subcell.jl")
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t) # --> Failing tests with allocation upper bound of 1000
10544
julia> @allocated Trixi.rhs!(sol.u[end], sol.u[end], semi, sol.t)
240

Can this be a strange behavior of this test, or are the allocations of subcell limiting simulations just higher? With the latter, I would not understand why so much lower allocations are shown on the second attempt.

In general, I understand that my simulation has more allocations since I need memory for the antidiffusive flux and all coefficients (and in this case the simulation needs about 4 times the time steps). But does this justify so much higher allocations? @ranocha

And are we all fine with just increasing the maximum allowed number of allocations for subcell limiting simulations from 1,000 to 15,000?

@DanielDoehring
Copy link
Contributor Author

The additional allocations are due to the usage of the custom integrator. There seems still to be some type instability there. Compare for instance https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_1d_dgsem/elixir_hypdiff_nonperiodic.jl which uses CarpenterKennedy2N54 from OrdinaryDiffEq with summary cb

 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations      
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              763ms /  44.8%           45.0MiB /  24.5%    

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 I/O                           14    234ms   68.5%  16.7ms   7.00MiB   63.6%   512KiB
   ~I/O~                       14    148ms   43.2%  10.5ms   0.98MiB    8.9%  71.6KiB
   save solution               13   86.5ms   25.3%  6.65ms   6.03MiB   54.7%   475KiB
   save mesh                   13   2.07μs    0.0%   159ns     0.00B    0.0%    0.00B
   get element variables       13   1.44μs    0.0%   111ns     0.00B    0.0%    0.00B
   get node variables          13    242ns    0.0%  18.6ns     0.00B    0.0%    0.00B
 analyze solution              13    101ms   29.7%  7.80ms   4.01MiB   36.4%   316KiB
 rhs!                       5.56k   6.20ms    1.8%  1.12μs   6.61KiB    0.1%    1.22B
   ~rhs!~                   5.56k   2.26ms    0.7%   407ns   6.61KiB    0.1%    1.22B
   source terms             5.56k   1.59ms    0.5%   285ns     0.00B    0.0%    0.00B
   volume integral          5.56k    758μs    0.2%   136ns     0.00B    0.0%    0.00B
   boundary flux            5.56k    368μs    0.1%  66.1ns     0.00B    0.0%    0.00B
   interface flux           5.56k    270μs    0.1%  48.6ns     0.00B    0.0%    0.00B
   prolong2interfaces       5.56k    266μs    0.1%  47.8ns     0.00B    0.0%    0.00B
   surface integral         5.56k    232μs    0.1%  41.7ns     0.00B    0.0%    0.00B
   prolong2boundaries       5.56k    170μs    0.0%  30.5ns     0.00B    0.0%    0.00B
   reset ∂u/∂t              5.56k    149μs    0.0%  26.7ns     0.00B    0.0%    0.00B
   Jacobian                 5.56k    137μs    0.0%  24.7ns     0.00B    0.0%    0.00B
 calculate dt               1.11k   42.2μs    0.0%  37.9ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

without any measurement of the overall outer main loop. This is also used in elixir_euler_shockcapturing.jl.
In contrast, for our own custom integrators we have this additional measurement, see also
https://github.com/trixi-framework/Trixi.jl/blob/main/examples/tree_1d_dgsem/elixir_hypdiff_harmonic_nonperiodic.jl
with summary cb

 ──────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                       Time                    Allocations      
                                     ───────────────────────   ────────────────────────
          Tot / % measured:               430ms /  37.4%           14.7MiB /  59.7%    

 Section                     ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
 I/O                              2    155ms   96.5%  77.5ms   8.43MiB   96.4%  4.21MiB
   ~I/O~                          2    146ms   91.1%  73.1ms   8.42MiB   96.3%  4.21MiB
   save solution                  1   8.71ms    5.4%  8.71ms   7.30KiB    0.1%  7.30KiB
   get element variables          1    680ns    0.0%   680ns     0.00B    0.0%    0.00B
   save mesh                      1   30.0ns    0.0%  30.0ns     0.00B    0.0%    0.00B
   get node variables             1   21.0ns    0.0%  21.0ns     0.00B    0.0%    0.00B
 main loop                        1   5.50ms    3.4%  5.50ms    304KiB    3.4%   304KiB
   I/O                            6   2.07ms    1.3%   346μs   54.3KiB    0.6%  9.04KiB
     save solution                6   2.07ms    1.3%   344μs   43.8KiB    0.5%  7.30KiB
     ~I/O~                        6   8.39μs    0.0%  1.40μs   10.4KiB    0.1%  1.74KiB
     save mesh                    6    308ns    0.0%  51.3ns     0.00B    0.0%    0.00B
     get element variables        6    268ns    0.0%  44.7ns     0.00B    0.0%    0.00B
     get node variables           6    110ns    0.0%  18.3ns     0.00B    0.0%    0.00B
   rhs!                       2.72k   1.74ms    1.1%   641ns   6.61KiB    0.1%    2.49B
     ~rhs!~                   2.72k    979μs    0.6%   360ns   6.61KiB    0.1%    2.49B
     volume integral          2.72k    152μs    0.1%  55.8ns     0.00B    0.0%    0.00B
     boundary flux            2.72k    105μs    0.1%  38.5ns     0.00B    0.0%    0.00B
     source terms             2.72k   87.7μs    0.1%  32.2ns     0.00B    0.0%    0.00B
     interface flux           2.72k   80.1μs    0.0%  29.5ns     0.00B    0.0%    0.00B
     prolong2interfaces       2.72k   78.1μs    0.0%  28.7ns     0.00B    0.0%    0.00B
     surface integral         2.72k   72.2μs    0.0%  26.5ns     0.00B    0.0%    0.00B
     prolong2boundaries       2.72k   67.6μs    0.0%  24.8ns     0.00B    0.0%    0.00B
     reset ∂u/∂t              2.72k   65.1μs    0.0%  23.9ns     0.00B    0.0%    0.00B
     Jacobian                 2.72k   56.4μs    0.0%  20.7ns     0.00B    0.0%    0.00B
   ~main loop~                    1    971μs    0.6%   971μs    149KiB    1.7%   149KiB
   analyze solution               6    541μs    0.3%  90.2μs   94.2KiB    1.1%  15.7KiB
   Runge-Kutta step           2.72k    159μs    0.1%  58.5ns     0.00B    0.0%    0.00B
   calculate dt                 544   15.5μs    0.0%  28.5ns     0.00B    0.0%    0.00B
 analyze solution                 1    145μs    0.1%   145μs   15.8KiB    0.2%  15.8KiB
 calculate dt                     1    479ns    0.0%   479ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────

thus we measure actually also the callback business among others.
This is in contrast to the integrators from OrdinaryDiffEq which are somewhat blind - this is also the reason why I had to increase the allowed allocations in this PR after adding the measurement.
Most likely, those allocations also occur for the methods from OrdinaryDiffEq, they are simply not reported.

@DanielDoehring DanielDoehring deleted the Time_SSPRK33 branch October 24, 2023 09:37
@bennibolm
Copy link
Contributor

[...] which uses CarpenterKennedy2N54 from OrdinaryDiffEq with summary cb [...] without any measurement of the overall outer main loop.

I think that was actually the reason why I deleted the timer main loop at that time:sweat_smile:

Most likely, those allocations also occur for the methods from OrdinaryDiffEq, they are simply not reported.

Okay, thanks:+1:

@DanielDoehring
Copy link
Contributor Author

DanielDoehring commented Oct 24, 2023

I guess for comparison this makes sense (especially if you want to compare your SSPRK33 to the one of OrdinaryDiffEq) but I found it quite helpful in reporting this main loop timer since it would also detect e.g. allocations in the RK scheme, e.g. if you would have e.g. abstract float parameters of the method which would get promoted to float all the time (a completely made-up example that certainly has not and would not happen to me ... )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistency Make Michael happy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants