Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: modify calc_node_coordinates! to allow for P4estMesh{2} with 3D coordinates #2046

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion src/solvers/dgsem_p4est/containers_2d.jl
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,8 @@ function calc_node_coordinates!(node_coordinates,
# places and the additional information passed to the compiler makes them faster
# than native `Array`s.
tmp1 = StrideArray(undef, real(mesh),
StaticInt(2), static_length(nodes), static_length(mesh.nodes))
StaticInt(size(mesh.tree_node_coordinates, 1)),
static_length(nodes), static_length(mesh.nodes))
Comment on lines -46 to +47
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check the performance and type stability of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought about the fact that this would get called repeatedly when using mesh adaptation, so I initially wasn't too concerned about performance. I've now done some tests with a 2D mesh.

julia> using Trixi, BenchmarkTools

julia> using StaticArrayInterface: static_length

julia> using StrideArrays: StrideArray, StaticInt

julia> solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs);

julia> mesh = P4estMesh((8, 8), polydeg = 3,
                        coordinates_min = (-1.0, -1.0), coordinates_max = (1.0, 1.0),
                        initial_refinement_level = 1);

Existing code:

julia> @benchmark StrideArray(undef, real(mesh),
                              StaticInt(2),
                              static_length(solver.basis.nodes), static_length(mesh.nodes))
BenchmarkTools.Trial: 10000 samples with 195 evaluations.
 Range (min … max):  489.103 ns … 539.166 μs  ┊ GC (min … max):  0.00% … 99.88%
 Time  (median):     505.769 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   583.345 ns ±   5.396 μs  ┊ GC (mean ± σ):  11.83% ±  4.49%

        ▃██▂                                                     
  ▁▁▁▂▃▆████▅▂▂▂▂▂▂▁▁▁▂▂▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  489 ns           Histogram: frequency by time          607 ns <

 Memory estimate: 880 bytes, allocs estimate: 6.

New code:

julia> @benchmark StrideArray(undef, real(mesh),
                              StaticInt(size(mesh.tree_node_coordinates, 1)),
                              static_length(solver.basis.nodes), static_length(mesh.nodes))
BenchmarkTools.Trial: 10000 samples with 142 evaluations.
 Range (min … max):  701.296 ns … 750.380 μs  ┊ GC (min … max):  0.00% … 99.87%
 Time  (median):     717.725 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   819.442 ns ±   7.505 μs  ┊ GC (mean ± σ):  10.81% ±  3.66%

    ▂▅▇█▇▆▅▂▁▁▂▃▂▃▃▄▄▄▃▃▃▂▂▂▂▂▁▁▁ ▁                             ▂
  ▆███████████████████████████████████▇▇▇▆▆▆▆▆▆▆▄▅▅▅▄▅▃▆▆▆▅▅▆▄▃ █
  701 ns        Histogram: log(frequency) by time        851 ns <

 Memory estimate: 880 bytes, allocs estimate: 6.

As we can see, here is some additional cost, but it seems to be a fixed overhead that doesn't depend on the mesh size, so I'm not sure if it's a significant regression in performance or not. I don't see any type instabilities from @code_warntype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also report benchmarks of a full call to calc_node_coordinates!?

Copy link
Member Author

@tristanmontoya tristanmontoya Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, actually there seems to be a type instability when I call the full calc_node_coordinates!, and the performance is worse. See below.

julia> using Trixi, BenchmarkTools

julia> using StaticArrayInterface: static_length

julia> using StrideArrays: StrideArray, StaticInt

julia> solver = DGSEM(polydeg = 3, surface_flux = flux_lax_friedrichs);

julia> mesh = P4estMesh((8, 8), polydeg = 3,
                        coordinates_min = (-1.0, -1.0), coordinates_max = (1.0, 1.0),
                        initial_refinement_level = 1);

julia> elements = Trixi.init_elements(mesh, LinearScalarAdvectionEquation2D((1.0,1.0)), solver.basis, Float64);

Benchmark using the existing code:

julia> @benchmark Trixi.calc_node_coordinates!(elements.node_coordinates, mesh, solver.basis.nodes)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.250 μs …  50.042 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.528 μs ± 487.328 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁  ▆   █  ▆   ▅  ▄   ▁                                 
  ▂▁▁▃▁▁▁█▁▁█▁▁▁█▁▁█▁▁▁█▁▁█▁▁▁█▁▁▇▁▁▁▆▁▁▅▁▁▁▄▁▁▃▁▁▁▃▁▁▃▁▁▁▂▁▁▂ ▃
  23.2 μs         Histogram: frequency by time           24 μs <

 Memory estimate: 4.23 KiB, allocs estimate: 73.

Benchmark with the proposed change:

julia> @benchmark Trixi.calc_node_coordinates!(elements.node_coordinates, mesh, solver.basis.nodes)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  66.459 μs … 114.210 ms  ┊ GC (min … max):  0.00% … 99.92%
 Time  (median):     69.958 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   82.256 μs ±   1.142 ms  ┊ GC (mean ± σ):  14.26% ±  1.39%

    ▂▁      ▁     █▇▅ ▃▂▂                                       
  ▁▄██▇▃▂▂▄▇█▅▃▂▃▆███▇███▆▆▄▅▄▅▄▄▄▃▃▃▄▃▃▂▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  66.5 μs         Histogram: frequency by time         77.7 μs <

 Memory estimate: 45.66 KiB, allocs estimate: 784.

Because of these performance regressions, I will mark this PR as a draft until they are resolved, since we can work around this in TrixiAtmo.jl for now. Thanks @ranocha for the suggestions!

matrix1 = StrideArray(undef, real(mesh),
static_length(nodes), static_length(mesh.nodes))
matrix2 = similar(matrix1)
Expand Down
Loading