Use Base.min / Base.max in MPI reductions #2054

Merged: 11 commits merged into main from bg/base_min_in_mpi_reduce on Sep 13, 2024

Conversation

@benegee (Contributor) commented Aug 30, 2024

We can use this workaround to resolve one part of #1922.

MPI.jl's reduce currently does not work for custom operators (such as Trixi's min/max) on ARM.
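
For illustration, a minimal sketch of the idea (not the actual Trixi diff; the reduced value here is made up):

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# some rank-local value to be reduced, e.g. a local time step size
dt = 0.1 / (MPI.Comm_rank(comm) + 1)

# Passing Trixi's own `min` makes MPI.jl wrap it as a custom operator, which
# currently fails on ARM. `Base.min` is mapped to the predefined MPI minimum
# operation instead.
dt_global = MPI.Allreduce(dt, Base.min, comm)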

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.


codecov bot commented Aug 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.32%. Comparing base (e4040e7) to head (ffad95a).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2054   +/-   ##
=======================================
  Coverage   96.32%   96.32%           
=======================================
  Files         470      470           
  Lines       37486    37486           
=======================================
  Hits        36107    36107           
  Misses       1379     1379           
Flag        Coverage Δ
unittests   96.32% <100.00%> (ø)

Flags with carried forward coverage won't be shown.


@ranocha (Member) left a comment:

Thanks! Shall we also switch to macos-latest where the CI workflow currently sets os: macos-13?

Resolved (outdated) review threads on:
  • src/callbacks_step/analysis.jl
  • src/callbacks_step/analysis_dg2d_parallel.jl
  • src/callbacks_step/analysis_dg3d_parallel.jl
  • src/callbacks_step/stepsize_dg2d.jl (3 threads)
  • src/callbacks_step/stepsize_dg3d.jl (4 threads)
@benegee (Contributor, Author) commented Aug 30, 2024

> Thanks! Shall we also switch to macos-latest where the CI workflow currently sets os: macos-13?

Fine with me! However, I am not quite clear why this fixed things in the past.

Also, should we not test on ARM here, in addition to or instead of x86?

@ranocha (Member) commented Aug 30, 2024

macos-latest is macos-14, which is only available with ARM; we should also delete the confusing arch specification when updating it.
macos-13 is an x86 (Intel) architecture, which is why it fixed the issue in the past.

@ranocha (Member) commented Aug 31, 2024

There are still some user-defined MPI reductions in the integration methods.

  • Ideally, we should also fix those.
  • If the current implementation already enables you to do something that has not been possible before, we can merge the fixes (without the switch to macos-latest) and fix the remaining issues later in another PR.

What do you prefer?

@benegee (Contributor, Author) commented Aug 31, 2024

Indeed! This time the problem is not the operator but the operands. E.g. where the current CI fails, we are dealing with buf::Base.RefValue{StaticArraysCore.SVector{4, Float64}}. I am not sure whether any vector-valued data structure would fail, but I suppose so, cf. https://github.com/JuliaParallel/MPI.jl/blob/780aaa0fdb768713a329659338a9c9cde23c41a8/src/operators.jl#L59C1-L59C110

For my current work I only need the fixes in my personal branch, where I tested this initially. So I am in favor of fixing all occurrences.

I do not have a great idea though. Just reducing each entry in the vector individually would of course be an option. A nicer solution would probably be to define custom reduction operators ourselves, as done here https://juliaparallel.org/MPI.jl/stable/examples/03-reduce/.

@ranocha (Member) commented Aug 31, 2024

Did you test the example on macOS?

@benegee (Contributor, Author) commented Aug 31, 2024

No, I do not have a Mac, but I could try it on the GitHub runners.

@ranocha (Member) commented Aug 31, 2024

That would be great 👍

@benegee (Contributor, Author) commented Aug 31, 2024

It does not work. Defining another individual operator does not help; instead one would need to directly generate an (MPI.jl) Op object:

function reduce_vector_plus(x, y)
    x .+ y
end
MPI.@Op(reduce_vector_plus, SVector)
Member:

I don't think that will work...

You would need to say:
MPI.@Op(reduce_vector_plus, SVector{3, Float32})


Member:

Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...)

@benegee (Contributor, Author) commented Sep 5, 2024

> I don't think that will work...

You are absolutely right.

> Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...)

I was just trying to understand this. Is there no supertype?

Member:

Sadly that doesn't work.

We generate a "wrapper" that looks like this:

function (w::OpWrapper{F,T})(_a::Ptr{Cvoid}, _b::Ptr{Cvoid}, _len::Ptr{Cint}, t::Ptr{MPI_Datatype}) where {F,T}
    len = unsafe_load(_len)
    @assert isconcretetype(T)
    a = Ptr{T}(_a)
    b = Ptr{T}(_b)
    for i = 1:len
        unsafe_store!(b, w.f(unsafe_load(a,i), unsafe_load(b,i)), i)
    end
    return nothing
end

So we get two pointers to an array of data, and we must reinterpret the pointers to a concrete type so that we can load the elements. Maybe one could use t to identify which Julia type one ought to use, but that would be less efficient.

Member:

If so, would it make sense to convert the SVectors to plain Vectors in our MPI routines to make our life easier and fix this issue?

Member:

IIUC you currently have data of type Vector{SVector{5, Float64}}; you could reinterpret that to Ptr{Float64}, as long as your reduce_vector function does not rely on the element type being an SVector.
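
For illustration, this is what such a reinterpret looks like (made-up data matching the 5-element example above; whether MPI.jl accepts the resulting view directly as a buffer is what the rest of the thread discusses):

using StaticArrays

data = [SVector(1.0, 2.0, 3.0, 4.0, 5.0) for _ in 1:3]  # Vector{SVector{5, Float64}}
flat = reinterpret(Float64, data)                        # 15-element Float64 view of the same memory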

Contributor (Author):

We do not need any special SVector functionality here.
But we need to know the number of elements?

Contributor (Author):

Only when @vchuravy mentioned the OpWrapper did I realize that it already iterates through something like a vector.
_len will be 1 in the case of our SVectors, but carries the right number when using Vectors (where does this actually come from?). So using a Vector, or reinterpreting the SVector as Ptr{Float64}, seems to make the reduction work without a custom operator (currently tree 2d only).

Contributor (Author):

Doing this here now: #2067

@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                                normalize = normalize)

     # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())
Member:

This is the place where we need the vector reduction. Currently, local_integral can be a Float64 in some cases (when we compute the total entropy) or an SVector (when we compute the total mass of all conserved quantities). What I'm suggesting is to reduce collect(local_integral) instead of Ref(local_integral). That should work, shouldn't it?
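
A minimal sketch of that suggestion for the SVector case (names taken from the diff above; the scalar case is addressed further down):

# `collect` turns the SVector into a plain Vector{Float64}, so MPI.jl passes the
# correct element count to the built-in `+` reduction; convert back on the root.
global_integral = MPI.Reduce!(collect(local_integral), +, mpi_root(), mpi_comm())
if mpi_isroot()
    integral = convert(typeof(local_integral), global_integral)
end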

Member:

Yeah, that should work, but of course it would require an extra allocation.

@ranocha (Member) commented Sep 5, 2024

That's true. I'm just looking for a solution on the Pareto front of code complexity, code generality, and efficiency. While the @Op approach is likely best in terms of efficiency, I have some doubts about the code complexity and generality: shall we do it for SVector{N, T} for N in 1:10 (or more?) and T in (Float32, Float64), and maybe also for scalars? Will we need something else? This matters because Trixi.jl is meant to be a library, not a single code for a specific application.
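
Purely hypothetical illustration of the combinatorial registration the @Op route would imply (not code that was adopted):

using MPI, StaticArrays

MPI.@Op(+, SVector{4, Float64})
MPI.@Op(+, SVector{5, Float64})
MPI.@Op(+, SVector{4, Float32})
# ... one registration for every vector length and element type Trixi might use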

Member:

It's annoying that MPI doesn't specify a "reverse" translation of MPI_Datatype.
We could maybe have a dictionary mapping MPI_Datatype => Type and then use that to get a concrete type, but that would cause a dynamic dispatch...

Member:

Turns out MPI.jl has support for reverse translations.

I just pushed a commit that allows for @Op(+, Any).
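
A hedged sketch of how this might then be used (assumed usage based on the comment above; not yet tested in Trixi at this point in the thread):

MPI.@Op(+, Any)  # register `+` once for arbitrary element types

# the existing call from the diff above should then work unchanged:
global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())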

@ranocha (Member) commented Sep 5, 2024

Nice! Can we please test this here, @benegee?

Contributor (Author):

Doing this here now: #2066

@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                                normalize = normalize)

     # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())
     if mpi_isroot()
         integral = convert(typeof(local_integral), global_integral[])
Member:

If we do this, we may have to add special handling for the case local_integral isa Real.
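
For illustration, the special handling might look roughly like this (hypothetical sketch, not the merged code):

if local_integral isa Real
    # scalars can be reduced directly via the built-in sum
    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
else
    # SVectors are flattened to a plain Vector first (cf. the collect suggestion above)
    global_integral = MPI.Reduce!(collect(local_integral), +, mpi_root(), mpi_comm())
end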

@benegee benegee force-pushed the bg/base_min_in_mpi_reduce branch from 6140e98 to 6c659f5 on September 5, 2024 16:27
@DanielDoehring DanielDoehring added the "parallelization" label (Related to MPI, threading, tasks etc.) on Sep 6, 2024
@benegee benegee requested a review from ranocha September 12, 2024 21:02
@ranocha (Member) left a comment:

Thanks!

@ranocha ranocha merged commit 148dd67 into main Sep 13, 2024
38 checks passed
@ranocha ranocha deleted the bg/base_min_in_mpi_reduce branch September 13, 2024 06:36
Labels
parallelization (Related to MPI, threading, tasks etc.)
4 participants