Use Base.min / Base.max in MPI reductions #2054
Conversation
MPI.jl's reduce currently does not work for custom operators (such as Trixi's min/max) on ARM
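For context, a minimal sketch of the idea behind the title (the variable is illustrative): when the operator passed to an MPI.jl reduction is Base.min or Base.max and the element type is a built-in one such as Float64, MPI.jl maps it to the predefined MPI_MIN / MPI_MAX, so no custom (closure-based) operator is created, which is what breaks on ARM.

using MPI

MPI.Init()
comm = MPI.COMM_WORLD

# hypothetical per-rank value, e.g. a local time step
local_dt = 1.0 / (MPI.Comm_rank(comm) + 1)

# Base.min is translated to the built-in MPI_MIN for Float64,
# avoiding a custom operator that would fail on ARM
global_dt = MPI.Allreduce(local_dt, min, comm)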
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff            @@
##             main    #2054   +/-   ##
=======================================
  Coverage   96.32%   96.32%
=======================================
  Files         470      470
  Lines       37486    37486
=======================================
  Hits        36107    36107
  Misses       1379     1379
Thanks! Shall we also switch to macos-latest in Trixi.jl/.github/workflows/ci.yml? (Line 93 in 2ac203e, currently `os: macos-13`.)
Co-authored-by: Hendrik Ranocha <[email protected]>
Fine with me! However, I am not quite clear why this fixed things in the past. Also, should we not test ARM here, besides or instead of x86?
macos-latest is macos-14, which is only available with ARM - we should also delete the confusing arch specification when updating it.
There are still some user-defined MPI reductions in the integration methods.
What do you prefer?
Indeed! This time it is not the operator that is the problem, but the operands: e.g. where the current CI fails, we are dealing with `SVector`s. For my current work I only need the fixes in my personal branch, where I tested this initially. So I am in favor of fixing all occurrences. I do not have a great idea, though. Just reducing each entry in the vector individually would of course be an option. A nicer solution would probably be to define custom reduction operators ourselves, as done here: https://juliaparallel.org/MPI.jl/stable/examples/03-reduce/.
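For reference, a minimal sketch of the custom-operator pattern from the linked MPI.jl example, adapted to summing SVectors (function and variable names are made up; as the following comments show, this route turned out not to work on the ARM macOS runners):

using MPI
using StaticArrays

MPI.Init()
comm = MPI.COMM_WORLD

# custom reduction function; MPI.jl wraps it in an MPI operator
# when it is passed to Reduce
vector_plus(a::SVector, b::SVector) = a + b

local_value = SVector(1.0, 2.0, 3.0) * (MPI.Comm_rank(comm) + 1)
global_value = MPI.Reduce(local_value, vector_plus, 0, comm)  # result on rank 0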
Did you test the example with macOS?
No, I do not have a Mac. But I could try with the GH system.
That would be great 👍 |
It does not work. Another individual operator does not help. Instead one would need to directly generate an (MPI.jl) `Op`.
src/auxiliary/mpi.jl (outdated)

function reduce_vector_plus(x, y)
    x .+ y
end
MPI.@Op(reduce_vector_plus, SVector)
I don't think that will work...
You would need to say:
MPI.@Op(reduce_vector_plus, SVector{3, Float32})
Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...).
> I don't think that will work...

You are absolutely right.

> Which means that we will have to do this for many types (different lengths, Float64 and maybe Float32, ...)

I was just trying to understand this. Is there no supertype?
Sadly that doesn't work.
We generate a "wrapper" that looks like this:
function (w::OpWrapper{F,T})(_a::Ptr{Cvoid}, _b::Ptr{Cvoid}, _len::Ptr{Cint}, t::Ptr{MPI_Datatype}) where {F,T}
    len = unsafe_load(_len)       # number of elements MPI hands to the operator
    @assert isconcretetype(T)     # required to reinterpret the raw pointers
    a = Ptr{T}(_a)
    b = Ptr{T}(_b)
    for i = 1:len
        # combine a[i] and b[i] with the user function and store the result in b[i]
        unsafe_store!(b, w.f(unsafe_load(a,i), unsafe_load(b,i)), i)
    end
    return nothing
end
So we get two pointers to an array of data, and we must reinterpret them to a concrete type so that we can load the elements. Maybe one could use t to identify which Julia type one ought to use, but that would be less efficient.
If so, would it make sense to convert the SVectors to plain Vectors in our MPI routines to make our life easier and fix this issue?
IIUC you currently have data = Vector{SVector{5, Float64}}; you could reinterpret that to Ptr{Float64}, as long as your reduce_vector function is not using the fact that the datatype is an SVector.
We do not need any special SVector functionality here.
But we need to know the number of elements?
Only when @vchuravy mentioned the OpWrapper did I realize that it already iterates through something like a vector. _len will be 1 in the case of our SVectors, but carries the right number when using Vectors (where does this actually come from?). So, using a Vector or reinterpreting the SVector as Ptr{Float64} seems to make the reduction work without a custom operator (currently tree 2d only).
Doing this here now: #2067
@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                              normalize = normalize)

     # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())
This is the place where we need the vector reduction. Currently, local_integral can be a Float64 in some cases (when we compute the total entropy) or an SVector (when we compute the total mass of all conserved quantities). What I'm suggesting is to reduce collect(local_integral) instead of Ref(local_integral). That should work, shouldn't it?
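A sketch of that suggestion, assuming local_integral is an SVector (the Real case needs extra care, see further below) and using the Trixi helpers from the diff above: reducing a plain Vector{Float64} lets MPI.jl use the built-in MPI_SUM, at the cost of one small allocation per call.

buf = collect(local_integral)                        # Vector{Float64}
MPI.Reduce!(buf, +, mpi_root(), mpi_comm())          # built-in MPI_SUM, elementwise
if mpi_isroot()
    integral = convert(typeof(local_integral), buf)  # back to an SVector
end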
Yeah, that should work, but of course it would require an extra allocation.
That's true. I'm just looking for a solution that is at the Pareto front of optimality in terms of code complexity, code generality, and efficiency. While the @Op approach is likely best in terms of efficiency, I have some doubts about the code complexity and generality - shall we do it for SVector{N, T} for N in 1:10 (or more?) and T in (Float32, Float64) - and maybe also scalars? Will we need something else? This is complicated by the fact that Trixi.jl shall be a library and not a single code for a specific application.
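To make the concern concrete, a hedged sketch of what that registration boilerplate might look like (whether MPI.@Op accepts types interpolated via @eval like this is an assumption; the function name is taken from the diff above):

reduce_vector_plus(x, y) = x .+ y

# register the operator for every SVector length and float type that might
# occur; this has to happen at the top level of the module
for T in (Float32, Float64), N in 1:10
    @eval MPI.@Op(reduce_vector_plus, SVector{$N, $T})
end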
It's annoying that MPI doesn't specify a "reverse" translation of MPI_Datatype. We could maybe have a dictionary where we do MPI_Datatype => Type and then use that to get a concrete type, but that would cause a dynamic dispatch...
Turns out MPI.jl has support for reverse translations. I just pushed a commit that allows for @Op(+, Any).
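If that works, the registration could presumably shrink to a single top-level call (a sketch assuming the @Op(op, Any) form behaves like the concrete-type form above):

reduce_vector_plus(x, y) = x .+ y
MPI.@Op(reduce_vector_plus, Any)   # one registration instead of one per element type

# later, e.g. in integrate_via_indices:
# global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus,
#                               mpi_root(), mpi_comm())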
Nice! Can we please test this here, @benegee?
Doing this here now: #2066
@@ -161,7 +162,7 @@ function integrate_via_indices(func::Func, u,
                              normalize = normalize)

     # OBS! Global results are only calculated on MPI root, all other domains receive `nothing`
-    global_integral = MPI.Reduce!(Ref(local_integral), +, mpi_root(), mpi_comm())
+    global_integral = MPI.Reduce!(Ref(local_integral), reduce_vector_plus, mpi_root(), mpi_comm())
     if mpi_isroot()
         integral = convert(typeof(local_integral), global_integral[])
If we do this, we may have to use special handling if local_integral isa Real.
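A sketch of such special handling (not the code that ended up being merged; mpi_root, mpi_comm and mpi_isroot are the Trixi helpers from the diff above): wrap a plain Real in a 1-element Vector, otherwise collect the SVector, and convert back on the root rank.

buf = local_integral isa Real ? [local_integral] : collect(local_integral)

MPI.Reduce!(buf, +, mpi_root(), mpi_comm())

if mpi_isroot()
    integral = local_integral isa Real ? buf[1] :
               convert(typeof(local_integral), buf)
end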
Thanks!
We can use this workaround to resolve one part of #1922.