
RMA improvements #812

Closed
wants to merge 4 commits

Conversation

jeffhammond

Adds:

  • win_allocate
  • win_lock_all
  • win_unlock_all
  • win_flush_local

Does not add:

  • win_flush_local_all
  • win_flush_all
  • rput, raccumulate, rget, rget_accumulate

I don't know how to compile or test this, so I am relying on your CI to do that for me.

Signed-off-by: Jeff Hammond <[email protected]>
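
For context, here is a minimal sketch (not part of this PR's diff) of the passive-target pattern these wrappers enable. The Julia names MPI.Win_allocate, MPI.Win_lock_all, MPI.Win_flush_local, and MPI.Win_unlock_all are assumed to mirror the C functions MPI_Win_allocate, MPI_Win_lock_all, MPI_Win_flush_local, and MPI_Win_unlock_all; the exact signatures are illustrative only.

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# Allocate window memory (wrapper over MPI_Win_allocate; signature assumed).
buf, win = MPI.Win_allocate(Float64, 8, comm)

# Open a passive-target access epoch at all ranks (MPI_Win_lock_all).
MPI.Win_lock_all(win)

# ... one-sided Put/Get/Accumulate operations against `win` go here ...

# Wait for local completion of operations targeting rank 0 (MPI_Win_flush_local).
MPI.Win_flush_local(0, win)

# Close the epoch at all ranks (MPI_Win_unlock_all), then free the window.
MPI.Win_unlock_all(win)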
@jeffhammond
Author

I am not going to finish this until MPI.jl supports ARM. I do not get paid to write code for Intel anymore.

#404

@jeffhammond jeffhammond closed this Feb 1, 2024
@jeffhammond jeffhammond deleted the rma-updates branch February 1, 2024 13:42
@jeffhammond jeffhammond restored the rma-updates branch February 1, 2024 13:42
@giordano
Member

giordano commented Feb 1, 2024

@jeffhammond I'm sorry to read you closed the PR for this reason, especially since a few of us, me included, are enthusiastic about ARM chips. @vchuravy and I published a paper about using Julia (and MPI.jl) on A64FX in the Embracing Arm for HPC Workshop, and more recently I've been playing with Julia on NVIDIA Grace Hopper, so we really want to support ARM as well as possible!

MPI.jl generally works well on ARM (as on any other architecture where Julia runs); the issue you linked to refers only to custom MPI reductions, which rely on LLVM trampolines that are simply not supported upstream in LLVM. This issue also affected PowerPC, but someone fixed it for that architecture in LLVM and we automatically benefited from that. If your company has LLVM engineers to spare, getting someone to fix this would benefit all LLVM-based compiler frontends targeting ARM architectures, not just Julia and MPI.jl!

@jeffhammond
Author

I use custom MPI reductions quite a bit in my work, so they aren't something I can ignore. They are an MPI 1.0 feature AFAIK, so it doesn't make sense for me to work on RMA things if they aren't available (for the systems I'm using).

I will ask if anyone is willing to fix this, but I'm still disappointed that there is no workaround besides fixing LLVM.

@giordano
Member

giordano commented Feb 1, 2024

I will ask if anyone is willing to fix this

Thanks! If that's of any help, this should be the upstream ticket: llvm/llvm-project#4100

but I'm still disappointed that there is no workaround besides fixing LLVM.

Julia uses LLVM quite a lot, so that's not entirely surprising.

@vchuravy
Member

vchuravy commented Feb 1, 2024

To be precise, there are two ways to create a C-callable function in Julia.

# Statically specified function: the C pointer can be created ahead of time.
function custom_op(x, y)
    x + y
end
cptr = @cfunction(custom_op, Float32, (Float32, Float32))

and the second way:

# Runtime form: `$op` makes the @cfunction close over its argument.
function create_cfunc(op)
    @cfunction($op, Float32, (Float32, Float32))
end
create_cfunc(custom_op)

The former works on all platforms that Julia supports; the latter does not, since it requires compiler support that we have so far struggled to provide on AArch64. As far as I can tell it's equivalent to https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html

I haven't dug into this for a couple of years, but I think C moved to function descriptors for this? AFAIK trampolines are not supported by the ABI on aarch64.

It is this second way that is used in

fptr = @cfunction($w, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cint}, Ptr{MPI_Datatype}))
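
A rough sketch (my reading, not MPI.jl's exact internals) of why the closure form is needed: the MPI user-function signature carries no user-data pointer, so the Julia reduction `op` has to be captured inside the wrapper itself.

# Hypothetical wrapper illustrating the closure-based @cfunction pattern;
# the names make_wrapper/w are not MPI.jl's, and the last argument is typed
# Ptr{Cvoid} instead of Ptr{MPI_Datatype} to keep the sketch self-contained.
function make_wrapper(op, ::Type{T}) where {T}
    function w(invec::Ptr{Cvoid}, inoutvec::Ptr{Cvoid}, len::Ptr{Cint}, ::Ptr{Cvoid})
        a, b = Ptr{T}(invec), Ptr{T}(inoutvec)
        for i in 1:Int(unsafe_load(len))
            unsafe_store!(b, op(unsafe_load(a, i), unsafe_load(b, i)), i)
        end
        return nothing
    end
    # `w` closes over `op`, so this is the trampoline-requiring form on AArch64.
    return @cfunction($w, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cint}, Ptr{Cvoid}))
end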

@simonbyrne
Member

I am not going to finish this until MPI.jl supports ARM. I do not get paid to write code for Intel anymore.

Unfortunately, no one else is paid to maintain or develop this package either, beyond what they immediately need for their own work. We're happy to discuss paths to fix the issue, but this sort of comment isn't really helpful.

@jeffhammond
Author

I'm sorry. I was exasperated because I could not build or test my changes on my laptop (Apple M2), which makes development harder. I can try to contribute to this on the weekend.

I know this is an LLVM issue and I'm trying to help with that.

@simonbyrne
Member

I understand: the test suite is kind of clunky and could do with some more improvement, but I thought we disabled those tests on unsupported platforms. Feel free to ask on Slack about any questions/frustrations.

Regarding custom operators, one particular challenge specific to MPI is that the API doesn't let you pass a "thunk" pointer through to the reduction function, which is how we usually avoid this problem for callbacks in other libraries. Do you think there would be any appetite for adding an additional API to the MPI standard that would support this? Or have custom operators gone out of style since they're not supported for RDMA operations?

@jeffhammond
Author

Custom reductions aren't optimized, but the use cases tolerate that. RDMA doesn't do most MPI ops anyway, certainly not in bulk.

I don't know what "thunk" means here, but in Mukautuva I have a callback trampoline that uses datatype attributes to carry state from the caller to the callee, so I can invoke a different function. Does that sound relevant?

@giordano
Member

@jeffhammond now that we have a workaround for custom reduction operations on aarch64 in the form of #871 (closures still can't be used as operators, but regular functions defined at the top level of the program now can), I was wondering if you had interest/time to resume this PR.
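
For reference, a minimal sketch of the kind of usage the #871 workaround enables, assuming MPI.Allreduce accepts a plain Julia function as the reduction operator, as in MPI.jl's documented custom-operator interface:

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# A regular top-level function (not a closure), which the aarch64 workaround supports.
custom_max(x, y) = max(x, y)

# Each rank contributes its rank number; the custom operator reduces across ranks.
result = MPI.Allreduce(MPI.Comm_rank(comm), custom_max, comm)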
