
Add has_rocm for OpenMPI #821

Merged: 4 commits from ap/rocm into JuliaParallel:master on Jun 23, 2024

Conversation

@avik-pal (Contributor) commented on Feb 25, 2024:

flag = get(ENV, "JULIA_MPI_HAS_ROCM", nothing)
if flag === nothing
    # Only Open MPI provides a function to check ROCm support
    @static if MPI_LIBRARY == "OpenMPI"

@vchuravy (Member):

We probably need to use MPI_LIBRARY_VERSION to check that this is indeed v5

@avik-pal (Contributor, author):

The current compat is on 4. Should we hold it off till newer JLLs are supported?

Member:

Someone will have to debug what's the problem with OpenMPI 5: #789 (comment)

Member:

Someone could already run with a local version of OpenMPI 5 before we get around to jll

Member:

For the record, MPI.jl tests pass in that container; I'll open a PR to add it to the matrix so that we test OpenMPI v5. We still need to figure out what the problem was with the JLL.

Member:

@avik-pal with #826 now merged we actually have coverage for OpenMPI v5 on CI (although not with an AMD GPU available...) so you can do a test like

Suggested change:
- @static if MPI_LIBRARY == "OpenMPI"
+ @static if MPI_LIBRARY == "OpenMPI" && MPI_LIBRARY_VERSION >= v"5"

as suggested above by @vchuravy and we can at least ensure the ccall is working correctly.
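
For reference, a minimal sketch of what the combined check could look like (not the exact diff in this PR), assuming Open MPI v5 exposes the MPIX_Query_rocm_support() extension, analogous to the MPIX_Query_cuda_support() call behind has_cuda, and that libmpi and MPI_LIBRARY_VERSION are the constants MPI.jl already defines:

function has_rocm()
    flag = get(ENV, "JULIA_MPI_HAS_ROCM", nothing)
    if flag === nothing
        # Only Open MPI provides a function to query ROCm support
        @static if MPI_LIBRARY == "OpenMPI" && MPI_LIBRARY_VERSION >= v"5"
            # int MPIX_Query_rocm_support(void)
            return 0 != ccall((:MPIX_Query_rocm_support, libmpi), Cint, ())
        else
            return false
        end
    else
        # Explicit override via the environment variable
        return parse(Bool, flag)
    end
end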

@avik-pal (Contributor, author):

Tested locally that the ccall works by bumping the JLL compat to 5:

julia> using MPI

julia> MPI.has_cuda()
false

julia> MPI.has_rocm()
false
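
As an aside, the JULIA_MPI_HAS_ROCM variable in the snippet at the top of this thread can force the result when no query function is available; a hypothetical use (names and behaviour follow that snippet, not verified output):

ENV["JULIA_MPI_HAS_ROCM"] = "true"   # read by has_rocm at call time per the snippet
MPI.has_rocm()                        # would return parse(Bool, "true") == true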

(A review thread on src/environment.jl was marked outdated and resolved.)

@giordano (Member):

Can also document has_rocm in https://juliaparallel.org/MPI.jl/stable/knownissues/#Hints-to-ensure-ROCm-aware-MPI-to-be-functional, like has_cuda is documented in the CUDA section.

@giordano (Member):

@avik-pal are you interested in moving this forward?

@avik-pal (Contributor, author):

Completely fell off my radar, I will finish this over the weekend

@JBlaschke (Contributor):

Once this is merged, I can implement has_gpu; has_cuda and has_rocm for Cray MPICH
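
A purely hypothetical sketch of what that could look like: Cray MPICH (to my knowledge) ships no MPIX_Query_*_support extension, so a check would likely fall back to the MPICH_GPU_SUPPORT_ENABLED environment variable that Cray systems use to toggle GPU-aware transport; the function name below is made up for illustration:

# Hypothetical, not part of this PR: detect GPU-aware Cray MPICH via its
# runtime toggle rather than a library query function.
function has_gpu_cray_mpich()
    return get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
end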

@luraess (Contributor) commented on Jun 21, 2024:

Maybe one could add a sentence similar to CUDA (https://github.com/JuliaParallel/MPI.jl/blob/master/docs/src/usage.md)

> If using OpenMPI, the status of CUDA support can be checked via the [MPI.has_cuda()](@ref) function.

in the ROCm-aware section of the docs?

@avik-pal force-pushed the ap/rocm branch 4 times, most recently from 9dc9b45 to 7de8a18 on June 22, 2024.

@avik-pal (Contributor, author):

> Maybe one could add a sentence similar to CUDA (master/docs/src/usage.md)
>
> If using OpenMPI, the status of CUDA support can be checked via the [MPI.has_cuda()](@ref) function.
>
> in the ROCm-aware section of the docs?

done

@giordano (Member):

Thanks!

@giordano (Member):

> Can also document has_rocm in https://juliaparallel.org/MPI.jl/stable/knownissues/#Hints-to-ensure-ROCm-aware-MPI-to-be-functional, like has_cuda is documented in the CUDA section.

Last documentation request and then we should be good to go ☝️

@avik-pal (Contributor, author):

Updated the docs

@vchuravy (Member):

Can you erase on master?

@avik-pal (Contributor, author):

> Can you erase on master?

What do you want me to erase?

@giordano (Member):

I presume that was meant to be "rebase".

@vchuravy merged commit 71acbb7 into JuliaParallel:master on Jun 23, 2024 (52 checks passed).
@avik-pal deleted the ap/rocm branch on June 23, 2024 at 22:27.

5 participants