Crash with system-provided OpenMPI and HDF5_jll v1.14 #1079

Open
mfsch opened this issue Jun 14, 2023 · 8 comments

@mfsch

mfsch commented Jun 14, 2023

When I set up a simple project with the latest MPI and HDF5 packages and configure it to use the system-provided OpenMPI installation, the call to MPI.Init() crashes with “orte_init failed” errors. I am observing this issue on both Ubuntu 18.04 (OpenMPI 3.1.2) and 20.04 (OpenMPI 4.0.3). Downgrading to HDF5_jll v1.12 fixes the issue.

Steps to reproduce:

  • create a new folder and launch Julia with julia --project=.
  • install dependencies with ]add MPI HDF5
  • run using MPI; MPI.MPIPreferences.use_system_binary()
  • attempt to run mpirun -n 4 julia --project -e "using MPI, HDF5; MPI.Init()" (or mpiexecjl), observe crash
  • downgrade with ]add HDF5_jll@1.12, rerun without crash
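
The first four steps above can be sketched as a single script. This is only a rough sketch, assuming it is started with julia --project=. in an empty folder and that a system mpirun is on the PATH; the file name repro.jl is made up for illustration.

```julia
# repro.jl (hypothetical file name); run with: julia --project=. repro.jl
using Pkg
Pkg.add(["MPI", "HDF5"])                    # install the latest MPI.jl and HDF5.jl

using MPI
MPI.MPIPreferences.use_system_binary()      # switch MPI.jl to the system MPI library

# Launch four fresh Julia processes under the system mpirun.
# Each one loads MPI and HDF5 and then calls MPI.Init(), which is where the crash was reported.
julia = Base.julia_cmd()
run(`mpirun -n 4 $julia --project -e "using MPI, HDF5; MPI.Init()"`)
```

If the crash reproduces, run throws because mpirun exits with a non-zero status.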

On Ubuntu 18.04, the error includes the line mca_base_component_repository_open: unable to open mca_pmix_pmix3x: /home/user/.julia/artifacts/f9744710560ba3ddc00cd9df62ac7dfcd18c8649/lib/openmpi/mca_pmix_pmix3x.so: undefined symbol: opal_envar_t_class, in case this is helpful.

@simonbyrne
Collaborator

Ah, I've seen something similar! The problem appears to be that we're opening two different MPI libraries: the system one (from MPI.jl) and the JLL one (from HDF5_jll).

Easy workarounds:

  • use a system HDF5 (see HDF5.jl docs)
  • cap HDF5_jll at 1.12 (set the compat entry HDF5_jll = "~1.12").
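
For reference, the two workarounds might look roughly like this. It is a sketch, not a tested recipe: the library paths are placeholders (they resemble where Ubuntu's OpenMPI build of HDF5 is usually installed, but check your system), and HDF5.API.set_libraries! is the call I understand the HDF5.jl v0.17 docs to describe for selecting a system libhdf5. Pick one of the two options, then restart Julia.

```julia
using Pkg

# Option 1: point HDF5.jl at a system HDF5 that was built against the system MPI.
# Both paths are assumptions; replace them with the libraries on your machine.
using HDF5
HDF5.API.set_libraries!(
    "/usr/lib/x86_64-linux-gnu/hdf5/openmpi/libhdf5.so",
    "/usr/lib/x86_64-linux-gnu/hdf5/openmpi/libhdf5_hl.so",
)

# Option 2: keep the JLL-provided HDF5 but cap it at the 1.12 series.
# HDF5_jll has to be a direct dependency before a compat entry can be set for it.
Pkg.add(name = "HDF5_jll", version = "1.12")
Pkg.compat("HDF5_jll", "~1.12")             # writes HDF5_jll = "~1.12" under [compat]
```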

In the longer term we need a better fix. @giordano @eschnett any suggestions on how we can deal with this?

@giordano
Member

I thought HDF5_jll.jl would use the MPI library chosen by MPIPreferences.jl

@simonbyrne
Collaborator

Yeah, I don't quite get why it's pulling in OpenMPI_jll?

@eschnett
Contributor

My approach, of course, would be to use the Julia-provided MPItrampoline as the MPI implementation, and to use the system MPI via MPItrampoline...
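
A minimal sketch of that setup, assuming an MPIwrapper library has already been built against the system MPI. The path to libmpiwrapper.so below is a placeholder, and the exact steps may differ from what the MPI.jl configuration docs recommend for your version.

```julia
using Pkg
Pkg.add("MPIPreferences")                   # must be a direct dependency to record the preference

using MPIPreferences
# Select the Julia-provided MPItrampoline instead of the "system" binary.
MPIPreferences.use_jll_binary("MPItrampoline_jll")

# MPItrampoline forwards MPI calls to the library named in MPITRAMPOLINE_LIB.
# Set it in the shell before launching (placeholder path), for example:
#   export MPITRAMPOLINE_LIB=/path/to/libmpiwrapper.so
#   mpirun -n 4 julia --project -e "using MPI, HDF5; MPI.Init()"
```

The idea is that MPI.jl and the JLL-provided HDF5 then load the same MPItrampoline library, which in turn forwards to the system MPI.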

@JoshuaLampert
Contributor

Would it be possible to print a warning if a system-provided MPI installation is detected but no system-provided HDF5?
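
One way such a check could work, purely as a sketch: look at the shared libraries already loaded into the process and warn if an MPI library from a Julia artifact directory is present while MPI.jl is configured for the system binary. Both function names below are hypothetical and not part of MPI.jl or HDF5.jl, and matching on "artifacts" and a "libmpi" prefix is a heuristic assumption based on the artifact path in the error message above.

```julia
using Libdl

# Hypothetical helper: MPI-looking libraries that live in a Julia artifact directory.
function jll_mpi_libraries(loaded_paths)
    return filter(loaded_paths) do path
        occursin("artifacts", path) && startswith(basename(path), "libmpi")
    end
end

# Hypothetical check: returns true (and logs a warning) when the MPI preference is
# "system" but a JLL-provided MPI library has been loaded as well.
function warn_on_mixed_mpi(mpi_binary::AbstractString, loaded_paths = Libdl.dllist())
    mpi_binary == "system" || return false
    conflicts = jll_mpi_libraries(loaded_paths)
    isempty(conflicts) && return false
    @warn "System MPI is selected, but a JLL-provided MPI library is also loaded" conflicts
    return true
end
```

It could be called as warn_on_mixed_mpi(MPIPreferences.binary) after the packages are loaded and before MPI.Init().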

@mmesiti

mmesiti commented Nov 24, 2024

I have been affected by this very problem. A package I am working on (let's call it PMFRG) depends on HDF5 and if I do

using MPI
using PMFRG # using HDF5
MPI.Init()

I get a crash without much explanation.
Interestingly, instead

using MPI
MPI.Init()
using PMFRG # using HDF5

works.
I had to dig for some hours to find out that HDF5+MPIPreferences.use_system_binary() is the problem, missed the existence of this issue, crafted a repro case, started to open another issue and finally found this one.
So, yes, I am sorry I do not have the knowledge to make a PR for this but I wanted to say I have been affected.

@giordano
Member

I believe using MPItrampoline as suggested above should help; see the documentation added in JuliaParallel/MPI.jl#838
