
MPI tracing and profiling #444

Closed · vchuravy opened this issue Nov 17, 2020 · 15 comments
@vchuravy (Member)

@simonbyrne and I tried yesterday to use nsys to debug an MPI program. It preloads a wrapper library, which doesn't work since we use dlopen directly. One can hack around that, but a cleaner approach would be to use Preferences.jl to allow arbitrary instrumentation of:

```julia
macro mpicall(expr)
    @assert expr isa Expr && expr.head == :call && expr.args[1] == :ccall
    # Microsoft MPI uses the stdcall calling convention;
    # this only affects 32-bit Windows.
    # Unfortunately we need to use ccall to call Get_library_version,
    # so check using the library name instead.
    if use_stdcall
        insert!(expr.args, 3, :stdcall)
    end
    return esc(expr)
end
```
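For context, a call site wrapped by this macro would look roughly like the following sketch (MPI_Barrier chosen arbitrarily; the exact signatures live in MPI.jl):

```julia
# On 32-bit Windows with Microsoft MPI the macro rewrites this ccall to use
# the stdcall convention; on every other platform it is a no-op.
@mpicall ccall((:MPI_Barrier, libmpi), Cint, (MPI_Comm,), comm)
```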

Since NVTX requires a library, an alternative would be NVTXT, for which I happen to have an implementation lying around: JuliaGPU/KernelAbstractions.jl@9852b76

@vchuravy changed the title from "MPI traching and profiling" to "MPI tracing and profiling" on Nov 17, 2020
@simonbyrne (Member)

It would also be good to provide a way to support other MPI profilers which use LD_PRELOAD (which doesn't work with Julia, as we dlopen the library directly). One idea would be something like the @runtime_ccall macro used by CUDA.jl, where we check whether e.g. a library specified by ENV["JULIA_MPI_PRELOAD"] exists and contains an appropriate symbol.
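A rough sketch of that idea (everything here is hypothetical: neither mpi_symbol nor JULIA_MPI_PRELOAD exists in MPI.jl):

```julia
using Libdl

# Hypothetical lookup: prefer a symbol from a user-supplied preload library,
# falling back to the regular MPI library otherwise.
function mpi_symbol(name::Symbol)
    if haskey(ENV, "JULIA_MPI_PRELOAD")
        shim = dlopen(ENV["JULIA_MPI_PRELOAD"], RTLD_LAZY | RTLD_GLOBAL)
        ptr = dlsym(shim, name; throw_error = false)
        ptr !== nothing && return ptr
    end
    return dlsym(dlopen("libmpi"), name)
end

# usage: ccall(mpi_symbol(:MPI_Initialized), Cint, (Ptr{Cint},), flag)
```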

@richardreeve

We’d be very keen on some more general support too, as we’re trying to use Extrae for some profiling and it also doesn’t work because it relies on LD_PRELOAD...

@vchuravy (Member, Author)

@staticfloat any ideas for dealing with libraries that interpose with LD_PRELOAD?

@staticfloat (Contributor)

So you need to dlopen() something else before you dlopen() the library you're interested in?

@vchuravy (Member, Author)

Yeah. We basically want to overlay/redirect lookups to a shim library.

@simonbyrne (Member)

To clarify, MPI provides a specific profiling interface: each MPI_XXX function is just a wrapper around another function named PMPI_XXX. This allows profilers to export the MPI_XXX symbols via LD_PRELOAD, do whatever they want, and then call PMPI_XXX. See https://www.open-mpi.org/faq/?category=perftools#PMPI for a description of how Open MPI does this.
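For illustration, the convention is easy to observe from Julia ("libmpi" below is a generic placeholder for whatever your MPI implementation ships as):

```julia
using Libdl

# Every MPI_XXX entry point has a PMPI_XXX twin that an interposed
# MPI_XXX can fall through to.
lib = dlopen("libmpi")           # placeholder for your MPI library
dlsym(lib, :MPI_Send)            # interposable wrapper
dlsym(lib, :PMPI_Send)           # the underlying implementation
```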

@staticfloat (Contributor) commented Nov 30, 2020

So is there a reason you can't just dlopen() whatever library you wish to override symbols with before you load lib*mpi?

@simonbyrne (Member)

Wouldn't it still look up the symbols in libmpi?

@staticfloat (Contributor)

It depends on how the lookups are being done.

If you have Julia code that's calling symbols, you should just change which library you are dlsym()'ing the symbols from.

If you have C code that is calling foo() and isn't explicitly dlsym'ing it from a particular library, but is instead relying on the dynamic linker to load the library and provide the symbols, then yes, I would expect that dlopen()'ing some other library that provides foo() first would cause the foo() symbol to be looked up in that other library. If it doesn't work at first, make sure that RTLD_GLOBAL is passed to dlopen().
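In Julia terms, the first case might look like this sketch (the shim path is a placeholder):

```julia
using Libdl

# Case 1: Julia controls the lookup, so dlsym from the shim rather than libmpi.
shim = dlopen("/path/to/profiler_shim.so")   # placeholder path
fptr = dlsym(shim, :MPI_Initialized)

flag = Ref{Cint}(0)
ccall(fptr, Cint, (Ptr{Cint},), flag)        # dispatches to the shim's wrapper
```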

@lyon-fnal commented Dec 10, 2020

Ahh, this is an interesting issue! I had been trying to use Darshan to profile MPI and HDF5 I/O. Since it uses LD_PRELOAD, the issue here explains why it didn't work. It sounds like I can fix this by calling dlopen(...) on the Darshan library before loading MPI and HDF5. I'll try it.

@simonbyrne (Member)

We may need to change all the ccalls so that we call ccall(:MPI_XXX, ...) instead of ccall((:MPI_XXX, libmpi), ...).
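For concreteness, a sketch of the difference between the two forms (using MPI_Initialized as a stand-in, and assuming libmpi is the library name MPI.jl already has):

```julia
flag = Ref{Cint}(0)

# Library-qualified form: the symbol is bound to the specific libmpi that
# Julia dlopened, so an LD_PRELOAD'ed interposer never sees the call.
ccall((:MPI_Initialized, libmpi), Cint, (Ptr{Cint},), flag)

# Unqualified form: the symbol resolves through the process-global namespace,
# which is where LD_PRELOAD (or an RTLD_GLOBAL shim) injects its overrides.
ccall(:MPI_Initialized, Cint, (Ptr{Cint},), flag)
```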

@lyon-fnal
Hi, I have a simple minimal working example in this gist. Indeed, you'll have to change every ccall to specify just the function and not the library. If your "injected" library does not wrap the original function with dlsym(RTLD_NEXT, ...), then I think you can dlopen the injected library before the dlopen for, say, libmpi; you must use RTLD_GLOBAL so that ccall can find the function without the library specification (see the sketch below). If the injected library does wrap the original function (as Darshan does), then you'll have to use LD_PRELOAD=/path/to/inject.so julia myscript.jl.
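A minimal sketch of that load order, mirroring the gist (the shim path is hypothetical):

```julia
using Libdl

# Open the injected library first, then libmpi; both with RTLD_GLOBAL so that
# a bare ccall(:MPI_XXX, ...) resolves against the shim's exports first.
dlopen("/path/to/inject.so", RTLD_LAZY | RTLD_GLOBAL)
dlopen("libmpi", RTLD_LAZY | RTLD_GLOBAL)

flag = Ref{Cint}(0)
ccall(:MPI_Initialized, Cint, (Ptr{Cint},), flag)  # hits the shim's symbol first
```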

@simonbyrne (Member)

I think that seems like the best way forward. Thanks @lyon-fnal for trying it out.

@lyon-fnal

I was able to modify MPI.jl and HDF5.jl to use ccall without explicitly specifying the library (i.e. ccall(:func, ...) and not ccall((:func, lib), ...)). With that, I'm able to get Darshan (an MPI and HDF5 I/O profiler) to work with LD_PRELOAD. Pretty neat, though not as easy as I thought it would be. I'll open PRs for MPI.jl and HDF5.jl with my changes shortly.

I'm contemplating what to tell the Julia developers about this... using ccall((:func, lib), ...) as they suggest in the Julia docs is easy, since you don't have to worry about loading the shared object yourself with Libdl.dlopen, but it does mean that LD_PRELOAD won't work. I know of several profiling systems that rely on LD_PRELOAD, so losing that capability is detrimental. It might be nice if the docs at least had a warning about breaking LD_PRELOAD. Do you all have suggestions?

@simonbyrne (Member)

Should be fixed by #451
