ccall(:foo) doesn't pick up LD_PRELOAD overrides #53747

Open
green-nsk opened this issue Mar 15, 2024 · 10 comments

@green-nsk

green-nsk commented Mar 15, 2024

We're using a custom network stack library that is loaded via the LD_PRELOAD mechanism and overrides certain function calls (socket(), recv(), setsockopt(), and others). For some reason, starting with julia-1.10, ccall(:socket) no longer picks up the LD_PRELOAD version of the call.

We found a workaround: calling ccall((:socket, "")) works with our LD_PRELOAD network stack. We'd like to understand what changed, how the two ccall() forms differ, and how we can make sure we don't hit this in future versions.

Repro:

$ cat socket.c
__attribute__((visibility("default")))
int
socket(int domain, int type, int protocol) {
    return 42;
}

$ gcc -shared -o socket.so -fPIC socket.c
$ LD_PRELOAD=./socket.so julia-1.9.3 -e 'println(ccall(:socket, Cint, (Cint, Cint, Cint), 0, 0, 0))'
42
$ LD_PRELOAD=./socket.so julia-1.10.2 -e 'println(ccall(:socket, Cint, (Cint, Cint, Cint), 0, 0, 0))'
-1
$ LD_PRELOAD=./socket.so julia-1.10.2 -e 'println(ccall((:socket, ""), Cint, (Cint, Cint, Cint), 0, 0, 0))'
42
$ LD_PRELOAD=./socket.so julia-1.10.2 -e '
    using Libdl
    socket_ptr = dlopen("") do h ; dlsym(h, :socket) end
    println(ccall(socket_ptr, Cint, (Cint, Cint, Cint), 0, 0, 0))'
42

Julia downloaded from https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.2-linux-x86_64.tar.gz

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
Environment:
  JULIA_PKG_DEVDIR = /home/sfokin/code/
@gbaraldi
Member

gbaraldi commented Mar 15, 2024

While I haven't looked too deeply into this, I imagine it has something to do with RTLD_DEEPBIND, which we use in many places. @staticfloat might have a better idea.

@green-nsk
Author

@staticfloat or anyone else, is there more insight on what's happening?

At the very least, I'd like to understand whether this is a bug/regression or by design.

@maleadt
Member

maleadt commented Apr 4, 2024

Bisected to 82c89c6 from #50162; cc @topolarity.

@gbaraldi
Member

gbaraldi commented Apr 4, 2024

So the reason this happens is that we look up symbols in the libraries first, before we look them up in the main executable. That behaves somewhat like RTLD_DEEPBIND, which has the side effect of making symbol interposition like this not work. The reason for the change is to allow multiple Julias to be loaded in the same process. I do wonder if we could try to restore the interposer behaviour a bit by looking up non-Julia symbols in the executable and Julia symbols in the library.
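
The two lookup orders described in this comment can be sketched in C with plain dlopen/dlsym. This is only an illustration under stated assumptions, not Julia's actual implementation: the function names (lookup_global, lookup_in_library, resolve_library_first, resolve_global_first) are hypothetical, and the sketch assumes a Linux-style dynamic loader where LD_PRELOAD objects sit near the front of the global scope.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Global-scope lookup: the main program first, then the libraries loaded at
 * startup in load order. LD_PRELOAD objects come right after the executable,
 * so an interposed definition is found before the original one. */
void *lookup_global(const char *symbol) {
    assert(symbol != NULL);
    void *self = dlopen(NULL, RTLD_LAZY);  /* handle for the global scope */
    if (self == NULL) return NULL;
    void *addr = dlsym(self, symbol);
    dlclose(self);                         /* the main program stays loaded */
    return addr;
}

/* Library-scoped lookup: searches only the named library and its own
 * dependencies, so a preloaded interposer is never consulted. */
void *lookup_in_library(const char *library, const char *symbol) {
    assert(library != NULL && symbol != NULL);
    void *handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) return NULL;
    /* The handle is deliberately left open so the address stays valid. */
    return dlsym(handle, symbol);
}

/* Library-first order (hypothetical model of the newer behaviour):
 * try each library, and only then fall back to the global scope. */
void *resolve_library_first(const char *const *libraries, size_t count,
                            const char *symbol) {
    for (size_t i = 0; i < count; i++) {
        void *addr = lookup_in_library(libraries[i], symbol);
        if (addr != NULL) return addr;
    }
    return lookup_global(symbol);
}

/* Global-first order (hypothetical model of the older behaviour):
 * the global scope wins, libraries are consulted only as a fallback. */
void *resolve_global_first(const char *const *libraries, size_t count,
                           const char *symbol) {
    void *addr = lookup_global(symbol);
    if (addr != NULL) return addr;
    for (size_t i = 0; i < count; i++) {
        addr = lookup_in_library(libraries[i], symbol);
        if (addr != NULL) return addr;
    }
    return NULL;
}
```

For instance, on a glibc system (the library name libc.so.6 is an assumption here), a program run under LD_PRELOAD=./socket.so would be expected to get the preloaded definition from resolve_global_first(..., "socket"), while resolve_library_first with a list containing "libc.so.6" would be expected to get libc's own socket().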

@topolarity
Member

It's worth mentioning that (as @gbaraldi points out), the new behavior is similar to the (pre-existing) behavior from the C side which is due to RTLD_DEEPBIND:

$ cat mylib.c
int socket(int domain, int type, int protocol);

__attribute__((visibility("default")))
int socket2(int domain, int type, int protocol) {
    return socket(domain, type, protocol);
}
$ gcc -shared -o socket.so -fPIC socket.c
$ gcc -shared -o mylib.so -fPIC mylib.c
$ LD_PRELOAD=./socket.so julia-1.10 -e 'println(ccall((:socket2, "./mylib.so"), Cint, (Cint, Cint, Cint), 0, 0, 0))'
-1

I didn't think about LD_PRELOAD specifically when we made this change, but I think the need for an explicit opt-in to the interposition via ccall((:symbol, ""), ...) is probably a good thing despite the unintended breakage.

@vchuravy
Member

vchuravy commented Apr 4, 2024

This might break system profilers that use LD_PRELOAD?

As an example, in MPI.jl we use the ccall(:symbol) pattern so that MPI profilers can hook these symbols explicitly:
JuliaParallel/MPI.jl#451 / JuliaParallel/MPI.jl#450

We can of course change MPI to use the (:call, "") syntax...

@green-nsk
Author

We already have a workaround for our case, but I can't see why inconsistent behaviour is acceptable. Also, I am worried there may be other unintended inconsistencies:

  • julia internal symbol resolution is inconsistent with ccall() resolution. In my particular case, LD_PRELOAD=socket.so affects calls to socket() from inside Sockets.jl/libuv, but not from ccall(:socket). This is probably the worst one.
  • dlsym(dlopen(""), :foo) also picks up a different function from ccall(:foo)
  • not sure how custom LD_LIBRARY_PATH would affect different APIs

If none of that is a concern, then at the very least the ccall documentation should mention these differences in behaviour.

@topolarity
Member

> julia internal symbol resolution is inconsistent with ccall() resolution. In my particular case, LD_PRELOAD=socket.so affects calls to socket() from inside Sockets.jl/libuv, but not from ccall(:socket). This is probably the worst one.

I agree that this is important.

The problem is that it's already inconsistent in 1.9:

$ cat socket.c
#include "stdio.h"
__attribute__((visibility("default")))
int socket(int domain, int type, int protocol) {
    fprintf(stderr, "Called LD_PRELOAD socket() hook\n");
    return 42;
}
$ LD_PRELOAD=./socket.so julia-1.9 -e 'using Sockets; TCPSocket(; delay = false)'
Called LD_PRELOAD socket() hook
$ LD_PRELOAD=./socket.so julia-1.9 -e 'using MySQL; DBInterface.connect(MySQL.Connection, "localhost", "user", "passwd")'
<no hook message>

Both of these use socket() internally, but only one picks up the LD_PRELOAD override.

What's the deal? The difference here is that MySQL.jl is using libmariadbclient, which is loaded via JLL/ccall with RTLD_DEEPBIND, meaning that its symbol resolution is not affected by LD_PRELOAD.

In contrast, the "built-in" libraries (defined via DEP_LIBS) are loaded without RTLD_DEEPBIND, which is why libuv does pick up the LD_PRELOAD override.
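
As a rough C sketch of the distinction drawn here, the only difference between the two ways of loading a library is one dlopen flag. The helper names library_open_flags and open_library are hypothetical, and this is not how Julia's loader is actually written. RTLD_DEEPBIND is a glibc extension, so the sketch guards it with a preprocessor check and degrades to a normal load where it is unavailable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* RTLD_DEEPBIND is a glibc extension */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Build the dlopen flags. With deep_bind set, the library's own lookup scope
 * (itself plus its dependencies) is searched before the global scope, so its
 * internal call to socket() binds to libc rather than to an LD_PRELOAD hook. */
int library_open_flags(int deep_bind) {
    int flags = RTLD_LAZY | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND
    if (deep_bind) flags |= RTLD_DEEPBIND;
#endif
    return flags;
}

/* Open a library either "JLL-style" (deep_bind = 1, immune to interposition)
 * or "built-in style" (deep_bind = 0, honours LD_PRELOAD overrides). */
void *open_library(const char *path, int deep_bind) {
    assert(path != NULL);
    return dlopen(path, library_open_flags(deep_bind));
}
```

Under this model, a library opened with deep_bind = 1 plays the role of libmariadbclient in the example above, and one opened with deep_bind = 0 plays the role of libuv.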

@topolarity
Member

> I do wonder if we could try to restore the interposer behaviour a bit by looking up non-Julia symbols in the executable and Julia symbols in the library.

We could check to see which library the symbol actually resolved to, and if it's not one of the libjulia-* we could repeat the look-up with the old behavior?
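
A minimal C sketch of that idea, assuming glibc's dladdr() is used to find out which shared object an address belongs to. Every name here (containing_object, object_name_has_prefix, resolve_with_fallback) is hypothetical, the "libjulia-" prefix is passed in by the caller rather than hard-coded, and nothing below reflects an actual or planned Julia implementation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* dladdr() and Dl_info are glibc extensions */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Path of the shared object that contains addr, or NULL if it is unknown. */
const char *containing_object(void *addr) {
    Dl_info info;
    if (addr == NULL || dladdr(addr, &info) == 0) return NULL;
    return info.dli_fname;
}

/* Returns 1 when the file-name part of path starts with prefix, else 0. */
int object_name_has_prefix(const char *path, const char *prefix) {
    assert(prefix != NULL);
    if (path == NULL) return 0;
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    return strncmp(base, prefix, strlen(prefix)) == 0;
}

/* Keep a library-first hit only when it lives in one of the runtime's own
 * libraries; otherwise repeat the lookup in the global scope, where
 * LD_PRELOAD interposers are visible. */
void *resolve_with_fallback(void *library_first_hit, const char *symbol,
                            const char *runtime_prefix) {
    assert(symbol != NULL);
    if (object_name_has_prefix(containing_object(library_first_hit),
                               runtime_prefix))
        return library_first_hit;
    void *self = dlopen(NULL, RTLD_LAZY);  /* handle for the global scope */
    if (self == NULL) return library_first_hit;
    void *global_hit = dlsym(self, symbol);
    dlclose(self);
    return global_hit != NULL ? global_hit : library_first_hit;
}
```

A caller would pass the address obtained from the library-first lookup together with a prefix such as "libjulia-"; symbols that resolved into any other object would then go through the global scope again.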

@green-nsk
Author

> We could check to see which library the symbol actually resolved to, and if it's not one of the libjulia-* we could repeat the look-up with the old behavior?

Wouldn't that mean that different Julia processes, potentially loading different versions of JLLs, would step on each other's toes?
