Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Local Exec" TLS model is used sometimes, resulting in different TLS addresses calculated and thus segfaults #457

Open
kkysen opened this issue Oct 28, 2024 · 2 comments
Assignees

Comments

@kkysen
Copy link
Contributor

kkysen commented Oct 28, 2024

After compartmentalizing dav1d and trying to run it, I started hitting a segfault at the very beginning, in init_stacks_and_setup_tls:

ia2_stackptr_0[0] = allocate_stack(0); \

This segfault wasn't initially happening, but like #455 and #456, started happening suddenly.

After much debugging with @rinon, we determined that the main compartment's DSO, which calls INIT_RUNTIME and thus defines init_stacks_and_setup_tls, was calculating a different address for the thread-local ia2_stackptr_0 as compared to other DSOs like liblibia2.a.

As @ahomescu explained to @rinon,

there are multiple TLS models, and we seem to be using conflicting models. We're getting a "Local Exec" model reference in the main binary to a thread local that is globally visible and all other references to it go through the GOT and have a different TL offset (so a different address for the thread local value)

We were able to workaround this by moving the assignment in

ia2_stackptr_0[0] = allocate_stack(0); \
to a separate function, allocate_stack_0, defined in libia2, thus preventing it from trying to use this "Local Exec" TLS model. While a valid workaround, this is just a workaround, as the root issue is still there, and I wonder if it may also be causing issues like #455 and #456.

kkysen added a commit that referenced this issue Oct 28, 2024
…tack to `ia2_stackptr_0` and thus uses the correct GOT thread local

Without this, sometimes the assignment to `ia2_stackptr_0` in `INIT_RUNTIME` in the main compartment
ended up using a "Local Exec" thread local that had a different address, and thus segfaulted.

This is a workaround to #457, but doesn't actually fix the root issue.
It does successfully prevent compartmentalized `dav1d` from segfaulting, though.
kkysen added a commit that referenced this issue Oct 28, 2024
… TLS model issue and prevent segfaulting

`allocate_stack_0` writes the allocated stack to `ia2_stackptr_0`.
Previously, this was done inline in `init_stacks_and_setup_tls`,
which is defined in the main compartment with `INIT_RUNTIME`,
and thus may sometimes use the "Local Exec" TLS model and thus use a different thread-local address.
Moving this to `allocate_stack_0` in `libia2` forces it to use the correct GOT thread local.

This is a workaround to #457, but doesn't actually fix the root issue.
It does successfully prevent compartmentalized `dav1d` from segfaulting, though.
@rinon
Copy link
Collaborator

rinon commented Nov 1, 2024

Do you have any way to get this crashing build again? Can I reproduce it? I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.

    3887:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
    388e:       00 00
    3890:       48 03 05 69 5b 00 00    add    0x5b69(%rip),%rax        # 9400 <ia2_stackptr_0@@Base+0x7400>
    3897:       48 89 18                mov    %rbx,(%rax)

If I recall correctly we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.

@kkysen
Copy link
Contributor Author

kkysen commented Nov 1, 2024

Do you have any way to get this crashing build again? Can I reproduce it?

It still reproduces the same for me when I have ia2 on the latest main (without #458) and build dav1d with ./rewrite.py (only dependency should be uv) in immunant/dav1d/ia2.

I'm having a hard time getting this to reproduce in a smaller test case. I'm only getting the compiler to generate the following sequence, instead of the hardcoded TLS offset we saw.

    3887:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
    388e:       00 00
    3890:       48 03 05 69 5b 00 00    add    0x5b69(%rip),%rax        # 9400 <ia2_stackptr_0@@Base+0x7400>
    3897:       48 89 18                mov    %rbx,(%rax)

If I recall correctly we were seeing basically %fs + fixed offset instead of fetching the global containing the offset.

This is the faulting instruction for me:

gdb:

0x555555558ed7 <init_stacks_and_setup_tls+600>:      mov    %rax,%fs:0xffffffffffffe000

llvm-objdump:

4ed7: 64 48 89 04 25 00 e0 ff ff    movq    %rax, %fs:-0x2000

The -0x2000 is the fixed offset, right?

@ayrtonm ayrtonm self-assigned this Nov 4, 2024
ayrtonm pushed a commit that referenced this issue Nov 18, 2024
… TLS model issue and prevent segfaulting

`allocate_stack_0` writes the allocated stack to `ia2_stackptr_0`.
Previously, this was done inline in `init_stacks_and_setup_tls`,
which is defined in the main compartment with `INIT_RUNTIME`,
and thus may sometimes use the "Local Exec" TLS model and thus use a different thread-local address.
Moving this to `allocate_stack_0` in `libia2` forces it to use the correct GOT thread local.

This is a workaround to #457, but doesn't actually fix the root issue.
It does successfully prevent compartmentalized `dav1d` from segfaulting, though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants