Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread_local! initialization code panics on some aarch64-apple-darwin envs, works on others #132704

Open
dj8yfo opened this issue Nov 6, 2024 · 16 comments
Labels
A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status O-AArch64 Armv8-A or later processors in AArch64 mode O-macos Operating system: macOS T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@dj8yfo
Copy link

dj8yfo commented Nov 6, 2024

I tried this code:

https://github.com/near/near-sdk-rs/blob/master/near-sdk/src/environment/mock/mod.rs#L11-L16

thread_local! {
    static BLOCKCHAIN_INTERFACE: RefCell<MockedBlockchain>
         = RefCell::new(MockedBlockchain::default());
}

I expected to see this happen:

thread_local! initialization code for RefCell::new(MockedBlockchain::new)
works

Instead, this happened:

on unclarified aarch64-apple-darwin environment(s) thread_local!
panics on https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1277 .

The issue doesn't reproduce on other aarch64-apple-darwin real macs,
and on macos-14-arm64 github actions vm.

Original issue: near/near-sdk-rs#1252

Meta

rustc --version --verbose:

stable: 1.80, 1.81, 1.82 panics
beta 1.83 panics
nightly 1.84 panics

backtrace from original issue:
near/near-sdk-rs#1252 (comment)

@dj8yfo dj8yfo added the C-bug Category: This is a bug. label Nov 6, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 6, 2024
@dj8yfo
Copy link
Author

dj8yfo commented Nov 6, 2024

Somewhat limited pre-analysis (can be safely ignored)

ub_checks::assert_unsafe_precondition! check inside of
https://github.com/rust-lang/rust/blob/master/library/std/src/sys/thread_local/native/lazy.rs#L66
fails,
because earlier LazyStorage::new failed to put State::Initial
into an adress with correct alignment for State<T,D>
https://github.com/rust-lang/rust/blob/master/library/std/src/sys/thread_local/native/lazy.rs#L37,
where T is RefCell<MockedBlockchain<Memory = MockedMemory>>

related case

earlier we had a similar issue with aarch64-apple-darwin with somewhat similar panics:

0x102f569ec - core::ptr::replace::he9e2c42f9fe96d9b
                  at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ub_checks.rs:73:17
0x102f569ec - core::ptr::mut_ptr::<impl *mut T>::replace::hcc3425bab485c184
                  at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ptr/mut_ptr.rs:1560:18
0x102f569ec - std::sys::thread_local::lazy::LazyKeyInner<T>::take::h11414c1b64d26b2a
                  at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys/thread_local/mod.rs:102:30

which originated in some register_dtor functions when trying to run doc-tests
for the same near-sdk-rs project : #126643

It was later confirmed that the issue stopped to manifest itself on exactly the same
rustc 1.79.0 (129f3b996 2024-06-10) compiler after macos-14-arm64 github actions
vm was updated.

@jieyouxu jieyouxu added O-AArch64 Armv8-A or later processors in AArch64 mode S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status O-macos Operating system: macOS T-libs Relevant to the library team, which will review and decide on the PR/issue. A-thread-locals Area: Thread local storage (TLS) and removed S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 7, 2024
@madsmtm
Copy link
Contributor

madsmtm commented Nov 10, 2024

I tried going through the linked issues, but I couldn't figure out how you reproduce the issue (what commands do you run?).

A hypothetical minimal reproducer based on the code you posted doesn't reproduce on my Mac (M2, macOS 14.7):

use std::cell::RefCell;

#[repr(align(16))]
#[derive(Debug)]
struct MockedBlockchain {
    _size: [u8; 2512],
}

thread_local! {
    static BLOCKCHAIN_INTERFACE: RefCell<MockedBlockchain> = RefCell::new(MockedBlockchain {
        _size: [1; 2512],
    });
}

fn main() {
    BLOCKCHAIN_INTERFACE.with(|b| println!("{b:?}"));
}

So it sounds like it might be something else that is causing the issue?

@madsmtm
Copy link
Contributor

madsmtm commented Nov 10, 2024

on unclarified aarch64-apple-darwin environment(s) thread_local! panics on https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1277 .

What do you mean by "unclarified"? Knowing that might help us to reproduce the issue?

@dj8yfo
Copy link
Author

dj8yfo commented Nov 10, 2024

@madsmtm you might ask @peitalin from the linked issue about specifics of his environment, where he reproduced it. We didn't reproduce it on our macs either.

The command sequence to run the same scenario which triggered panic would be

# install cargo-near to $CARGO_HOME/bin
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/near/cargo-near/releases/latest/download/cargo-near-installer.sh | sh
cargo near new test
cd test/
cargo test

If the binary installer doesn't work for some reason, it can be installed from source by cargo install cargo-near

@dj8yfo
Copy link
Author

dj8yfo commented Nov 10, 2024

@madsmtm the code you tried is much simpler than the one from near-sdk repo. One from near-sdk repo uses generic parameter and also some unsafe to create references to self (without ouroborous). The self-referential part has been there for some odd 3 years, while the generic parameter has been added recently, a couple of months ago.

@madsmtm
Copy link
Contributor

madsmtm commented Nov 10, 2024

@peitalin:

  • Could you try the code in here?
  • Could you try with -Clinker=/usr/bin/ld (just to make sure that the problem isn't with a Clang installed via Homebrew or similar)?
  • What's your OS version (output of sw_vers)? Because then I might be able to whip up a VM for that version.

@madsmtm
Copy link
Contributor

madsmtm commented Nov 11, 2024

Linking possibly related https://rust-lang.zulipchat.com/#narrow/channel/122651-general/topic/Weird.20crash.20using.20thread.20locals.20Apple.20Silicon.20debug / zama-ai/tfhe-rs#1687.

So I'll also ask:

  • What version of Xcode are you using?
  • What version of ld are you using?

@peitalin
Copy link

peitalin commented Nov 11, 2024

@madsmtm I've just tried your example with RUSTFLAGS="-Clinker=/usr/bin/ld" cargo run and reproduced the errors.

My system specs are:

Xcode version: Version 15.2 (15C500b)

clang++ -v:

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

$ ld -v

@(#)PROGRAM:ld  PROJECT:dyld-1015.6
BUILD 20:37:42 Aug 14 2023
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
will use ld-classic for: armv6 armv7 armv7s arm64_32 i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 15.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 15.0.0 (tapi-1500.0.12.3)
Library search paths:
Framework search paths:

sw_vers

ProductName:		macOS
ProductVersion:		13.5.1
BuildVersion:		22G90

I've added a full backtrace of your code snippet and system specs here:
https://github.com/peitalin/madsmtm-demo

@madsmtm
Copy link
Contributor

madsmtm commented Nov 13, 2024

Nice! I reproduced it now (both in a VM on macOS 13.5, and on macOS 14.7), it's a problem with the linker in Xcode 15.0 and Xcode 15.1, which unfortunately happens to be the latest supported on macOS 13.5.

Snippet for reproducing

Copy this into foo.rs.

# Command Line Tools for Xcode 15.0/15.1
# The latest Command Line Tools installation overwrites this path
rustc foo.rs -Clinker=/Library/Developer/CommandLineTools/usr/bin/ld && ./foo

# Xcode 15.0
rustc foo.rs -Clinker=/Applications/Xcode\ 15.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld && ./foo

# Xcode 15.1
rustc foo.rs -Clinker=/Applications/Xcode\ 15.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld && ./foo

Things do seem to work in Rust 1.78 though, I'll run a bisection, and see if I can find the offending commit (to figure out if this is indeed a regression, and if there's anything we can do about it).

@RalfJung
Copy link
Member

RalfJung commented Nov 13, 2024

Rust recently gained extra assertions in debug builds to detect Undefined Behavior caused by misaligned pointers. That is likely why this started failing. Already on older versions of Rust, this code had Undefined Behavior, and could be the cause of subtle misbehavior -- newer Rust just makes this misbehavior easier to detect.

If the linker causes some pointer to be null/misaligned, that sounds like a critical bug in that linker. So it'd be good to know exactly which versions of the linker are affected.

@RalfJung
Copy link
Member

RalfJung commented Nov 13, 2024

Cc @saethlin, your checks found another bug. :)

Nice! I reproduced it now (both in a VM on macOS 13.5, and on macOS 14.7), it's a problem with the linker in Xcode 15.0 and Xcode 15.1, which unfortunately happens to be the latest supported on macOS 13.5.

Is this issue documented/tracked somewhere public? Are there newer linker versions available? I don't know anything about macOS, I am just surprised that an OS released just one year ago wouldn't be able to run the latest version of the linker.

EDIT: Actually https://developer.apple.com/documentation/xcode-release-notes/xcode-15_2-release-notes says that XCode 15.2 should work on macOS 13.5.

@madsmtm
Copy link
Contributor

madsmtm commented Nov 13, 2024

It takes quite a lot of time to test these things, Xcode is large ;)

But lemme write down exactly what I've tested so far:

  • WORKS: Command Line Tools for Xcode 14.3.1
  • UNKNOWN: Xcode 14.3.1
  • BROKEN: Command Line Tools for Xcode 15.0
  • UNKNOWN: Xcode 15.0
  • UNKNOWN: Xcode 15.0.1
  • BROKEN: Command Line Tools for Xcode 15.1
  • UNKNOWN: Xcode 15.1
  • UNKNOWN: Xcode 15.2
  • WORKS: Command Line Tools for Xcode 15.3
  • UNKNOWN: Xcode 15.3
  • WORKS: Xcode 15.4

And in between these there's a lot of betas and release candidates.

EDIT: Actually https://developer.apple.com/documentation/xcode-release-notes/xcode-15_2-release-notes says that XCode 15.2 should work on macOS 13.5.

Yeah, I should've clarified that I only really tested the Command Line Tools (and no Command Line Tools for Xcode 15.2 exist).

@madsmtm
Copy link
Contributor

madsmtm commented Nov 13, 2024

Is this issue documented/tracked somewhere public?

Not that I know of? Maybe rdar://110340167, though I don't have access to that myself, just guessing based on a similar issue found by googling?

Are there newer linker versions available?

With a newer Xcode, yes. Without, no (not practically, at least).

I don't know anything about macOS, I am just surprised that an OS released just one year ago wouldn't be able to run the latest version of the linker.

Xcode generally only wants to be run on the latest macOS release. Usually it works fine in older versions, but you have to edit a configuration file and run an "unsupported" build, so most people don't.

Also linking possibly related #90959.

@madsmtm
Copy link
Contributor

madsmtm commented Nov 13, 2024

You're probably right that this is / has been UB for a long time. A bisection led me to a6cdd81, which sounds very much like a red herring (the only thing of relevance that PR changed is a bit of panic code, which might explain why alignment would change between that and the previous commit).

EDIT: Yeah, definitely a red herring, I've later found previous commits that also exhibit the problem. So not really something that can be bisected.

I'll try to take a look at the difference in assembly between a version that succeeds and one that fails, but it's difficult, since it seems like it requires the standard library to be linked in a certain way (I couldn't trigger the issue with -Zbuild-std).

@saethlin
Copy link
Member

Yeah this looks exactly like what I previously poked at on Zulip, it's somewhat comforting that this seems to depend on the Apple Xcode bag of tools.

which sounds very much like a red herring

FWIW, the MIR inliner used to use item hash comparisons to prevent query cycles, which meant that literally any edit to a crate would cause the inliner to make different decisions. The only thing you can do in this scenario is turn off the MIR inliner with RUSTFLAGS=-Zinline-mir=no or RUSTFLAGS=-Zmir-enable-passes=-Inline. But of course the standard library in the range you're trying to bisect through has been precompiled with the unstable MIR optimization pipeline.

So you'd need to make -Zbuild-std work to get a usable bisection. By default, it builds the standard library with the optimization settings for the rest of the build. So if you don't set --release, you get a quite different standard library build. Are you just using cargo run -Zbuild-std without --release?

@madsmtm
Copy link
Contributor

madsmtm commented Nov 14, 2024

FWIW, the MIR inliner used to use item hash comparisons to prevent query cycles, which meant that literally any edit to a crate would cause the inliner to make different decisions. The only thing you can do in this scenario is turn off the MIR inliner with RUSTFLAGS=-Zinline-mir=no or RUSTFLAGS=-Zmir-enable-passes=-Inline. But of course the standard library in the range you're trying to bisect through has been precompiled with the unstable MIR optimization pipeline.

That explains a lot, thanks for the tips!

Are you just using cargo run -Zbuild-std without --release?

Yup, couldn't reproduce this issue at all with --release, but perhaps that's just 'cause I didn't try hard enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread-locals Area: Thread local storage (TLS) C-bug Category: This is a bug. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status O-AArch64 Armv8-A or later processors in AArch64 mode O-macos Operating system: macOS T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants