Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to rustc 1.70 #9140

Merged
merged 14 commits into from
Jun 19, 2023
Merged

Update to rustc 1.70 #9140

merged 14 commits into from
Jun 19, 2023

Conversation

mina86
Copy link
Contributor

@mina86 mina86 commented Jun 1, 2023

With 1.70 toolchain, the contracts are by default compiled in a way
which requires updates to wasmer 2.4.1 as well.

I looked also into replacing usage of once_cell crate but sadly
std::sync::OnceLock doesn’t implement wait method.

Co-authored-by: Jakob Meier [email protected]
Co-authored-by: Simonas Kazlauskas [email protected]
Issue: near/create-near-app#2009
Issue: #9143

@mina86 mina86 requested a review from a team as a code owner June 1, 2023 22:34
@mina86 mina86 requested a review from Longarithm June 1, 2023 22:34
@Longarithm
Copy link
Member

How can rust upgrade lead to changes in gas costs? @jakmeier https://buildkite.com/nearprotocol/nearcore/builds/28207#01887919-bfe1-418c-814e-ebef03e675be

@jakmeier
Copy link
Contributor

jakmeier commented Jun 2, 2023

Hah, that's an interesting one! I think we are in UB territory. At least this error message is familiar:

FunctionCallError(CompilationError(PrepareError(Deserialization)))

https://buildkite.com/nearprotocol/nearcore/builds/28207#01887919-bfde-4950-bfe4-07ecc5acae43/93-1208

It makes me think we are running into the problem again that our contract serialization format is rust version dependent. (see #9025)
But that seems can't quite be the explanation, since we are not mixing multiple rust versions inside CI.

cc @nagisa maybe you have some immediate thoughts/ideas?

@nagisa
Copy link
Collaborator

nagisa commented Jun 2, 2023

I won't be able to look at this before Monday, but what I remember is that deserialization issues in rkyv were segfaults, rather than regular errors. Though, I guess once you violate soundness, anything could happen.

@nagisa
Copy link
Collaborator

nagisa commented Jun 5, 2023

Looks like the new LLVM with the 1.70 has started producing the instructions from the sign extension and other now-mainline WebAssembly proposals by default. This and these other extensions are not supported by the pwasm parser.

Unfortunately the relevant release notes are absent. But this appears to be the relevant differential.

The proposed way forward is for us to use -Ctarget-cpu=mvp rustc flag for WebAssembly targets until we've a protocol change to implement and enable some of these features.

Note that in particular nearcore is not the only affected party – any contract developer who upgrades to Rust 1.70 will be affected unless they utilize the workaround. It might be worthwhile to communicate this more broadly somehow. cc @akhi3030 @bowenwang1996

One way to do so would be to create and pin a github issue, but there may be other channels too.

@nagisa
Copy link
Collaborator

nagisa commented Jun 5, 2023

Note that this above can also explain the gas cost changes too – LLVM upgrades are prone to affecting the resulting test WebAssembly contracts’ code sufficiently that they execute fewer or more instructions. To avoid this we should make sure to store our test contracts as WebAssembly if we depend on precise execution outcome when executing those contracts.

@akhi3030
Copy link
Collaborator

akhi3030 commented Jun 5, 2023

Note that in particular nearcore is not the only affected party – any contract developer who upgrades to Rust 1.70 will be affected unless they utilize the workaround. It might be worthwhile to communicate this more broadly somehow. cc @akhi3030 @bowenwang1996

Thanks, I'll look into this. Do we know how things will break for the contract developers? Will they get an error when trying to compile / deploy on localnet / deploy on mainnet / etc. Or will the problem be more subtle?

@jakmeier
Copy link
Contributor

jakmeier commented Jun 5, 2023

This and these other extensions are not supported by the pwasm parser.

So, is this only a limitation of pwasm parser then? I would image native code generation also doesn't know to handle it, if the parsing stage couldn't output these instructions, yet. Is this the case? Or is it possible we only need a pwasm parser update and would be ready to support these new WASM features?

@nagisa
Copy link
Collaborator

nagisa commented Jun 5, 2023

We would at very least require validation of these extensions, but the code is in place for these instructions in NearVM.

@nagisa
Copy link
Collaborator

nagisa commented Jun 5, 2023

Thanks, I'll look into this. Do we know how things will break for the contract developers? Will they get an error when trying to compile / deploy on localnet / deploy on mainnet / etc. Or will the problem be more subtle?

They will get deployment failures any time they compile their contracts with 1.70+ and the contract is complex enough to convince LLVM to use the new instructions. Given our anemic diagnostics they will definitely have a hard time figuring out why the deployment fails though :)

@nagisa

This comment was marked as outdated.

@nagisa

This comment was marked as outdated.

@nagisa
Copy link
Collaborator

nagisa commented Jun 5, 2023

Using instructions in #9143, the following patch specifically would get most of the failing tests to pass:

diff --git a/runtime/near-test-contracts/build.rs b/runtime/near-test-contracts/build.rs
index 1ec245566..8035897a5 100644
--- a/runtime/near-test-contracts/build.rs
+++ b/runtime/near-test-contracts/build.rs
@@ -53,10 +53,11 @@ fn cargo_build_cmd(target_dir: &Path) -> Command {
     res.env_remove("CARGO_BUILD_RUSTFLAGS");
     res.env_remove("CARGO_ENCODED_RUSTFLAGS");
 
-    res.env("RUSTFLAGS", "-Dwarnings");
+    res.env("RUSTFLAGS", "-Dwarnings -Ctarget-cpu=mvp");
     res.env("CARGO_TARGET_DIR", target_dir);
+    res.env("RUSTC_BOOTSTRAP", "1");
 
-    res.args(["build", "--target=wasm32-unknown-unknown", "--release"]);
+    res.args(["build", "--target=wasm32-unknown-unknown", "--release", "-Zbuild-std=panic_abort,std"]);
 
     res
 }

That said, I don't believe it is something we necessarily want to do due to this requiring RUSTC_BOOSTRAP hack to enable use of the unstable rust & cargo features (-Zbuild-std) on stable compilers.

@jakmeier
Copy link
Contributor

So we have a couple of remaining problems.

  • Some tests use a WASM and deploy it on a old version, which still uses wasmer2 and hence can't handle the new ops generated by rsutc 1.70. I think for those tests, we should use a fixed WASM input rather than build it from source using the nearcore toolchain. I've submitted a PR for that: test: use WASM for backwards-compatibility tests #9173, please @nagisa take a look and let me know if that approach makes sense to you.
  • Some tests should use base_rs_contract rather than rs_contract, I'll submit another PR for those later.
  • Some tests (*compute_cost*) are designed poorly and are not able to handle expected gas cost chagnes. I'll look into a fix for those now and also try to send a PR today.
  • There might be more failing tests, as I haven't gone through the full list, yet.

near-bulldozer bot pushed a commit that referenced this pull request Jun 12, 2023
`base_test_contract_rs.wasm` is a WASM used in our tests that is
supposed to be compatible with the oldest version.
That currently requires rustc < 1.70 or some hacks to
use unstable cargo features. (`-Zbuild-std`)

The easiest way forward is to check in a WASM built with the correct
settings instead or relying on the nearcore toolchain version.

This is a pre-requisite to land the 1.70 toolchain upgrade. (#9140)
jakmeier added a commit to jakmeier/nearcore that referenced this pull request Jun 12, 2023
`rs_contract` can be used by tests that run only on  the latest
 protocol version, as defined by the const PROTOCOL_VERSION.

`base_rs_contract` must be used in backwards compatibility tests that
 rely on specific protocol versions.

This is a pre-requisite to land the rustc 1.70 toolchain upgrade.
(See also near#9140)
@jakmeier
Copy link
Contributor

The gas changes were actually due to receipts failing rather than executing, which meant they used less gas. Updating to use a compatible WASM fixes that for free.

That means, maybe we only need one more fix, which is to use base_rs_contract in all backward-compatibility tests. Which is already done in #9175. Once it's on master, we can rebase this PR here and see if there are still failure left.

near-bulldozer bot pushed a commit that referenced this pull request Jun 12, 2023
`rs_contract` can be used by tests that run only on  the latest
 protocol version, as defined by the const PROTOCOL_VERSION.

`backwards_compatible_rs_contract` must be used in backwards 
compatibility tests that rely on specific protocol versions.

This is a pre-requisite to land the rustc 1.70 toolchain upgrade.
(See also #9140)
@jakmeier
Copy link
Contributor

jakmeier commented Jun 12, 2023

The remaining issues seem to be of a single cause, which is pointer memory alignment issue.
It is triggered in several tests, such as

  • tests::regression_tests::memory_size_alignment_issue
  • tests::rs_contract::attach_unspent_gas_but_burn_all_gas,
  • near-vm-runner tests::runtime_errors::test_memory_grow,
  • tests::runtime_errors::test_panic_re_export
  • ...

At a first glance, it seems like it's only an issue for wasmer2, not for near-vm. But I'm not quite sure yet what is going on. @nagisa / @Ekleog-NEAR do you know why this is failing with the rustc upgrade?

error message for reference:

cargo nextest run --workspace --features nightly,test_features,mock_node -p '*' --exclude 'integration-tests*' -- memory_size_alignment_issue
    Starting 1 tests across 91 binaries (1397 skipped)
     SIGABRT [   0.101s]                                                        near-vm-runner tests::regression_tests::memory_size_alignment_issue

--- STDOUT:                                                                     near-vm-runner tests::regression_tests::memory_size_alignment_issue ---

running 1 test

--- STDERR:                                                                     near-vm-runner tests::regression_tests::memory_size_alignment_issue ---
thread 'tests::regression_tests::memory_size_alignment_issue' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x7fac7002ecac', /home/jakmeier/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-near-2.4.0/src/instance/mod.rs:230:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

@Ekleog-NEAR
Copy link
Collaborator

Seeing the line on which this fails, it looks like rustc is now enabling ASAN by default.

We fixed the same issue on near-vm when the fuzzer noticed it in #9063. We did not fix it for wasmer because of the hassle it would be to make a new wasmer release compared to the fact that we don’t support platforms that require alignment anyway, so the issue is moot without ASAN enabled.

Given it looks like this is becoming an actual problem, I’ve backported #9063 to wasmer#156, but AFAICT only @nagisa has the access rights needed to make a new release there. Either way, nothing scary for what we already landed, it just means that a previously-benign issue has now become important, and we’ll need to make one change to wasmer before landing this

@akhi3030 akhi3030 changed the title Update to latest rustc Update to rustc 1.70 Jun 13, 2023
@nagisa
Copy link
Collaborator

nagisa commented Jun 14, 2023

Seeing the line on which this fails, it looks like rustc is now enabling ASAN by default.

This can’t be true! Enabling ASAN by default has non-trivial implications, not to mention that the failure mode between ASAN (abort) and what’s seen here (a panic) are different.

Instead this looks to be rust-lang/rust#98112.

@nagisa
Copy link
Collaborator

nagisa commented Jun 14, 2023

Wasmer 2.4.1 has been released.

diff --git a/Cargo.toml b/Cargo.toml
index fe319b01f..bdbc672bc 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -316,14 +316,14 @@ trybuild = "1.0.11"
 turn = "0.6"
 validator = "0.12"
 wasm-encoder = "0.27.0"
-wasmer-compiler = { package = "wasmer-compiler-near", version = "=2.4.0" }
-wasmer-compiler-singlepass = { package = "wasmer-compiler-singlepass-near", version = "=2.4.0" }
-wasmer-engine = { package = "wasmer-engine-near", version = "=2.4.0" }
-wasmer-engine-universal = { package = "wasmer-engine-universal-near", version = "=2.4.0", features = ["compiler"] }
+wasmer-compiler = { package = "wasmer-compiler-near", version = "=2.4.1" }
+wasmer-compiler-singlepass = { package = "wasmer-compiler-singlepass-near", version = "=2.4.1" }
+wasmer-engine = { package = "wasmer-engine-near", version = "=2.4.1" }
+wasmer-engine-universal = { package = "wasmer-engine-universal-near", version = "=2.4.1", features = ["compiler"] }
 wasmer-runtime = { version = "0.18.0", package = "wasmer-runtime-near", features = ["default-backend-singlepass"], default-features = false }
 wasmer-runtime-core = { version = "0.18.2", package = "wasmer-runtime-core-near" }
-wasmer-types = { package = "wasmer-types-near", version = "=2.4.0" }
-wasmer-vm = { package = "wasmer-vm-near", version = "=2.4.0" }
+wasmer-types = { package = "wasmer-types-near", version = "=2.4.1" }
+wasmer-vm = { package = "wasmer-vm-near", version = "=2.4.1" }
 wasmparser = "0.78" # TODO: unify at least the versions of wasmparser we have in our codebase
 wasmprinter = "0.2"
 wasm-smith = "0.10"

plus a cargo check to update the lockfile should be all this needs.

(I would've made a suggestion that you can apply with a single click, but github does not allow commenting on that part of Cargo.toml for some dogforsaken reason)

@mina86
Copy link
Contributor Author

mina86 commented Jun 14, 2023

Looks better and better. test_wasmer2_artifact_output_stability is the only test failing right now. I don’t recall exactly the purpose of this test and at the moment I’m not sure if the expected can just be updated or not.

@mina86
Copy link
Contributor Author

mina86 commented Jun 14, 2023

So to be honest I wasn’t really paying attention to what you guys were doing with this change, but from what I’m reading, does support for new instruction needs to be gated on protocol version update?

@nagisa
Copy link
Collaborator

nagisa commented Jun 14, 2023

does support for new instruction needs to be gated on protocol version update?

We rolled support for it into the PreparationV2 protocol feature, so yeah, it is gated.


test_wasmer2_artifact_output_stability is the only test failing right now. I don’t recall exactly the purpose of this test and at the moment I’m not sure if the expected can just be updated or not.

The intent for this test is to detect changes to the cached executable shape, and to remind to increment the version here:

seed: (1 << 10) | (10 << 6) | 0,

so that this cache is flushed. It is fine to update the expected hashes as long as you update that seed to something like seed: (1 << 10) | (11 << 6) | 0,.

@Ekleog-NEAR
Copy link
Collaborator

So to be honest I wasn’t really paying attention to what you guys were doing with this change, but from what I’m reading, does support for new instruction needs to be gated on protocol version update?

This will naturally happen as part of 1.35, so there’s nothing to do here :)

Looks better and better. test_wasmer2_artifact_output_stability is the only test failing right now. I don’t recall exactly the purpose of this test and at the moment I’m not sure if the expected can just be updated or not.

Just to add some context to what @nagisa said, this test has as a goal to validate that we don’t change wasmer while forgetting to clear the compiled contract cache. The upgrade to 2.4.1, needed to make wasmer2 compile-able with rust 1.70, changes codegen and thus requires a clear of the compiled contract cache. Hence the need to update the seed that @nagisa linked, and once that’s done there’s no issue with updating the numbers listed in test_wasmer2_artifact_output_stability :)

@mina86
Copy link
Contributor Author

mina86 commented Jun 16, 2023

We rolled support for it into the PreparationV2 protocol feature, so yeah, it is gated.

OK, so long as you guys know what you’re doing. ;) My worry was to make sure that contract compiled with 1.70 will be rejected on old blocks even if validator runs node supporting the new instruction. If it all is set up correctly, this PR is ready to land.

Copy link
Collaborator

@Ekleog-NEAR Ekleog-NEAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR is:

  • compiling contracts with rustc 1.70 leads to new instructions being used
  • compiling nearcore with rustc 1.70 has no impact on new instructions being used
  • the protocol upgrade from 1.35 allows nearcore compiled with any rustc to execute the instructions needed by the contracts compiled with rustc 1.70

Hopefully things make more sense now! :)

As for this PR, LGTM from the contract runtime side, but I’ll let people from core speak about in particular the removal of clone()s in chain/client/src/tests/catching_up.rs

@mina86
Copy link
Contributor Author

mina86 commented Jun 18, 2023

As for this PR, LGTM from the contract runtime side, but I’ll let people from core speak about in particular the removal of clone()s in chain/client/src/tests/catching_up.rs

That was linter complaining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants