Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust_analyzer: don't build a tree of RustAnalyzerInfos #3028

Merged
merged 8 commits into from
Dec 5, 2024

Conversation

sam-mccall
Copy link
Contributor

@sam-mccall sam-mccall commented Nov 28, 2024

This patch adds an id to RustAnalyzerInfo, and replace all the recursive RustAnalyzerInfos with the respective crate ids.
This fixes RustAnalyzerInfos becoming exponentially expensive to build in the presence of aliases.


Before this patch, RustAnalyzerInfo.deps contains the RustAnalyzerInfo of the target crate's deps, and so on recursively.
This forms a graph of Infos where there is one Info per target, but multiple paths from one target to another.
e.g. these could all point to bar: foo.deps[0] == foo.deps[1].deps[0] == foo.aliases[0].

If we walk foo as a tree, we will see bar multiple times. Two operations that do this are print(info) and {info: 42} which hashes the object. As the graph grows, the number of paths grows combinatorically.

RustAnalyzerInfo.aliases is defined as a dict from RustAnalyzerInfo => string, so building this triggers the slowdown.

It would be possible to fix this by e.g. replacing this dict with a list of pairs.
However the work rust_analyzer_aspect does is fundamentally local and does not need a recursive data structure. Eliminating it means we can freely use these values as keys, print them, etc.


Timings for ra_ap_rust-analyzer (which heavily aliases its deps):

blaze build //third_party/bazel_rules/rules_rust/tools/rust_analyzer:gen_rust_project; \
time blaze run //third_party/bazel_rules/rules_rust/tools/rust_analyzer:gen_rust_project -- //third_party/rust/ra_ap_rust_analyzer/...

===Before===
Executed in  211.06 secs    fish           external
   usr time    0.24 secs    0.00 micros    0.24 secs
   sys time    1.24 secs  604.00 micros    1.24 sec

===After===
Executed in    3.24 secs    fish           external
   usr time    0.15 secs  389.00 micros    0.15 secs
   sys time    1.18 secs  125.00 micros    1.18 secs

Before this patch, `RustAnalyzerInfo.deps` contains the RustAnalyzerInfo
of the target crate's deps, and so on recursively.
So for each target we build a tree capturing all transitive deps.

---

In fact it's worse when aliases are present, as they're stored there too.
We construct `RustAnalyzerInfo(aliases{//foo => x}, deps = [x])`
with a single shared `x` object:

    RustAnalyzerInfo-----deps-->//foo[RustAnalyzerInfo]
                    \_aliases-/

But when we return it from the aspect, blaze deep-copies it, resulting
in two copies of //foo's data:

    //bar[RustAnalyzerInfo]----deps-->//foo[RustAnalyzerInfo]
                          \_aliases-->//foo[RustAnalyzerInfo]

With multiple layers of dependencies with aliases, this blows up
exponentially, creating huge data structures that lead to very slow
generation times.

---

All of this is unneccesary: each crate only needs to know its own ID and
details, and the IDs of the dependencies it needs to refer to.
The per-crate JSON fragment describes one node in a graph, and
concatenating them completes the graph.

So we add an `id` to RustAnalyzerInfo, and replace all the recursive
RustAnalyzerInfos with the respective crate `id`s.

---

Timings for `ra_ap_rust-analyzer` (which heavily aliases its deps):

```
blaze build //third_party/bazel_rules/rules_rust/tools/rust_analyzer:gen_rust_project; \
time blaze run //third_party/bazel_rules/rules_rust/tools/rust_analyzer:gen_rust_project -- //third_party/rust/ra_ap_rust_analyzer/...

===Before===
Executed in  211.06 secs    fish           external
   usr time    0.24 secs    0.00 micros    0.24 secs
   sys time    1.24 secs  604.00 micros    1.24 sec

===After===
Executed in    3.24 secs    fish           external
   usr time    0.15 secs  389.00 micros    0.15 secs
   sys time    1.18 secs  125.00 micros    1.18 secs
```
sam-mccall added a commit to sam-mccall/rules_rust that referenced this pull request Dec 1, 2024
The existing support for this has limitations:

- only works for sources that are specially tagged (rust_generated_srcs).
  I've removed this mechanism as it's no longer needed.
- only provides sources of the directly selected library, but
  rust-analyzer needs to parse its dependencies too.
- is missing the target.json file, which rust-analyzer requires to
  parse the library (it invokes rustc --print cfg --target=...)

The dependency depth to generate sources for is a tradeoff. More layers
of deps will give more accurate analysis but be slower to generate.

My guess is that in the end we'll want full transitive sources, but
direct deps should be a non-controversial starting point. (Without these
sources, rust-analyzer can't understand the direct API usage at all).
We can change this later and probably need to reconcile with bazelbuild#3028.
@sam-mccall
Copy link
Contributor Author

I misunderstood the cause of the performance problems here. The data aren't actually duplicated, it just appears that way if you (a) print it, or (b) hash it. I was misled by (a), and the rules use these infos as hash keys for aliases, triggering (b).

I do think eliminating the tree of RustAnalyzerInfos is a good idea if we don't need the power: it's harder to reason about the available data, carries risks of these performance cliffs, prevents print-debugging.
So I'd lean towards just rewriting the commit message, but if we want a less-invasive change, we could just fix how aliases are computed.

@krasimirgg krasimirgg self-requested a review December 3, 2024 10:42
@UebelAndre
Copy link
Collaborator

How does this impact RustAnalyzerInfo providers behind a transition? Take #3035 for example, the wasm_file attribute of rust_wasm_bindgen may have target_compatible_with = ["@platforms//os:wasm32"] set on it which would make it skipped in normal builds outside of that transition. If trees are not built, will the dependency graph of the wasm target be lost?

@sam-mccall
Copy link
Contributor Author

The Info will still be built for each target the aspect propagates to, and the json specs will still be gathered in the depset in the same way. The only thing that changes is the data structure used is not recursive.

I don't know specifically how transitions affect the aspect's behavior in that example, but I'm fairly sure it's the same before/after the patch.

extensions/prost/private/prost.bzl Show resolved Hide resolved
rust/private/rust_analyzer.bzl Outdated Show resolved Hide resolved
Copy link
Collaborator

@krasimirgg krasimirgg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@krasimirgg krasimirgg added this pull request to the merge queue Dec 5, 2024
Merged via the queue into bazelbuild:main with commit 443b089 Dec 5, 2024
4 checks passed
UebelAndre added a commit to UebelAndre/rules_rust that referenced this pull request Dec 12, 2024
@UebelAndre
Copy link
Collaborator

UebelAndre commented Dec 12, 2024

This change seems to have broken the ability for rust-analyzer to index into prost rules. None of the proto targets are being indexed.

~/Code/rules_rust/extensions/prost: bazel run @rules_rust//tools/rust_analyzer:gen_rust_project

With this change: rust-project-main.json
Revert: rust-project-revert.json

@sam-mccall @krasimirgg would you be able to take a look?

@sam-mccall
Copy link
Contributor Author

@UebelAndre sorry, I wasn't able to get to this until now - I've been out sick.

I'll take a look: at first glance all the crates are there, it looks like some deps from *_test targets => protos are missing. Not sure why.

Do you know why Prost doesn't return a regular CrateInfo in the first place? rust_proto_library does this and seems to "just work" with rust-analyzer, without the rules having to know anything about RustAnalyzerInfo.

@UebelAndre
Copy link
Collaborator

@UebelAndre sorry, I wasn't able to get to this until now - I've been out sick.

No worries, glad you're feeling better!

Do you know why Prost doesn't return a regular CrateInfo in the first place? rust_proto_library does this and seems to "just work" with rust-analyzer, without the rules having to know anything about RustAnalyzerInfo.

Prost returns CrateGroupInfo since unlike the rust_proto_library rule, the crate is compiled on an action for the proto_library target via an aspect and can be representative of transitive crates as well. Thus the aspect needed to do the heavy lifting for rust-analyzer. Or at least it was thought to be needed once upon a time. If CrateGroupInfo is sufficient for populating RustAnalyzerInfo then that'd certainly simplify the implementation 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants