-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patching support for crate universe local vendoring #1902
Conversation
let mut srcs = Glob::new_rust_srcs().into(); | ||
let mut crate_root = crate_root; | ||
let mut compile_data = None; | ||
let local_patch = package.id.repr.contains("(path+file://"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annoyingly cargo metadata doesn't give us the information we want (the patch path) in an easy to access way. The source
field which should probably have it is just set to null
, and the id
field (which is supposed to be an opaque identifier) has something like <name> <version> (path+file://<location>)
. This PR uses that id
to determine whether a crate has been patched, and I'm fairly confident that if the cargo maintainers did actually change the format of the id
field then that would mean they have the bandwidth to fix the source
field so we'd still be able to get the info somehow.
Thanks for putting this together! Could you add a new example workspace to https://github.com/bazelbuild/rules_rust/tree/main/examples/crate_universe to illustrate what this concretely looks like in practice? It'd help guide reviewing :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions from me 😄
Also if there's some unit testing that could be added that'd be great too!
root.to_string() | ||
}; | ||
println!("There's a patch crate at '//{package}'."); | ||
println!("Make sure that the following is the contents of '//{package}/BUILD.bazel':"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with all the printing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each crate that's patched out, i'm currently just printing out the contents of the BUILD file that the user needs to place inside the actual crate so that the generated BUILD file in the vendor directory can refer to it. While testing it I'd manually copy those into the appropriate locations upon each run. Like I mentioned in the PR description there's a couple options here:
- cargo-bazel could write those files into the correct locations itself, which is maybe undesirable since it's a bit unexpected that it would be writing generated files to places other than the vendor directory
- it can check that the expected generated files exist at the correct location and print out the contents that should go there if they're not there. That way, upon adding a new patch you'd be notified upon re-vendoring that you need to add the autogenerated BUILD file, but it wouldn't be noise on every run like the current behavior.
- it can silently assume that the user knows what filegroups they need to expose from their patch crates. This relies on the user reading the examples/docs, and the error if they don't do it would be moderately helpful since it'll say something along the lines of "Can't find the filegroup
crate_root
in//your/patched/crates/foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this feels really weird, I would not expect the tool to generate anything for a crate that's patched with a path patch. If anything I would expect a BUILD file with an alias
. But needing to create filegroup
s feels confusing and brittle. Can we eliminate this requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I'm understanding: are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml
from the patch? Or do you mean that cargo-bazel should place the autogenerated BUILD files in the patch crate directly rather than putting them in the vendor dir? Doing the latter and then creating aliases from the vendor dir is feasible, it just seems like it'd be more surprising to a user that cargo-bazel
would be generating files outside the given vendor dir.
Re: brittleness, it's worth noting that the BUILD files for the patch crates are almost entirely static (with the exception of the location of your vendoring dir for exposing visibility, and nonstandard crate_root paths). On fuchsia where we've been testing with this change we have a single BUILD.template file with the following contents:
filegroup(
name = "crate_root",
srcs = ["src/lib.rs"],
visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)
filegroup(
name = "srcs",
srcs = glob(["**/*.rs"]),
visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)
filegroup(
name = "compile_data",
srcs = glob(
include = ["**"],
exclude = [
"**/* *",
"BUILD",
"BUILD.bazel",
"WORKSPACE",
"WORKSPACE.bazel",
],
),
visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)
filegroup(
name = "build_script_crate_root",
srcs = glob(["build.rs"]),
visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)
and then we create a symlink BUILD.bazel
to it in the root of each patch. That's a pretty low maintinance burden for the couple dozen crates we vendor forks of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml from the patch
This is what I'm saying. I would much prefer to have buildable rust targets in the vendored directories then to have filegroups consumed and built by something external. The pattern feels a bit unexpected. Instead of having users define filegroups, would it be possible to extend the Dependencies API to make defining Rust targets in the vendored BUILD files easier?
a7560bd
to
69d12af
Compare
e7d66ed
to
555a897
Compare
examples/crate_universe/vendor_local_patching/forked_rand_core/BUILD.bazel
Outdated
Show resolved
Hide resolved
@illicitonion I've added an example and docs, hopefully they give a better sense of how exactly the patching behavior works. I still need to add another patched crate to the example that exercises the build_script functionality, but it will look pretty much the same as the patched rand_core. @UebelAndre I think using the example as an integration test is the most valuable form of testing. Do you think it's worth adding a unit test which ensures that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@UebelAndre I think using the example as an integration test is the most valuable form of testing. Do you think it's worth adding a unit test which ensures that the
Rule
produced bycollect_targets
has filegroups rather than globs for its sources/crate_root/compile_data? That feels a bit too tied to the implementation details to me, and I don't think we can easily check in a unit test thatcargo vendor
doesn't actually download the sources for patched crates.
There's quite a bit of new functionality added here that definitely seems unit testable. As a maintainer I much prefer testing smaller interfaces to exercise contracts between them than to see an integration test. Integration tests are nice for a sort of smoke test but they either don't cover enough use cases, become incredibly complicated, or take forever to run. They're definitely valuable but I think unit tests are healthier for maintainability.
lock_pkg.name, | ||
lock_pkg.version | ||
), | ||
None => return Ok(SourceAnnotation::Path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no stronger guarantee to be made here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let local_patch = package.id.repr.contains("(path+file://");
comes to mind from get_attributes
root.to_string() | ||
}; | ||
println!("There's a patch crate at '//{package}'."); | ||
println!("Make sure that the following is the contents of '//{package}/BUILD.bazel':"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml from the patch
This is what I'm saying. I would much prefer to have buildable rust targets in the vendored directories then to have filegroups consumed and built by something external. The pattern feels a bit unexpected. Instead of having users define filegroups, would it be possible to extend the Dependencies API to make defining Rust targets in the vendored BUILD files easier?
@@ -155,6 +155,7 @@ pub enum SourceAnnotation { | |||
#[serde(default, skip_serializing_if = "Option::is_none")] | |||
patches: Option<BTreeSet<String>>, | |||
}, | |||
Path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use this to store relevant info, like the package name or something that's calculated in get_attributes
?
It looks like we have the info we need to tease out the patched crates' paths without having to resort to completely parsing the ID field (which is technically opaque). We can do a half measure and look for `(path+file://` and then use other metadata in addition to knowledge of the tempdir path length to form the correct crate roots, and then make the globs/source_roots point to those which will hopefully be enough to get them to build. So far this change just patches up the source roots, and they're not even correctly normalized as relative paths yet so it definitely won't build yet. we may need to actually generate the build files in the correct locations with an alias or something to make defs.bzl in the vendor dir still point to it.
Co-authored-by: Dan Johnson <[email protected]>
@UebelAndre sorry to revive this old PR but Fuchsia is trying again to start using bazel for rust and not being able to patch deps is still blocking us. I've rebased the changes, and the Last time this comment was the blocking concern. On Fuchsia we use I think we agreed last time around that it'd be unexpected/not advisable to have Let me know if there have been any big changes that impact this approach, or if you'd rather me just close this and make a fresh PR. |
We've opted to clean this up and re-open it as a separate PR: #3071. Many of the original comments were stale after lots of rebasing, and the main point of discussion has been linked to and summarized from that new PR. |
On Fuchsia we have an existing Cargo.toml that we use for cargo-vendor with a couple hundred crates. For dozens of those crates we patch out the dependencies to point to forks elsewhere in our tree:
We're trying move to using crate universe to manage the vendoring, and #1645 (which this PR is rebased ontop of) is enough to make cargo-bazel not choke when it sees those path dependencies. At that point it emits a spliced
Cargo.toml
which still contains thepatch.crates-io
and hands it off tocargo vendor
, and that correctly doesn't download any of the crates that are listed as patched.Unfortunately BUILD file generation was still broken. In this example with 3 crates where
b
is patched out, cargo-bazel still creates avendor/b/BUILD.bazel
that contains a sources glob for**/*.rs
but the sources are over in the patch location:To fix the issue of sources (as well as crate root and compile data) pointing to nonsensical paths for the generated BUILD file, we tried to make the minimally intrusive change of letting those fields accept either a label or glob (rather than hardcoding the glob) and then creating filegroups in the patched crate that can be pointed to by their label from the generated file. Currently the code prints out a generated BUILD file for each patched crate, but the only difference for different patched crates would be if they had different source roots, so we'd be happy to simply document the filegroups that are required for patching to work and let the user handle copying a template into each of their patches. It would be possible for cargo-bazel to write out those BUILD files to the right locations itself, but we assumed this behavior was a bit too invasive and should at least not be enabled by default. Here's an example of the generated BUILD file:
@illicitonion and/or @UebelAndre I'd love your thoughts on the approach here. An alternative we were exploring was to create a third vendoring "mode" called "local_repository" which still ran cargo vendor like the local mode, but defined each crate as its own repository like the remote mode. Then we could achieve patching by overriding all those
maybe
repositories withnew_local_repository
for each patched crate in our WORKSPACE.bazel. That required more extensive changes to cargo-bazel internals compared to this PR and we'd like to make patching work for existing users of the local mode without adding a different mode. Patching support is our last blocker to landing rules_rust usage in Fuchsia, so we wanted to make this change as uncontroversial as possible.Before I'd consider this PR mergable, we should remove the
println!
s for patched crates and add a section about patching to the crate universe docs. Ideally we should add some testing but I'm not sure whether it's feasible to write e2e tests that fetch some deps from crates io and use some local patches and make sure that a crate that depends on both of those still compiles. That's what I've been doing locally to test this change.Fixes #1631