Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patching support for crate universe local vendoring #1902

Closed
wants to merge 5 commits into from

Conversation

P1n3appl3
Copy link
Contributor

On Fuchsia we have an existing Cargo.toml that we use for cargo-vendor with a couple hundred crates. For dozens of those crates we patch out the dependencies to point to forks elsewhere in our tree:

[patch.crates-io]
fuse3 = { path = "forks/fuse3" }

We're trying move to using crate universe to manage the vendoring, and #1645 (which this PR is rebased ontop of) is enough to make cargo-bazel not choke when it sees those path dependencies. At that point it emits a spliced Cargo.toml which still contains the patch.crates-io and hands it off to cargo vendor, and that correctly doesn't download any of the crates that are listed as patched.

Unfortunately BUILD file generation was still broken. In this example with 3 crates where b is patched out, cargo-bazel still creates a vendor/b/BUILD.bazel that contains a sources glob for **/*.rs but the sources are over in the patch location:

.
├── BUILD.bazel
├── Cargo.toml
├── my_forks
│  └── b
│     ├── Cargo.toml
│     └── src
└── vendor
   ├── a
   │  ├── BUILD.bazel
   │  ├── Cargo.toml
   │  └── src
   ├── b
   │  └── BUILD.bazel
   └── c
      ├── BUILD.bazel
      ├── Cargo.toml
      └── src

To fix the issue of sources (as well as crate root and compile data) pointing to nonsensical paths for the generated BUILD file, we tried to make the minimally intrusive change of letting those fields accept either a label or glob (rather than hardcoding the glob) and then creating filegroups in the patched crate that can be pointed to by their label from the generated file. Currently the code prints out a generated BUILD file for each patched crate, but the only difference for different patched crates would be if they had different source roots, so we'd be happy to simply document the filegroups that are required for patching to work and let the user handle copying a template into each of their patches. It would be possible for cargo-bazel to write out those BUILD files to the right locations itself, but we assumed this behavior was a bit too invasive and should at least not be enabled by default. Here's an example of the generated BUILD file:

filegroup(
    name = "crate_root",
    srcs = ["src/lib.rs"],
    visibility = ["//third_party/vendor:__subpackages__"],
)

filegroup(
    name = "srcs",
    srcs = glob(["**/*.rs"]),
    visibility = ["//third_party/vendor:__subpackages__"],
)

filegroup(
    name = "compile_data",
    srcs = glob(
        include = ["**"],
        exclude = [
            "**/* *",
            "BUILD",
            "BUILD.bazel",
            "WORKSPACE",
            "WORKSPACE.bazel",
        ],
    ),
    visibility = ["//third_party/vendor:__subpackages__"],
)

@illicitonion and/or @UebelAndre I'd love your thoughts on the approach here. An alternative we were exploring was to create a third vendoring "mode" called "local_repository" which still ran cargo vendor like the local mode, but defined each crate as its own repository like the remote mode. Then we could achieve patching by overriding all those maybe repositories with new_local_repository for each patched crate in our WORKSPACE.bazel. That required more extensive changes to cargo-bazel internals compared to this PR and we'd like to make patching work for existing users of the local mode without adding a different mode. Patching support is our last blocker to landing rules_rust usage in Fuchsia, so we wanted to make this change as uncontroversial as possible.

Before I'd consider this PR mergable, we should remove the println!s for patched crates and add a section about patching to the crate universe docs. Ideally we should add some testing but I'm not sure whether it's feasible to write e2e tests that fetch some deps from crates io and use some local patches and make sure that a crate that depends on both of those still compiles. That's what I've been doing locally to test this change.

Fixes #1631

let mut srcs = Glob::new_rust_srcs().into();
let mut crate_root = crate_root;
let mut compile_data = None;
let local_patch = package.id.repr.contains("(path+file://");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoyingly cargo metadata doesn't give us the information we want (the patch path) in an easy to access way. The source field which should probably have it is just set to null, and the id field (which is supposed to be an opaque identifier) has something like <name> <version> (path+file://<location>). This PR uses that id to determine whether a crate has been patched, and I'm fairly confident that if the cargo maintainers did actually change the format of the id field then that would mean they have the bandwidth to fix the source field so we'd still be able to get the info somehow.

@illicitonion
Copy link
Collaborator

Thanks for putting this together! Could you add a new example workspace to https://github.com/bazelbuild/rules_rust/tree/main/examples/crate_universe to illustrate what this concretely looks like in practice? It'd help guide reviewing :)

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions from me 😄

Also if there's some unit testing that could be added that'd be great too!

root.to_string()
};
println!("There's a patch crate at '//{package}'.");
println!("Make sure that the following is the contents of '//{package}/BUILD.bazel':");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with all the printing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each crate that's patched out, i'm currently just printing out the contents of the BUILD file that the user needs to place inside the actual crate so that the generated BUILD file in the vendor directory can refer to it. While testing it I'd manually copy those into the appropriate locations upon each run. Like I mentioned in the PR description there's a couple options here:

  • cargo-bazel could write those files into the correct locations itself, which is maybe undesirable since it's a bit unexpected that it would be writing generated files to places other than the vendor directory
  • it can check that the expected generated files exist at the correct location and print out the contents that should go there if they're not there. That way, upon adding a new patch you'd be notified upon re-vendoring that you need to add the autogenerated BUILD file, but it wouldn't be noise on every run like the current behavior.
  • it can silently assume that the user knows what filegroups they need to expose from their patch crates. This relies on the user reading the examples/docs, and the error if they don't do it would be moderately helpful since it'll say something along the lines of "Can't find the filegroup crate_root in //your/patched/crates/foo

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this feels really weird, I would not expect the tool to generate anything for a crate that's patched with a path patch. If anything I would expect a BUILD file with an alias. But needing to create filegroups feels confusing and brittle. Can we eliminate this requirement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I'm understanding: are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml from the patch? Or do you mean that cargo-bazel should place the autogenerated BUILD files in the patch crate directly rather than putting them in the vendor dir? Doing the latter and then creating aliases from the vendor dir is feasible, it just seems like it'd be more surprising to a user that cargo-bazel would be generating files outside the given vendor dir.

Re: brittleness, it's worth noting that the BUILD files for the patch crates are almost entirely static (with the exception of the location of your vendoring dir for exposing visibility, and nonstandard crate_root paths). On fuchsia where we've been testing with this change we have a single BUILD.template file with the following contents:

filegroup(
    name = "crate_root",
    srcs = ["src/lib.rs"],
    visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)

filegroup(
    name = "srcs",
    srcs = glob(["**/*.rs"]),
    visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)

filegroup(
    name = "compile_data",
    srcs = glob(
        include = ["**"],
        exclude = [
            "**/* *",
            "BUILD",
            "BUILD.bazel",
            "WORKSPACE",
            "WORKSPACE.bazel",
        ],
    ),
    visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)

filegroup(
    name = "build_script_crate_root",
    srcs = glob(["build.rs"]),
    visibility = ["//third_party/rust_crates/vendor:__subpackages__"],
)

and then we create a symlink BUILD.bazel to it in the root of each patch. That's a pretty low maintinance burden for the couple dozen crates we vendor forks of.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml from the patch

This is what I'm saying. I would much prefer to have buildable rust targets in the vendored directories then to have filegroups consumed and built by something external. The pattern feels a bit unexpected. Instead of having users define filegroups, would it be possible to extend the Dependencies API to make defining Rust targets in the vendored BUILD files easier?

crate_universe/src/utils/starlark/glob.rs Show resolved Hide resolved
crate_universe/src/metadata/metadata_annotation.rs Outdated Show resolved Hide resolved
@P1n3appl3 P1n3appl3 force-pushed the patching branch 3 times, most recently from a7560bd to 69d12af Compare March 28, 2023 22:25
@P1n3appl3 P1n3appl3 force-pushed the patching branch 2 times, most recently from e7d66ed to 555a897 Compare March 31, 2023 01:14
@P1n3appl3
Copy link
Contributor Author

P1n3appl3 commented Mar 31, 2023

@illicitonion I've added an example and docs, hopefully they give a better sense of how exactly the patching behavior works. I still need to add another patched crate to the example that exercises the build_script functionality, but it will look pretty much the same as the patched rand_core.

@UebelAndre I think using the example as an integration test is the most valuable form of testing. Do you think it's worth adding a unit test which ensures that the Rule produced by collect_targets has filegroups rather than globs for its sources/crate_root/compile_data? That feels a bit too tied to the implementation details to me, and I don't think we can easily check in a unit test that cargo vendor doesn't actually download the sources for patched crates.

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@UebelAndre I think using the example as an integration test is the most valuable form of testing. Do you think it's worth adding a unit test which ensures that the Rule produced by collect_targets has filegroups rather than globs for its sources/crate_root/compile_data? That feels a bit too tied to the implementation details to me, and I don't think we can easily check in a unit test that cargo vendor doesn't actually download the sources for patched crates.

There's quite a bit of new functionality added here that definitely seems unit testable. As a maintainer I much prefer testing smaller interfaces to exercise contracts between them than to see an integration test. Integration tests are nice for a sort of smoke test but they either don't cover enough use cases, become incredibly complicated, or take forever to run. They're definitely valuable but I think unit tests are healthier for maintainability.

crate_universe/src/utils/starlark/glob.rs Show resolved Hide resolved
@UebelAndre UebelAndre mentioned this pull request Apr 13, 2023
lock_pkg.name,
lock_pkg.version
),
None => return Ok(SourceAnnotation::Path),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no stronger guarantee to be made here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let local_patch = package.id.repr.contains("(path+file://");

comes to mind from get_attributes

root.to_string()
};
println!("There's a patch crate at '//{package}'.");
println!("Make sure that the following is the contents of '//{package}/BUILD.bazel':");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying that you'd expect the user to write their own complete BUILD files for each patched crate rather than autogenerating BUILD files based on the Cargo.toml from the patch

This is what I'm saying. I would much prefer to have buildable rust targets in the vendored directories then to have filegroups consumed and built by something external. The pattern feels a bit unexpected. Instead of having users define filegroups, would it be possible to extend the Dependencies API to make defining Rust targets in the vendored BUILD files easier?

@@ -155,6 +155,7 @@ pub enum SourceAnnotation {
#[serde(default, skip_serializing_if = "Option::is_none")]
patches: Option<BTreeSet<String>>,
},
Path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use this to store relevant info, like the package name or something that's calculated in get_attributes?

David Koloski and others added 4 commits September 23, 2024 15:36
It looks like we have the info we need to tease out the
patched crates' paths without having to resort to completely parsing
the ID field (which is technically opaque). We can do a half measure
and look for `(path+file://` and then use other metadata in addition
to knowledge of the tempdir path length to form the correct crate
roots, and then make the globs/source_roots point to those which will
hopefully be enough to get them to build.

So far this change just patches up the source roots, and they're not
even correctly normalized as relative paths yet so it definitely won't
build yet. we may need to actually generate the build files in the
correct locations with an alias or something to make defs.bzl in the
vendor dir still point to it.
Co-authored-by: Dan Johnson <[email protected]>
@P1n3appl3
Copy link
Contributor Author

@UebelAndre sorry to revive this old PR but Fuchsia is trying again to start using bazel for rust and not being able to patch deps is still blocking us. I've rebased the changes, and the vendor_local_patching example README is a good representation of how we want to use this feature if you need a refresher.

Last time this comment was the blocking concern. On Fuchsia we use Cargo.tomls in our patched crates as the source of truth for how to build them. It's understandable that you'd want fully buildable bazel targets in those patched crates rather than just some exposed filegroups for the sources, but would you be ok with cargo_bazel still spitting out those generated BUILD files so we could copy them into the patch dirs instead of having to hand maintain BUILD files alongside cargo manifests?

I think we agreed last time around that it'd be unexpected/not advisable to have cargo_bazel stick files in directories outside your vendor dir, so I'd be happy to make this output them there with a note at the top that they need to be moved to your patch if you want to use it (instead of handwriting a BUILD file yourself), and then I'll use aliases like you suggested to "re-export" the patched crate from the crates_vendor BUILD file, which should make the other 3p crates that depend on patched ones work.

Let me know if there have been any big changes that impact this approach, or if you'd rather me just close this and make a fresh PR.

@P1n3appl3
Copy link
Contributor Author

We've opted to clean this up and re-open it as a separate PR: #3071. Many of the original comments were stale after lots of rebasing, and the main point of discussion has been linked to and summarized from that new PR.

@P1n3appl3 P1n3appl3 closed this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crates_vendor: [patch.crates-io] fails to patch primary/transitive dependencies for path = "..."
3 participants