Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support third party crates with bzlmod #1910

Merged
merged 6 commits into from
Jan 3, 2024
Merged

Conversation

matts1
Copy link
Contributor

@matts1 matts1 commented Mar 30, 2023

This PR adds bzlmod support for third party crates via rules_rust.

It only adds some subset of crate_universe's features. Specifically, it supports:

  • Using a Cargo.toml and Cargo.lock file as the source of truth for dependencies.
  • Automatically generating build targets for your referenced dependencies in Cargo.toml.
  • One repo per Cargo.toml / Cargo.lock, to namespace and prevent implicit dependencies.

Out of scope of this PR is:

  • Vendoring deps
  • all_crate_deps or similar

@aaronmondal
Copy link

Interesting! I attempted turning crates_repository into an extension but didn't get too far yet. Being able to declare a "global" crates_index would probably be quite useful. It might also be worth looking at the @gazelle//:extensions.bzls go_deps implementation as that already works with bzlmod and is really nice to work with.

cc @fmeum who might be interested in this since crates_repository functionality similar to go_deps is probably the end goal here.

@fmeum
Copy link

fmeum commented Jun 15, 2023

Thanks for the mention. I will have to read up on cargo a bit to understand in which aspects it is similar to Go.

There is an ongoing effort to simplify isolation of extension usages which you may be able to take advantage of: bazelbuild/bazel#18529

Copy link

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want this kind of feature to get into rules_rust, but I'm not convinced this is the way to do it. The proposed API that end users would use in their MODULE.bazel files seems unintuitive to me.

IMO this PR is too large. I'd like to suggest splitting everything that isn't directly related to bzlmod off into separate smaller PRs that are easier to review. There is too much room for bugs if this many things are changed at once.

MODULE.bazel Outdated Show resolved Hide resolved
MODULE.bazel Outdated Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead file.

platform: ubuntu2004
working_directory: examples/bzlmod/hello_world
build_targets:
- "//..."
ubuntu2004_examples_bzlmod_crate_universe:
name: Bzlmod crate universe examples

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: Bzlmod crate universe examples
name: Bzlmod Crate Universe examples

Comment on lines 24 to 32
def add_annotation(k, v):
if k not in annotations:
annotations[k] = []
annotations[k].append(v)

for file in annotation_files:
for name, annotations_for_crate in json.decode(module_ctx.read(file)).items():
for annotation_for_crate in annotations_for_crate:
add_annotation(name, _crate_universe_crate.annotation(**annotation_for_crate))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really wrong. Starlark doesn't have the same linters available as Python, but it's still probably a good practice to attempt to reuse as much from those as possible. In this case:

  • Don't use inline defs. Factor those out into standalone private functions.
  • Avoid deep nesting. It increases cognitive complexity by a lot.
  • Almost always prefer list/dict comprehensions when working with data objects. Regular for loops should almost exclusively be used for control flow where it's unavoidable.
  • Keep in mind that dicts have many methods to simplify operation on them.

rust/private/module_extensions/toolchain.bzl Outdated Show resolved Hide resolved
implementation = _crate_impl,
tag_classes = dict(
from_cargo = _from_cargo,
),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc field unspecified.

ctx, # @unused
name,
data):
path = tag_path.get_child(name) # buildifier: disable=uninitialized

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning shouldn't be disabled. Instead the function should be moved out and path or tag_path should be an argument into it. That requires changes to write_config_file though which should be handled in separate reviews.

There must be a better solution to this. It's very confusing that this write_data_file has a similar intention but a different behavior than the one in crates_vendor.bzl.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to write_crates_config_file should be factored out into a separate commit and have separate review.

version = crate["version"]

# "+" isn't valid in a repo name.
crate_repo_name = "%s__%s-%s" % (repo_name, name, version.replace("+", "-"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"{}-{}".format(a, b) or "{a}-{b}".format(a, b). Formatting with % reduces accessibility. It's not always possible (e.g. in some ctx.args methods) but in this case it is.

@fmeum
Copy link

fmeum commented Jun 16, 2023

It also makes sense to follow https://bazel.build/external/extension#put_each_extension_in_a_separate_file and have a dedicated file per extension.

@matts1
Copy link
Contributor Author

matts1 commented Jun 19, 2023

@aaronmondal PTAL at #2021

I've followed your recommendations re naming, but I'm specifically not applying your suggestions about the toolchain, since although I agree with you, it's out of scope of this PR

@matts1 matts1 marked this pull request as draft June 19, 2023 04:17
@opicaud
Copy link

opicaud commented Jul 22, 2023

Hey all,

Thanks for this @matts1 , I tried your PR and I have few things to share :

  • There is an option called package in crates_repository rules, i found only annotations, am I wrong if I say that this option is missing in the extension ?
  • Using the new crate extensions involves also a breaking change : the repo name "<module_name>_crates" breaks the existing usage of the "old" repo in BUILD file. Also the all_crate_deps is no longer available, the new version is proposing to list all dependencies as mentioned in the example you have provided : this new list involves duplication of dependencies from Cargo.toml in BUILD file, duplication not present with the usage of all_crate_deps

I perfectly understand that is a challenge to make it works with minimum impacts : for brand new project, it's very ok to use it like it is, but for existing projects, i prefer limiting the change for now, even if the implementation is not "optimized", with just your PR #2021 and without using the new crate extension i managed to get it works without breaking change even if my solution is far to be elegant..

Olivier

@matts1
Copy link
Contributor Author

matts1 commented Jul 24, 2023

Re your first comment, that's correct, I hadn't implemented package.

I'm not entirely sure I understand what you're saying about breaking the "old" repo in BUILD file, but I suspect that this is what I alluded to earlier with use_repo(crate, crates = "<module_name>_crates") to rename the repository from <module_name>_crates to crates (but only within this module).

Re the all_crate_deps, you're correct that I don't implement it, but there's nothing stopping it from being implemented in a future CL - it'd be relatively trivial.

This PR is pretty big already, and I don't want to make it bigger by adding anything that isn't strictly required.

@opicaud
Copy link

opicaud commented Jul 27, 2023

Hey @matts1

Great :), so for now i can't migrate and I keep what I have, please let me know when packages and all_crate_deps will be supported, so that i will be able to use them and override the use_repo as you mentioned.

Thanks a lot!

Olivier

opicaud added a commit to opicaud/rules_rust that referenced this pull request Aug 24, 2023
@opicaud
Copy link

opicaud commented Aug 24, 2023

Hey @matts1

I've pushed changes from alchemy_v0.20.1 to fetch back the deps.bzl from the generated files in order to use all_crates_deps sugar syntax in BUILD.bazel in the bzlmod crate_universe example

It's working by adding crates in a use_repo, this is this part of code that I want to get rid of now.

I unfortunately needed to declare those repos because deps.bzl is generated with:
["@crates_repository_crates__env_logger-0.9.3//:env_logger", "@crates_repository_crates__log-0.4.17//:log"],
but we need something that looks more like
["@crates_repository_crates//:env_logger", "@crates_repository_crates//:log"].

I try to patch the render_config templates to get the right generation in deps.bzl but without success, as BUILD.bazel is also using this template.
Perhaps you have an idea to guide me? Here the changes
Thanks!

@fmeum
Copy link

fmeum commented Aug 24, 2023

If you can make it @@crates_repository_crates__env_logger-0.9.3//:env_logger (note the double at), that should also work.

@opicaud
Copy link

opicaud commented Aug 26, 2023

@fmeum

Unfortunatley, using the canonical repo as proposed don't make the difference.
Also the documentation about module is explaining that relying on the canonical name is not recommended, because subject to change.
What I notice is that:

  • When crates_repository is used in a non bzlmod context, repo name declared in the BUILD.bazel and BUILD__* and deps.bzl are the same, so no issue.
  • When crates_repository is used in a bzlmod context, only the repo declared in the use_repo is publicly available, which means deps.bzl should contains only aliases target, as other generated repo are private to this public one, so not available at root level (eg: all_crate_deps in BUILD file).

IMHO:

  • In bzlmod context, generation of deps.bzl is different than from a non bzlmod context
  • For now, I suspect deps.bzl to be the only file being different from a bzlmod/non-bzlmod context

@opicaud
Copy link

opicaud commented Sep 5, 2023

Hey,
I have figured it out for all_crates_deps by extending from_cargo with an repo_alias. Its role is to provide a prefix to add into the generated deps.bzl Now the example provided is working, see changes
Still examples like all_crates_deps( build = True) are yet not supported, when rendering BUILD file, because only dependencies are generated, and thus the alias does not exist

scentini added a commit that referenced this pull request Oct 25, 2023
I've pulled out some of the code from
#1910 into a seperate PR,
to simplify the review process.

---------

Co-authored-by: scentini <rosica@google.com>
matts1 added a commit to matts1/rules_rust that referenced this pull request Oct 30, 2023
This commit creates the "generate" prefix for each of these functions,
which generates the content of these files in a way that doesn't touch
the ctx variable. This allows us to use these functions in module
extensions, which is required for bazelbuild#1910.
matts1 added a commit to matts1/rules_rust that referenced this pull request Oct 30, 2023
This commit creates the "generate" prefix for each of these functions,
which generates the content of these files in a way that doesn't touch
the ctx variable. This allows us to use these functions in module
extensions, which is required for bazelbuild#1910.
matts1 added a commit to matts1/rules_rust that referenced this pull request Oct 30, 2023
This commit creates the "generate" prefix for each of these functions,
which generates the content of these files in a way that doesn't touch
the ctx variable. This allows us to use these functions in module
extensions, which is required for bazelbuild#1910.
scentini pushed a commit that referenced this pull request Oct 31, 2023
This commit creates the "generate" prefix for each of these functions,
which generates the content of these files in a way that doesn't touch
the ctx variable. This allows us to use these functions in module
extensions, which is required for
#1910.

---------

Co-authored-by: UebelAndre <github@uebelandre.com>
@matts1 matts1 marked this pull request as ready for review November 2, 2023 01:13
illicitonion pushed a commit that referenced this pull request Jan 2, 2024
This function isn't yet used. It will be used in #1910. I'm splitting it
into its own PR because no-one seems to want to review #1910 because
it's too big.
illicitonion added a commit to illicitonion/rules_rust that referenced this pull request Jan 2, 2024
bazelbuild#1910 has a workaround for
this because changing the timestamp may invalidate the repository rule.

Instead, just don't do the write if the contents is identical.
Copy link
Collaborator

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good, thanks!

I've left comments for a handful of design discussions, let's work those out an rebase, and hopefully we can get this merged soon.

Thanks again for all the work you've been putting in here!

crate.from_cargo(
# This is optional, but if you need crate.annotation, then this is the
# bzlmod equivalent.
annotation_files = ["//:annotations.json"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some interesting trade-offs here... Is it possible to make this be an inline function call rather than an external file? (I'm not sure which we'd want to support if so, but I want to understand whether we're making API choices here or being forced by bzlmod's limitations).

The external file is nice in large repos given the lack of load support in a MODULE.bazel because it allows folks to spread this definition out of the MODULE.bazel file and allow e.g. separate code ownership of annotations from the root MODULE.bazel file.

In smaller repos, having this stuff in-line (and not needing to write a whole JSON file) is kind of nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove annotations from this PR, since it's not actually required.

examples/bzlmod/hello_world/MODULE.bazel Show resolved Hide resolved
return annotations

def _generate_hub_and_spokes(module_ctx, cargo_bazel, mod_path, cfg):
annotations = _generate_annotations(module_ctx, cfg.annotation_files)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes more sense to do this processing of annotations in starlark here, or to just pass the JSON files to cargo-bazel and let it deal with them?

It feels like the starlark approach basically gives us stardoc at the expense of needing to duplicate the definitions across two languages and keep them in sync? And starlark gives us potentially earlier errors, but also potentially less control over the exact contents of the errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing annotations from this PR, will do them in a follow-up CL to minimize size of this PR

cargo_lockfile = module_ctx.path(cfg.cargo_lockfile)

rendering_config = json.decode(render_config(
regen_command = "Run 'cargo update [--workspace]'",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the story with this regen_command? I don't thinkcargo update is generally something we should be recommending people run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified everything and removed all unnecessary complexity. In the old mechanism, you would have a Cargo.toml, a Cargo.lock, and a bazel index.

I've made it so that the Cargo.lock is the source of truth. The bazel index is based on the Cargo.lock, and is automatically kept up to date. There's no need to repin, because repinning can be done via "cargo update".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it so that the Cargo.lock is the source of truth. The bazel index is based on the Cargo.lock, and is automatically kept up to date.

I'm definitely excited by the idea of being able to get rid of the cargo-bazel lockfile, but I'm pretty sure there were reasons we hadn't already done this... @UebelAndre do you remember?

Some possible things that come to mind, though I'm not 100% confident in any of them:

  • The crate.spec (via extra crates a versions) and crate.annotation (via features) fields can change the resolution, which means if you're not going via our logic you'll miss those parts of the process when you cargo update.
  • Splicing is expensive (it involves doing a whole resolve, even if it's offline, as well as potentially running cargo tree many times to resolve features for platforms) - the cargo-bazel lockfile includes the results of splicing (which is why we can skip it entirely if not repinning), whereas Cargo.lock doesn't so will necessitate re-doing this work. Possibly Module.bazel.lock mitigates this as long as it's checked in? But if so, can we get rid of the cargo-bazel lock completely in bzlmod mode?

There's no need to repin, because repinning can be done via "cargo update".

Even if it's literally just proxying to a call to cargo update via the fetched local toolchain, it's important that we support repinning via bazel - we have users who don't themselves have cargo installed at all but who may need to trigger a repin.

crate_universe/extension.bzl Outdated Show resolved Hide resolved
crate_universe/extension.bzl Outdated Show resolved Hide resolved
tag_path,
"--metadata",
splicing_output_dir.get_child("metadata.json"),
"--repin",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it unconditionally repins? This is quite different from how the non-bzlmod logic works (

# Determine whether or not to repin depednencies
repin = determine_repin(
repository_ctx = repository_ctx,
generator = generator,
lockfile_path = lockfiles.bazel,
config = config_path,
splicing_manifest = splicing_manifest,
cargo = cargo_path,
rustc = rustc_path,
)
# If re-pinning is enabled, gather additional inputs for the generator
kwargs = dict()
if repin:
# Generate a top level Cargo workspace and manifest for use in generation
metadata_path = splice_workspace_manifest(
repository_ctx = repository_ctx,
generator = generator,
cargo_lockfile = lockfiles.cargo,
splicing_manifest = splicing_manifest,
config_path = config_path,
cargo = cargo_path,
rustc = rustc_path,
)
kwargs.update({
"metadata": metadata_path,
})
# Run the generator
execute_generator(
repository_ctx = repository_ctx,
generator = generator,
config = config_path,
splicing_manifest = splicing_manifest,
lockfile_path = lockfiles.bazel,
cargo_lockfile_path = lockfiles.cargo,
repository_dir = repository_ctx.path("."),
cargo = cargo_path,
rustc = rustc_path,
# sysroot = tools.sysroot,
**kwargs
)
)

What's the story with the difference here? Is this an intermediate step to making pinning conditional, or is there some reason bzlmod needs to do pinning differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest way to explain this is the different workflow.

If I wanted to update anyhow from 1.0.77 to 1.0.78, the workflow here is:

  1. Update the entry in Cargo.toml from 1.0.77 to 1.0.78
  2. Run cargo update --workspace to update Cargo.lock

Since everything is transient and automatically generated, you can then just build and it works. I'm not personally a big fan of the whole CARGO_BAZEL_REPIN method, but if people do care about it, I think it's fine to add it in a follow-up PR, but it definitely isn't needed for the MVP.

Note that if you try and build between step 1 and 2 you'll get an error saying that we need to update the lockfile, but that --locked was passed, so it'll refuse to compile.

crate_universe/extension.bzl Outdated Show resolved Hide resolved
illicitonion added a commit that referenced this pull request Jan 2, 2024
#1910 has a workaround for
this because changing the timestamp may invalidate the repository rule.

Instead, just don't do the write if the contents is identical.
Change-Id: I5dee8352f9a014b51cd36937475a5cc4765c4f66
Also separate out the sh_test into two independent tests rather than
combining them.
Copy link
Collaborator

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this, it's a useful stand-alone step on our journey - we can discuss what the next steps are :) (and shout if you want to try to arrange a call at any point, I'm sure we can make time zones work if needed :))

@illicitonion illicitonion merged commit 4f9b178 into bazelbuild:main Jan 3, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants