Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bzlmod + build_python_zip breaks runfiles resolution #17941

Closed
coquelicot opened this issue Mar 31, 2023 · 20 comments
Closed

bzlmod + build_python_zip breaks runfiles resolution #17941

coquelicot opened this issue Mar 31, 2023 · 20 comments
Assignees
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug

Comments

@coquelicot
Copy link

Description of the bug:

When building a py_binary with --build_python_zip and bzlmod enabled, the runfiles cannot be correctly resolved.
Things would work if I remove the --build_python_zip option or switch back to the old WORKSPACE style.
I suspect this has something to do with the way bzlmod handle repo mapping.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I have a very small example to reproduce the issue: https://github.com/coquelicot/broken-bzlmod-pyzip

Here's the log from my machine:

 » bazel run example
INFO: Build option --build_python_zip has changed, discarding analysis cache.
INFO: Analyzed target //:example (0 packages loaded, 539 targets configured).
INFO: Found 1 target...
Target //:example up-to-date:
  bazel-bin/example
INFO: Elapsed time: 0.280s, Critical Path: 0.01s
INFO: 4 processes: 4 internal.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/example
sample
 » bazel run --build_python_zip example
INFO: Build option --build_python_zip has changed, discarding analysis cache.
INFO: Analyzed target //:example (0 packages loaded, 539 targets configured).
INFO: Found 1 target...
Target //:example up-to-date:
  bazel-bin/example
  bazel-bin/example.zip
INFO: Elapsed time: 0.274s, Critical Path: 0.01s
INFO: 4 processes: 3 internal, 1 linux-sandbox.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/example
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_tuww8rqk/runfiles/_main/example.py", line 11, in <module>
    main()
  File "/tmp/Bazel.runfiles_tuww8rqk/runfiles/_main/example.py", line 6, in main
    with open(path, "r") as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/Bazel.runfiles_tuww8rqk/runfiles/pyzip/data.txt'

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

release 6.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

The way runfiles are implemented seems to be different with bzlmod: #16124

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Copy link
Collaborator

fmeum commented Mar 31, 2023

Looks like the Python rules do not add symlinks and root_symlinks to the ZIP file, which causes the _repo_mapping file to be missing. I will try to fix this.

@sgowroji sgowroji added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. area-Bzlmod Bzlmod-specific PRs, issues, and feature requests labels Mar 31, 2023
@fmeum
Copy link
Collaborator

fmeum commented Mar 31, 2023

#17942 is the easy part, but this is unfortunately complicated on a deeper level:

The _repo_mapping file is currently injected directly in https://cs.opensource.google/bazel/bazel/+/6f12510e39153e316f35ab0a1ec376ece7f1b0a9:src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java;l=421, which makes it invisible to the runfiles Starlark API.

We could instead add it in a way similar to https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java;l=125, but that would construct a new Runfiles object for every target.

An alternative could be to add the artifact to every Runfiles object directly upon construction (as long as Bzlmod is enabled) and then only register the generating action in RunfilesSupport. Edit: This doesn't work as those Runfiles objects wouldn't be mergeable anymore.

@Wyverald @lberki I could use your help here as I don't have a good grasp on the potential performance implications of these approaches.

@fmeum
Copy link
Collaborator

fmeum commented Mar 31, 2023

Looked into this some more. Even if RunfilesSupport adds the repo mapping artifact to root symlinks, the Python rules still wouldn't see it as it is added only after the rule implementation has run. Maybe we need to expose it on RuleContext?

Edit: Cleared up some confusion I had, which led me to this insight: Computing the repo mapping requires the runfiles, so it simply isn't known before the rule commits to its runfiles. Getting it in the context of the rule only seems to be necessary if the rule wants to build a runfiles structure manually for itself - if it did this for some other rule, that rule's repo mapping would be in its runfiles. So unless the Python rules can build the ZIP as part of a separate rule using a macro, maybe RepoMappingManifestAction would need to be exposed to Starlark so that users can build their own manifests for arbitrary runfiles objects?

@lberki
Copy link
Contributor

lberki commented Apr 4, 2023

Let me see if I understand this correctly: the current way Python rules build ZIP files is that they dissect a Runfiles object and put the path to artifact mapping on a command line. But repo mappings make it impossible to determine that mapping solely from the Runfiles object. Is that correct?

Shooting from the hip, with the current toolbox, I could imagine creating a "fake" Python binary which is then put on the inputs of an action and then that action does not execute it, but simply extracts its runfiles tree. Would that work?

@fmeum
Copy link
Collaborator

fmeum commented Apr 4, 2023

@lberki That's exactly correct. If forcing such rules to become macros that obtain the runfiles from an actual target is acceptable, then that is certainly the easiest solution.

@rickeylev Could the executable Python rules go through a separate target to generate the runfiles that are then packaged up into the executable ZIP file?

@lberki
Copy link
Contributor

lberki commented Apr 4, 2023

Is the limitation that makes a macro necessary that FilesToRunProvider can only be created once per rule and outside the control of Starlark code and that the only way one can currently add a runfiles tree to the inputs of an action is by using ctx.actions.run(tools=) ?

@fmeum
Copy link
Collaborator

fmeum commented Apr 4, 2023

Is the limitation that makes a macro necessary that FilesToRunProvider can only be created once per rule and outside the control of Starlark code and that the only way one can currently add a runfiles tree to the inputs of an action is by using ctx.actions.run(tools=) ?

That is part of the reason. But even if there were a way to turn Runfiles into a FilesToRunProvider from Starlark, the logic that adds the repository mapping manifest is part of RunfilesSupport and RunfilesSupplier and may thus not invoked correctly in that way. This may require additional changes.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Apr 4, 2023
@rickeylev
Copy link
Contributor

Could the executable Python rules go through a separate target to generate the runfiles that are then packaged up into the executable ZIP file?

Maybe? That last mile logic to generate the executable for the zip-file case is a bit hairy.

Does every target have the repo mapping? Or just certain ones? I'm thinking just have an extra target generated along side with the necessary info, e.g.

def py_binary(name, ...
  repo_mapping("_" + name + "_repo_map")
  py_binary_rule(..., _INTERNAL_ONLY_REPO_MAP = "_" + name + "_repo_map")

def repo_mapping_impl(ctx):
  return [DefaultInfo()] # repo mapping automatically added, right?

def py_binary_impl(ctx):
  repo_map = ctx.attr._INTERNAL_ONLY_REPO_MAP.runfiles

Couple other thoughts:

  • PyBuiltins.java is where various Java-hooks can go. If the necessary info can be easily gotten via Java, that's the place to put such hooks.
  • The input_manifests arg might be of help? There's a (now unused) helper in PyBuiltins that converts a runfiles object into a RunfilesSupplier.

@fmeum
Copy link
Collaborator

fmeum commented Apr 15, 2023

@rickeylev With Bzlmod enabled, every target has the repo mapping as a root symlinks that is injected in getRunfilesInputs: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/Runfiles.java;drc=0ce17480390cdced2df8d59249561613e81f446f;l=421
It may thus not be accessible by rule implementations directly.

The artifact itself is created in RunfilesSupport: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/RunfilesSupport.java;l=135;bpv=1;bpt=1

We may have to make it available from there.

@rickeylev
Copy link
Contributor

Exposing the RepoMappingManifestAction is part of the problem, but most of what it does is computed beforehand and are inputs into that action. Manually doing that logic does look feasible, though.

Looking at how RunfilesSupport builds the repo mapping, the two pieces we need are ruleContext.getTransitivePackagesForRunfileRepoMappingManifest() and something to get Package.getPackageIdentifier().getRepository() (or an equivalent).

Unfortunately, these both use the Package object, which I don't think has a Starlark equivalent. Can an object that doesn't implement the Starlark classes be passed from Java to Starlark? If so, we can just pass them onto another helper in PyBuiltins to operate on. If not, we'll have to convert Package to something else (probably a string or struct), or move more logic to Java (the map building is pretty simple, so this looks quite feasible).

Making a Starlark implemented rule's RunfilesSupport available to it during its execution is somewhere near impossible, in particular for the Python rules, so that isn't really an option. Problem 1 is RunfilesSupport is created after the rule function completes. If you try to manually create a RunfilesSupport object during execution (PyBuiltins.new_empty_runfiles_with_middleman does such), then it creates the same actions multiple times, which triggers duplicate action conflict detection logic later. Problem 2 is, for the Python rules, the duplicate detection ends up triggering the empty runfiles supplier code multiple times during the analysis phase, which requires flattening the runfiles, and the net effect is a huge CPU and memory regression. If there is some sort of solution involving Starlark+RunfilesSupport, I don't know it :).

Er, hm. Is that repo mapping code flattening the runfiles during analysis phase? That doesn't sound good, well, a problem for another time.

If the action did more of the heavy lifting (computing the map), it would make this easier overall and probably also solve the runfiles-being-flattened-during-analysis issue. We could probably reduce it to a simple helper in PyBuiltins that looked something like:

public void createRepoMappingManifest(StarlarkRuleContext starlarkCtx, Runfiles runfiles) {
  starlarkCtx.registerAction(new RepoMappingAction(
    starlarkCtx.getRuleContext().getTransitivePackagesForRunfileRepoMappingManifest(),
    runfiles
  ));
}

@fmeum
Copy link
Collaborator

fmeum commented Apr 15, 2023

As far as I remember the repo mapping manifest action isn't really doing much work since actions must be serializable, but Packages are not. That's also why the NestedSet is indeed flattened during analysis time.

We only need a tiny fraction of the Package information, so maybe that could be solved with some kind of custom serialization logic for RepoMappingManifestAction instead? I don't know much about this area though.

@rickeylev
Copy link
Contributor

I was playing around with this tonight and may have come up with a somewhat simple work around: declare the repo mapping file and add it to the zip, but rely on the subsequent RunfilesSupport code to actually create the file. I think this might also have a bonus: because the same file isn't created twice, duplicate detection doesn't have to run. We can make it a bit less hacky by factoring out a "get file name" function from Runfiles.createRepoMappingManifestAction. I'll poke this some more tomorrow to see how promising it is.

As a side note, this is the second time I've encountered Starlark code wanting to get at some of the things in RunfilesSupport (the other case was par files, which wanted RunfilesSupport for a similar reason), so maybe there is some missing functionality somewhere.

copybara-service bot pushed a commit that referenced this issue May 11, 2023
Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards #17941

Closes #18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
@rickeylev
Copy link
Contributor

rickeylev commented May 11, 2023

Alright, today's experiment was trying to use a separate target, which looks iffy and unappealing.

  • The repo mapping is only generated and added to the runfiles of executable/test targets. This is due to StarlarkRuleConfiguredTargetUtil#addSimpleProviders (see the RunfilesSupport.withExecutable() call, guarded by the "is executable or test" check
  • When it is added, its added to the middleman artifact. See RunfilesSupport.createRunfilesMiddleman, where the repoMappingManifest is added to the nested set fed into the middleman
  • The code in Runfiles.getRunfilesInputs adds repoMappingManifest to the runfiles manifest, but not the runfiles itself. (and I think this is the manifest that may not be present, depending on a flag?) (Ah, I seem to recall that the manifest and inputs work in tandem: the manifest tells what path to materialize a file under when the runfiles tree is created, and the inputs tell what inputs are actually available)

The net effect is, when you look at a target's runfiles, you don't see the repo mapping because it's in the middleman artifact, which I think is a secret dependency of the executable somehow.

So, I ran the target through ctx.resolve_tools(). This returns the middle man artifact (contained within a depset of all necessary files) and an Starlark-opaque RunfilesSupplier object. Well, we have the RunfilesSupplier now, so I fed that into a java hook to call RunfilesSuppler.getRepoMappingManifest(). This sort of works. It did give a repo mapping file.

However, the resulting repo mapping is only correct if the outer target and inner target have the same dependencies (including implicit deps). In my experiments, I used a stub executable with no dependencies as the "inner" target. It's repo map was just the current workspace. The actual repo map includes the various implicit dependencies of py_binary itself (rules_python, platforms, bazel_tools, some others).

So to make this work, the inner target needs to copy all the passed in user values and the implicit deps of py_binary, forward along the transitive runfiles (if it doesn't, the repo map won't be correct) (and this particular runfiles object is unused), produce an unused executable file (to satisfy some starlark validations), and then the outer rule needs to dig out the repo mapping somehow. Note that all the attributes have to continue to exist on the outer rule so that things like aspects can still traverse them. I think there might be a more direct route to get the repo mapping than going through the middleman artifact (I think a java hook to go Target->FilesToRunProvider#getRunfilesSupport is possible), but that doesn't help with the generation of its content.

So, technically it's possible, but it is pretty wasteful. I'm very wary of generating essentially a copy of py_binary() -- the extra target isn't free; we already know this has a real cost. Within Google, we were forced to do this to make PAR files work and saw the effects in the performance metrics, and only deemed it OK because we didn't really have a choice and plan to make it an opt-in behavior instead of always-on behavior.

Ah -- I see the RepoMappingManifestAction PR got merged! I'll experiment with that -- looks pretty straight forward.

@rickeylev rickeylev self-assigned this May 12, 2023
@fmeum
Copy link
Collaborator

fmeum commented May 16, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 16, 2023
fmeum added a commit to fmeum/bazel that referenced this issue May 16, 2023
Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards bazelbuild#17941

Closes bazelbuild#18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
fmeum added a commit to fmeum/bazel that referenced this issue May 16, 2023
Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards bazelbuild#17941

Closes bazelbuild#18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
@iancha1992
Copy link
Member

@bazel-io fork 6.3.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 16, 2023
fmeum added a commit to fmeum/bazel that referenced this issue May 19, 2023
Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards bazelbuild#17941

Closes bazelbuild#18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
iancha1992 pushed a commit that referenced this issue May 19, 2023
…18419)

Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards #17941

Closes #18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
@iancha1992
Copy link
Member

@fmeum @rickeylev should this be now cherry-picked and included in the release-6.3.0?

@rickeylev
Copy link
Contributor

rickeylev commented May 20, 2023

That would be great, yes.

Note that this fix (4e60992) requires ee46d1a

@iancha1992
Copy link
Member

That would be great, yes.

Note that this fix (4e60992) requires ee46d1a

@rickeylev @fmeum I did make sure the commit ee46d1a was merged. However, it seems like additional commits needs to be cherry-picked first in order for the commit to be cherrypicked in the release branch. Could you please submit a PR or let us know what commits to include? Currently, there are 3 files that are missing in the release-6.3.0 branch (src/main/java/com/google/devtools/build/lib/rules/python/PyBuiltins.java, src/main/starlark/builtins_bzl/common/python/py_executable_bazel.bzl, src/main/starlark/builtins_bzl/common/python/py_internal.bzl).

@rickeylev
Copy link
Contributor

Ahhh, I forgot most of the Starlark rewrite work didn't make the 6.0 cut. It won't be feasible to port this to 6.x. Even if we did, it'd require re-implementing the fix based on the (now deleted) Java implementation of the rules.

fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
Delays the flattening from action creation to action execution in preparation for making this action accessible from Starlark.

Work towards bazelbuild#17941

Closes bazelbuild#18349.

PiperOrigin-RevId: 531165675
Change-Id: Ia2c175834d409c30303dd3ecba0dd276ea2cd905
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
…ctly

To make runfiles lookups work correctly with bzlmod, a `_repo_mapping` file
is used to map apparent names to canonical names. This file is normally
automatically created by higher level code and injected into the runfiles, but
that happens after a rule has finished running. This means it isn't in the
runfiles object that py_binary itself sees when its constructing a zip file of
itself.

To fix, the zip file generates a repo mapping itself and puts it into the zip
file, thus allowing the runfiles library to find it when run from the zip file.
This requires a Java hook since the underlying `RepoMappingManifestAction`
isn't directly exposed to Starlark.

Fixes bazelbuild#17941

PiperOrigin-RevId: 532265478
Change-Id: I5f36fe58f043611481a7ed6ba19a7bbf5be4edf2
@iancha1992
Copy link
Member

Ahhh, I forgot most of the Starlark rewrite work didn't make the 6.0 cut. It won't be feasible to port this to 6.x. Even if we did, it'd require re-implementing the fix based on the (now deleted) Java implementation of the rules.

cc: @Wyverald @meteorcloudy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug
Projects
None yet
Development

No branches or pull requests

9 participants