Write a repo mapping manifest in the runfiles directory #16321

Closed

Wyverald
Member

To ensure we can use repo mappings in the runfiles library, this change writes an extra file "my_binary_target.runfiles/_repo_mapping", which contains a bunch of (base_repo_canonical_name, apparent_repo_name, canonical_repo_name) triples. See https://github.com/bazelbuild/proposals/blob/main/designs/2022-07-21-locating-runfiles-with-bzlmod.md for more information.

Work towards #16124
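
For readers following along, here is a minimal sketch of how a runfiles library might consume such a manifest. It assumes one comma-separated triple per line, which this description does not spell out (see the linked proposal for the authoritative format). `parse_repo_mapping`, `resolve`, and the repo names in the sample are hypothetical, not Bazel APIs.

```python
from typing import Dict, Tuple


def parse_repo_mapping(text: str) -> Dict[Tuple[str, str], str]:
    """Maps (base repo canonical name, apparent name) to a canonical name.

    Assumption: each non-empty line has the form
    "base_repo_canonical_name,apparent_repo_name,canonical_repo_name".
    """
    mapping: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        base, apparent, canonical = line.split(",")
        mapping[(base, apparent)] = canonical
    return mapping


def resolve(mapping: Dict[Tuple[str, str], str], base_repo: str, apparent: str) -> str:
    # Hypothetical fallback policy: use the apparent name unchanged
    # when the manifest has no entry for it.
    return mapping.get((base_repo, apparent), apparent)


# Made-up sample; an empty base name stands in for the main repo here.
sample = ",bar,canonical_bar\ncanonical_bar,quux,canonical_quux\n"
table = parse_repo_mapping(sample)
print(resolve(table, "", "bar"))
```

The lookup is keyed on the pair (base repo, apparent name) because the same apparent name can map to different repos depending on which repo is asking.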

@Wyverald
Member Author

@fmeum @lberki

@ShreeM01 added the team-ExternalDeps and awaiting-user-response labels on Sep 21, 2022
* more).
*/
@Before
public void setupSimpleBinaryRule() throws Exception {
Contributor

mind adding test cases for:

  1. dependencies caused by aspects
  2. host/exec dependencies
  3. dependencies caused by a late-bound attribute

Based on how the machinery that collects the set of transitive packages works, it looks like all these should work, but it's better to test it.

Member Author

I'll get to this next. re "host/exec dependencies" though -- why would we want those in runfiles at all?

Contributor

Files from host/exec dependencies should never be in the runfiles tree, but to the best of my knowledge, this is not enforced anywhere, which means that it's pretty certain that someone does it anyway.

Therefore, packages of host/exec dependencies must also be in the runfiles repo mapping (unless you convince me that the above property is enforced, but I'd be very surprised)

Member Author

ah I see. I just added a new filter so that only repos contributing runfiles would appear in the runfiles manifest (#16321 (comment)), so I think we should have this case covered (unless someone is actively including stuff from host/exec deps into their runfiles, in which case they can simply... not).
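
A rough sketch of the filtering idea, under stated assumptions: entries are (base canonical name, apparent name, target canonical name) triples, and an entry is kept only if its target repo contributes at least one runfile. Which side of the triple the real code filters on is an assumption here, and `filter_entries` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

# (base canonical name, apparent name, target canonical name)
Entry = Tuple[str, str, str]


def filter_entries(entries: Iterable[Entry], repos_with_runfiles: Set[str]) -> List[Entry]:
    """Keeps entries whose target repo contributes runfiles, in sorted order."""
    return sorted(e for e in entries if e[2] in repos_with_runfiles)


# Made-up data: "canonical_tools" stands in for a host/exec-only dependency
# that contributes nothing to the runfiles tree.
entries = [
    ("", "tools", "canonical_tools"),
    ("", "bar", "canonical_bar"),
    ("canonical_bar", "quux", "canonical_quux"),
]
kept = filter_entries(entries, {"canonical_bar", "canonical_quux"})
print(kept)
```

With a filter like this, a host/exec dependency only shows up in the manifest if it actually puts files into the runfiles.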

@@ -132,15 +139,18 @@ private static RunfilesSupport create(
runfilesInputManifest = null;
runfilesManifest = null;
}
Artifact repoMappingManifest = createRepoMappingManifestAction(ruleContext, owningExecutable);
Contributor

Do I understand correctly that the same correctness guarantees will apply to the repo manifest as for the runfiles output manifest (and the runfiles tree), which is "nothing"? (I'm not saying this is a bad thing, I don't think you should be solving the correctness problems of runfiles here, just trying to validate my mental model against reality)

Member Author

I don't quite understand what you mean by "correctness guarantees"; any examples?

Contributor

@lberki, Sep 29, 2022

Erm, sorry for being terse. My understanding is that Bazel doesn't guarantee that if one does

bazel build //a:b
rm bazel-bin/a/b.runfiles/<some file>
bazel build //a:b

the runfiles tree is rebuilt except if <some file> is MANIFEST. Do I understand correctly that this would be the same with the repo mapping manifest, i.e. deleting it or changing it would not cause the symlink tree creation action to be re-run?

Member Author

I don't really know runfiles well enough to answer this question, but my guess is that yeah, the repo mapping manifest is just like the output manifest.

Artifact runfilesMiddleman =
createRunfilesMiddleman(ruleContext, owningExecutable, runfiles, runfilesManifest);
createRunfilesMiddleman(
ruleContext, owningExecutable, runfiles, runfilesManifest, repoMappingManifest);

boolean runfilesEnabled = ruleContext.getConfiguration().runfilesEnabled();

return new RunfilesSupport(
Contributor

How will repo mapping files work with sandboxes? I suppose they should appear there, but how would that work?

Member Author

I really don't know how to answer this question :( Does the existing runfiles manifest file work with sandboxes? If so, it doesn't look like the repo mapping manifest file should be any different.

Collaborator

Contributor

Yep, a test case would dispel all my fears :)

The reason I'm asking is that I don't know off the top of my head how the runfiles manifest is plumbed to sandboxes and remote execution workers, and since they are very special, I can't say with any confidence that the repo mapping file will be there. Ideally, by the time this change is merged, we would both understand how runfiles output manifests appear in sandboxes / on RBE and be convinced that the repo mapping file does, too (and there would be a test also, of course!)

@sgowroji added the awaiting-review label and removed the awaiting-user-response label on Sep 28, 2022
@@ -132,15 +135,19 @@ private static RunfilesSupport create(
runfilesInputManifest = null;
runfilesManifest = null;
}
Artifact repoMappingManifest =
createRepoMappingManifestAction(ruleContext, runfiles, owningExecutable);
Collaborator

Just noticed that we aren't adding the repo mapping manifest to the runfiles manifest, which makes it more difficult to find it at runtime on Windows. If we move this call further up, we could pass the Artifact into SourceManifestAction.
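
To illustrate why a manifest entry helps on Windows, here is a small sketch of a manifest-based lookup. It assumes the usual runfiles manifest layout of one "runfiles-path real-path" pair per line, separated by the first space. The `_repo_mapping` key and the sample paths are hypothetical.

```python
from typing import Optional


def find_in_manifest(manifest_text: str, rlocation_path: str) -> Optional[str]:
    """Returns the real path recorded for rlocation_path, or None if absent."""
    for line in manifest_text.splitlines():
        # Split on the first space only; the real path may contain spaces.
        key, sep, real_path = line.partition(" ")
        if sep and key == rlocation_path:
            return real_path
    return None


# Made-up manifest contents for illustration.
manifest = (
    "_main/me.py C:/src/project/me.py\n"
    "_repo_mapping C:/out/bin/me.repo_mapping\n"
)
print(find_in_manifest(manifest, "_repo_mapping"))
```

Without such an entry, a program that only has the manifest (and no symlink tree) would have to guess the location from its own path.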

@Wyverald
Member Author

So that took a while.

  • In order to make all tests pass, I eventually moved the repo mapping manifest file up a level, so it's now a sibling (instead of a child) of the runfiles tree. This is to avoid any entanglement with SymlinkTreeAction, which nukes the runfiles tree before generating any symlinks.
    • I also tried moving the repo mapping manifest writing logic into SymlinkTreeAction, but that breaks the interaction with the flag --nobuild_runfile_links, which delays the symlink tree generation to bazel run time and doesn't actually execute SymlinkTreeAction.
  • I also went back to the original approach of flattening the NestedSet<Package> in RunfilesSupport instead of in RepoMappingManifestAction because the latter approach breaks some Google-internal tests.
  • Additionally, the repo mapping manifest file is not created unless we're sure Bzlmod is enabled, which is also required to make Google happy (as it creates far fewer actions across the repo).
    • This could alternatively be solved by somehow inserting the repo mapping manifest writing logic into SourceManifestAction (the one that creates the input manifest), but that action is an AbstractFileWriteAction which only produces one output file, and it's rather tiresome to create a custom action class just so that it might write 2 files.

With luck, all the tests (both internal and external) should pass now.
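
To make the relocation concrete, this sketch computes both locations for a given executable: the new sibling file and the earlier child of the runfiles tree. The file names follow the ones mentioned in this PR. `repo_mapping_locations` is a hypothetical helper, and platform quirks such as an `.exe` suffix are ignored.

```python
from pathlib import PurePosixPath
from typing import Tuple


def repo_mapping_locations(executable: str) -> Tuple[str, str]:
    """Returns (sibling location, former child-of-runfiles location)."""
    exe = PurePosixPath(executable)
    # New layout: next to the runfiles tree, so SymlinkTreeAction never touches it.
    sibling = exe.with_name(exe.name + ".repo_mapping")
    # Earlier layout: inside the runfiles tree, which gets wiped and rebuilt.
    child = exe.with_name(exe.name + ".runfiles") / "_repo_mapping"
    return str(sibling), str(child)


print(repo_mapping_locations("bazel-bin/me"))
```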

private static ImmutableList<Entry> collectRepoMappings(
NestedSet<Package> transitivePackages, Runfiles runfiles) {
ImmutableSet<RepositoryName> reposContributingRunfiles =
runfiles.getAllArtifacts().toList().stream()
Collaborator

This being a path not taken in Blaze and not taken in Bazel by default makes me a bit worried it may regress analysis phase performance outside Google. Were you able to get benchmark data on that or have other reasons to believe it's safe?

Member Author

I didn't get benchmark data. This change was made out of necessity rather than convenience, since things break internally if a Package is used to compute an action key in any way.

Also, this is actually still run in Blaze, so its effects will be visible. It's just not run in certain other modes of builds in Google (where the transitive package tracking doesn't happen).


# Finally we get to build stuff!
self.RunBazel(['build', '//:me', '@bar//:bar'], allow_failure=False)
with open(self.Path('bazel-bin/me.repo_mapping'), 'r') as f:
Collaborator

Could you build this check into a binary executed as part of an action? That would give us confidence that the file exists in the sandbox, not just in the output tree.

That said, we may want a test that covers remote execution.

Member Author

I made the bare_binary fail if it can't find the repo mapping manifest file, and changed bazel build to bazel run. The content assertion is still outside.

With the current way the repo mapping manifest file is generated, I don't think remote execution is a concern anymore -- it's just another normal artifact, and has no entanglement with the special handling around the symlink tree.

Member Author

and of course the shell script doesn't work on windows... give me a second to fix that

Collaborator

Agreed on the remote execution part. By making the binary a test and running bazel test, you would get coverage for the sandbox case without more code.
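
As a sketch of the "fail if the file is missing" check discussed in this thread, the helper below returns a non-zero exit code when the manifest cannot be found. How the binary learns the path (argument, environment variable, or derived from its own location) is left open here. `check_repo_mapping` is a hypothetical name, not the code in the PR.

```python
import os
import sys


def check_repo_mapping(path: str) -> int:
    """Returns a process exit code: 0 if the manifest exists, 1 otherwise."""
    if not os.path.isfile(path):
        print(f"repo mapping manifest not found: {path}", file=sys.stderr)
        return 1
    return 0
```

A test binary could pass the result to `sys.exit`, so that running it under `bazel test` fails whenever the file is absent from the sandbox.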

To ensure we can use repo mappings in the runfiles library, this change writes an extra file "my_binary_target.repo_mapping", which contains a bunch of (base_repo_canonical_name, apparent_repo_name, canonical_repo_name) triples. See https://github.com/bazelbuild/proposals/blob/main/designs/2022-07-21-locating-runfiles-with-bzlmod.md for more information.

The extra file is written using a new action "RepoMappingManifestAction", and it's only executed if we know for sure that Bzlmod is enabled. This avoids generating a lot of extra actions that are essentially useless for monorepo setups such as Google's.

Work towards #16124

PiperOrigin-RevId: 475820334
Change-Id: I885b4df093bd2c783c57d19f995f420b9b29b53c
@Wyverald
Member Author

Merged as 527308c

@Wyverald Wyverald closed this Oct 25, 2022
@Wyverald Wyverald deleted the wyv-repomanifest branch October 25, 2022 19:12