From 27707aa1b6fc5ecf1c9f29c0c4e16a6a37ac3b5b Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Mon, 18 Jul 2022 17:43:19 +0200 Subject: [PATCH 01/10] Add "Locating runfiles with Bzlmod" --- README.md | 1 + ...022-07-21-locating-runfiles-with-bzlmod.md | 249 ++++++++++++++++++ 2 files changed, 250 insertions(+) create mode 100644 designs/2022-07-21-locating-runfiles-with-bzlmod.md diff --git a/README.md b/README.md index 5a85fcb..285e3cb 100644 --- a/README.md +++ b/README.md @@ -44,6 +44,7 @@ list. | 2021-06-25 | [Better Starlark analysis unit tests](https://docs.google.com/document/d/12sC92mq7WasTvDWsm4U782oCegitfi1jbgOnMIGCe-A/edit?usp=sharing) | [@hvadehra](https://github.com/hvadehra) | Starlark | | 2022-04-25 | [Extend expand_template() API to reduce retained memory](https://docs.google.com/document/d/1Bjos2J0eLfxiy3_GU7LxOlciJAAqcbZCIIZ2VK0o5Wo/edit?usp=sharing) | [@hvadehra](https://github.com/hvadehra) | Starlark | | 2022-04-29 | [@since(version) annotations in documentation.](https://docs.google.com/document/d/17JBmzeY2NpCWbAGQjsqtmMRLOl0n8IcPYCMOYyFsG1k/edit#) | [@aiuto](https://github.com/aiuto) | Documentation | +| 2022-07-21 | [Locating runfiles with Bzlmod](designs/2022-07-21-locating-runfiles-with-bzlmod.md) | [@fmeum](https://github.com/fmeum) | External Repositories, Bazel | ### Approved diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md new file mode 100644 index 0000000..ea7040b --- /dev/null +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -0,0 +1,249 @@ + +--- +created: 2022-07-21 +last updated: 2022-07-21 +status: Draft +reviewers: + - @Wyverald + - @meteorcloudy + - @phst +title: Locating runfiles with Bzlmod +authors: + - @fmeum +--- + +# Abstract + +With Bazel's new external dependency management system [Bzlmod](https://bazel.build/docs/bzlmod), the actual names of repositories under the output base are no longer static constants, but depend on the result of its dependency resolution algorithm. +As a consequence, functionality that relies on the static nature of these identifiers, most notably runfiles libraries and Stardoc, need to be made aware of the mapping between the names used to refer to repositories and their actual canonical name on disk. +This proposal suggests a way to serialize this information into a new type of manifest as well as changes to rulesets and runfiles libraries that allows end users to locate runfiles under Bzlmod with no or minimal changes to their existing code. + +# Background + +Bazel has had a concept of [repository mappings](https://bazel.build/docs/external#shadowing-dependencies) for a long time. +A repository can specify a mapping applied to the repository part of label strings used within this repository. +For example, a repository can specify that every label `@foo//:target` specified in it is actually interpreted as `@bar//:target`. +Here, `foo` is called the *apparent repository name* and `bar` is called the *canonical repository name*. +Repository mappings are rarely used in the `WORKSPACE` system for external dependency management and, crucially, all mapping entries are static. + +Bazel's new external dependency management system [Bzlmod](https://bazel.build/docs/bzlmod) implements "strict deps" checks for repositories by using repository mappings. +Every Bazel module depending on a Bazel module `foo` can specify its own apparent repository name for the repository generated by `foo`. +The canonical name of the repository corresponding to `foo` is an implementation detail of Bzlmod and typically of the form `@foo~1.0.3`, where `1.0.3` is the version of `foo`. +This version is determined by Bzlmod's dependency resolution algorithm and thus not known in advance by any of the modules participating in this resolution. +In contrast to the situation for `WORKSPACE` files, the canonical repository names thus aren't known by the Bazel modules themselves. + +The fact that canonical repository names have to be treated as dynamic with Bzlmod poses problems for at least two parts of Bazel that so far rely on repository names being statically known: + +1. If an executable target `@foo//:exe` in a Bazel module `foo` has a `data` dependency on a file target `@bar//pkg:runfile`, where `bar` is a Bazel module that is a direct dependency of `foo`, then the runfiles path of `@bar//:runfile` will be `/pkg/runfile`. + In order to locate the file at runtime, `@foo//:exe` needs to pass this path to one of the [runfiles libraries](https://docs.google.com/document/d/e/2PACX-1vSDIrFnFvEYhKsCMdGdD40wZRBX3m3aZ5HhVj4CtHPmiXKDCxioTUbYsDydjKtFDAzER5eg7OjJWs3V/pub). + This is a problem since the path contains the version of `bar`, which isn't static, and thus can't be referenced as a constant literal string anymore. + In fact, unless `foo` is the main module, the same problem already arises for files in `foo` itself. + This issue is tracked by [bazelbuild/bazel#15029](https://github.com/bazelbuild/bazel/issues/15029). +2. [Stardoc](https://github.com/bazelbuild/stardoc) generates documentation from `.bzl` files and for that has to resolve labels in `load` statements to `.bzl` files on disk. + Since it doesn't actually evaluate `WORKSPACE` or `MODULE.bazel` files, it has no way of discovering repository mappings in the same way as Bazel. + Thus, with Bzlmod, Stardoc fails to generate documentation for any `.bzl` file that loads `.bzl` files from other repositories. + This issue is tracked by [bazelbuild/bazel#14140](https://github.com/bazelbuild/bazel/issues/14140) and [bazelbuild/stardoc#117](https://github.com/bazelbuild/stardoc/issues/117). + +# Proposed changes + +## 1. Add a new provider that marks runfiles libraries + +Both Change 2 and 4 described in this proposal require Bazel targets that provide runfiles lookup functionality to be marked with a distinct provider `RunfilesLibraryInfo` exposed to Starlark. +The provider carries no data, it is purely used as a marker. + +A target that depends on a target advertising this provider will be given access to a serialized form of its repository mapping at runtime as described in Part 2. + +Furthermore, depending on the nature of the programming language of the target, the language-specific rules can use this information to generate code that allows the target compile-time access to its canonical repository name as described in Change 4. + +### Backwards compatibility + +Adding a new global symbol is a backwards-compatible change. + +In order to support Bzlmod, rulesets providing runfiles libraries will have to be updated to advertise the new provider. +Since a new global symbol can't be feature-detected in Starlark, this can only be done once the ruleset providing the runfiles libraries targets a Bazel version that offers the new provider as its minimum supported version. +For runfiles libraries that do not ship with Bazel, it would thus be helpful to cherry-pick this change into Bazel 5. + +## 2. Emit a repository mapping manifest for each executable target + +At runtime, an executable target has to be able to translate an apparent repository name to a canonical repository name. +This requires serializing the repository mapping of every transitive dependency of the target that may perform runfiles lookups to a file that can be accessed at runtime, similar to the existing runfiles manifest. + +In more formal terms, the complete repository mapping maintained by Bazel is a function `repo_mapping` that maps a pair `(C, A)` of a canonical repository name `C` and an apparent repository name `A` to the canonical repository name `repo_mapping(C, A)` to which `A` resolves when referred to from within `C`. + +The repository mapping manifest would then look as follows: + +1. The repository mapping manifest is an ASCII file named `.repo_mapping` and placed in the same directory as the target's executable. +2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by a single space. + Since neither canonical nor apparent repository names can contain spaces, this is unambiguous. + As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name instead. + This is necessary since the main repository is stored under this name. +3. The lines of the manifest are sorted lexicographically. + This ordering is equivalent to the lexicographical ordering of the triples comprising the repository mapping. +4. The repository mapping manifest for an executable target `T` consists only of those entries `(C, A, repo_mapping(C, A))` for which both of the following conditions are satisfied: + - `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`; + - `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`. + This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. + If all rather than this limited set of entries were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. + +In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`. +Since user-supplied repository names cannot start with `_`, there is no risk of collision with actual runfiles. + +### Example + +Consider the following Bazel module: + +```starlark +# WORKSPACE +workspace(name = "my_workspace") + +# MODULE.bazel +module( + name = "my_module", + version = "1.2.3", +) +bazel_dep(name = "protobuf", version = "3.19.2", repo_name = "my_protobuf") +bazel_dep(name = "rules_go", version = "0.33.0") + +# BUILD.bazel +cc_binary( + name = "some_tool", + srcs = ["main.cpp"], + data = [ + "data.txt", + "@my_protobuf//:protoc", + ], + deps = ["@bazel_tools//tools/cpp/runfiles"], +) +cc_binary( + name = "other_tool", + srcs = ["main.cpp"], + data = [ + "@my_protobuf//:protoc", + ], + deps = ["@bazel_tools//tools/cpp/runfiles"], +) +``` + +With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows (spaces are represented as `[ ]` for clarity): + +``` +[ ]my_module[ ]@my_module +[ ]my_workspace[ ]@my_module +[ ]my_protobuf[ ]@protobuf~3.19.2 +``` + +Note that the repository mapping manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. +It also doesn't contain any entries referencing `rules_go`, even though the main module depends on it. + +The repository mapping manifest of `//:other_tool` would look as follows: + +``` +[ ]my_protobuf[ ]@protobuf~3.19.2 +``` + +This manifest does *not* contain any entries where `repo_mapping(C, A)` is `my_workspace` since the target does not include any runfiles from the main repository. + +If a module `other_module` depends on `my_module` and contains a target that depends on `@my_module//:some_tool`, then that target's repository mapping manifest would contain the following lines (among others): + +``` +@my_module~1.2.3[ ]my_module[ ]@my_module~1.2.3 +@my_module~1.2.3[ ]my_protobuf[ ]@protobuf~3.19.2 +``` + +### Implementation details + +- Add a new internal `RunfilesLibraryUsersProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. +- In [`RuleConfiguredTargetBuilder#build()`], collect the `RunfilesLibraryUsersProvider` of all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. + This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage without requiring every rule implementation to cooperate. +- Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. + This requires maintaining the inverse of the mapping `RepositoryMapping` in a `Multimap`: repository mappings are not necessarily injective. +- Add the repository mapping manifest artifact to the runfiles middleman and the runfiles source manifest. + +### Backwards compatibility + +Emitting a new file into the runfiles tree is a backwards-compatible change. + +## 3. Extend the runfiles libraries to take repository mappings into account + +Based on Change 2, runfiles libraries can now apply the repository mapping at runtime. +Concretely, runfiles libraries should modify their current lookup procedure described in the [original design doc](https://docs.google.com/document/d/e/2PACX-1vSDIrFnFvEYhKsCMdGdD40wZRBX3m3aZ5HhVj4CtHPmiXKDCxioTUbYsDydjKtFDAzER5eg7OjJWs3V/pub) as follows: + +1. After the `Create` method of the runfiles library has located either the runfiles directory or the runfiles manifest, it uses either to look up the repository mapping manifest under the fixed runfiles path `_repo_mapping`. + + a. If the runfiles library is shipped with Bazel, it should fail with a descriptive error message if the manifest is not found. + + b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest. + +2. The runfiles library parses the repository mapping manifest and stores a map of `A` to `repo_mapping(C, A)` for the fixed canonical repository name `C` corresponding to the repository from which the `Create` method was called. + + The precise way in which `C` is determined depends on the nature of the language. + Suggestions are provided as part of Change 4 below and may involve additional steps performed by the end user, e.g., passing a certain variable as an argument to the `Create` method. + + If `C` can't be determined, the runfiles library sets `C` to the empty string, which corresponds to performing runfiles lookups from within the repository mapping context of the main repository. +4. When `Rlocation` is called with a runfiles-root-relative path `rpath`, after validating `rpath`, the runfiles library extracts the first path segment, understood to be an apparent repository name `A`, of `rpath` and looks it up in the map created in Step 2. + + a. If `A` is contained in the map, the runfiles library continues the original lookup procedure for the modified path `repo_mapping(C, A) + rpath[rpath.find('/'):]`. + + b. If `A` is not contained in the map, the runfiles library continues the original lookup procedure for the unmodified path `rpath`. + +The precise way in which this procedure is implemented depends on the particular implementation of the runfiles library for each language. +For example, the runfiles library for Bash doesn't offer a `Create` method and implicitly looks up the runfiles library or manifest on each invocation of the `rlocation` function. +The implementation details will have to be handled by the individual libraries' maintainers. + +Step 2 falls back to using the empty string as the canonical repository name of the current repository so that existing code performing runfiles lookup can be kept working to the extent possible: +While this will not lead to successful runfiles lookups from modules that are used as dependencies of other modules, it will correctly resolve in the typical case of ruleset end users that only look up files in their main repository or direct dependencies of that repository. +It is possible that this behavior leads to cases where a runfiles lookup succeeds, but resolves to a different file than intended, for runfiles libraries usages by dependencies that haven't been updated. +The author believes that the risk of this happening in practice is outweighed by the chance to remain compatible with existing end user codebases. + +Step 3 a) falls back to looking up the unmodified runfiles path `rpath` so that alternative approaches for generating post-repo-mapping runfiles paths remain viable. +This includes: + +- Passing in runfiles paths from Starlark via environment variables or arguments. +- Generating code constants containing already repo-mapped runfiles paths (see [rules_runfiles](https://github.com/fmeum/rules_runfiles)). + +This fallback behavior is not prone to ambiguity: With Bzlmod, canonical repository names always starts with an `@`, which is never contained in a valid apparent repository name. + +### Backwards compatibility + +End users that do not use Bzlmod will experience no changes when using updated runfiles libraries. + +End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern. + +# Suggested follow-up changes + +The following changes solve the two problems described in the background section of this document based on Change 1, 2, and 3 described above. +However, due to the federated nature of rulesets, which in many cases aren't even released with Bazel, and the wildly different natures of the languages they cover, these changes are at this point only to be treated as suggestions. +Further feedback from ruleset and Stardoc maintainers as well as language experts is required to ensure that these changes actually provide the best possible API and work correctly in all cases. + +## 4. Make the canonical repository name of the current repository available at runtime + +Using the new features described in Part 2 and 3, repository mapping aware runfiles lookups can be performed knowing the canonical repository name of the repository containing the code that performs the lookup. +The canonical repository name is not an absolute constant known to the end user --- it is an implementation detail of Bzlmod and e.g. contains module versions that may change depending on the outcome of the dependency resolution algorithm. + +For compiled languages, some amount of language-specific code generation will likely be needed to make that information available at runtime. +Ideally, this code generation would be performed by the language-specific rules themselves if and only if a direct dependency of a target advertises the `RunfilesLibraryInfo` provider. +For example, the following changes could be made to the rules for compiled languages shipped with Bazel: + +- **C/C++**: If a `cc_*` target has a direct dependency advertising `RunfilesLibraryInfo`, implicitly add a `local_define` of `BAZEL_CURRENT_REPOSITORY_NAME` with value the canonical repository name of the repository containing the target. Then, require users to pass `BAZEL_CURRENT_REPOSITORY_NAME` as an argument to the runfiles library's `Create` method. +- **Java**: If a `java_*` target has a direct dependency advertising `RunfilesLibraryInfo`, generate a source file providing a `static final String` constant `com.google.devtools.build.runfiles.RunfilesHelper.CURRENT_REPOSITORY_NAME` with value the canonical repository name of the repository containing the target. + Separately compile this source file with `neverlink = True` and implicitly add it as compile-time dependency of the target. + Since the Java Language Specification mandates that [`static` constants are inlined](https://docs.oracle.com/javase/specs/jls/se17/html/jls-14.html#d5e26341), this will ensure that no reference to the `RunfilesHelper` class is emitted into the compilation output of the target. + Each target can then reference the same `CURRENT_REPOSITORY_NAME` constant and pass it to the `Create` method of `Runfiles`. + +For other compiled as well as dynamic languages, the canonical repository name may instead be obtained directly by the runfiles library by parsing the path of the calling source file. For example, Go has [runtime.Caller](https://pkg.go.dev/runtime#Caller), Python has `inspect.getframeinfo(sys._getframe(1)).filename` and Bash has `caller`. + +## 5. Make Stardoc compatible with repository mappings + +Using the new functionality suggested by this proposal, Stardoc can be made aware of repository mappings as follows: + +1. Add an implicit dependency on a global helper target advertising `RunfilesLibraryInfo` to `bzl_library`. +2. Create a new `stardoc_runfiles` rule that consumes `bzl_library` targets and advertises a `DefaultInfo` with all transitive `.bzl` files in `runfiles` and a no-op executable in `executable`. +3. Replace the current `stardoc` rule with a macro that evaluates to both a `stardoc_runfiles` and a `stardoc` target, with the latter referencing the former and adding it to the `tools` of the actual Stardoc action. + This stages all `.bzl` files as runfiles and, due to 1., ensures that they are covered by the generated `.repo_mapping` manifest. +4. At runtime, parse the runfiles path of a root `.bzl` file to obtain its canonical repository name and use the Java runfiles library to look up the paths of dependencies according to this repository's repository mapping. + +Steps 2 and 3 could alternatively be replaced by implementing and using one of the new features proposed in [bazelbuild/bazel#15486](https://github.com/bazelbuild/bazel/issues/15486). + +[RunfilesHelper]: https://github.com/fmeum/bazel/commit/fdbc85448553ef46e620bbfc09cb3c11b7e05425#diff-75515109d2df88a368ed059a7ad73a424d0bd1db785d1939f41531266b7769b2 +[`InstrumentedFilesCollector#forwardAll`]: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/test/InstrumentedFilesCollector.java;l=65;drc=86b800e4066389d4535fbdf280cfe9447b06df15;bpv=1;bpt=1 +[`RuleConfiguredTargetBuilder#build()`]: https://cs.opensource.google/bazel/bazel/+/86b800e4066389d4535fbdf280cfe9447b06df15:src/main/java/com/google/devtools/build/lib/analysis/RuleConfiguredTargetBuilder.java;l=105;bpv=1;bpt=1 From d33bd4aa91c21a7b1d14b78f3115df236f1d87eb Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Fri, 22 Jul 2022 12:03:27 +0200 Subject: [PATCH 02/10] Address review comments --- ...022-07-21-locating-runfiles-with-bzlmod.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index ea7040b..4e07ee5 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -73,8 +73,8 @@ In more formal terms, the complete repository mapping maintained by Bazel is a f The repository mapping manifest would then look as follows: 1. The repository mapping manifest is an ASCII file named `.repo_mapping` and placed in the same directory as the target's executable. -2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by a single space. - Since neither canonical nor apparent repository names can contain spaces, this is unambiguous. +2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by `,`. + Since neither canonical nor apparent repository names can contain commas, this is unambiguous. As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name instead. This is necessary since the main repository is stored under this name. 3. The lines of the manifest are sorted lexicographically. @@ -83,10 +83,10 @@ The repository mapping manifest would then look as follows: - `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`; - `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`. This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. - If all rather than this limited set of entries were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. + If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`. -Since user-supplied repository names cannot start with `_`, there is no risk of collision with actual runfiles. +Since canonical repository names do not start with `_`, there is no risk of this synthetic part being a prefix of an actual runfile. ### Example @@ -124,12 +124,12 @@ cc_binary( ) ``` -With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows (spaces are represented as `[ ]` for clarity): +With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: ``` -[ ]my_module[ ]@my_module -[ ]my_workspace[ ]@my_module -[ ]my_protobuf[ ]@protobuf~3.19.2 +,my_module,@my_module +,my_workspace,@my_module +,my_protobuf,@protobuf~3.19.2 ``` Note that the repository mapping manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. @@ -138,7 +138,7 @@ It also doesn't contain any entries referencing `rules_go`, even though the main The repository mapping manifest of `//:other_tool` would look as follows: ``` -[ ]my_protobuf[ ]@protobuf~3.19.2 +,my_protobuf,@protobuf~3.19.2 ``` This manifest does *not* contain any entries where `repo_mapping(C, A)` is `my_workspace` since the target does not include any runfiles from the main repository. @@ -146,8 +146,8 @@ This manifest does *not* contain any entries where `repo_mapping(C, A)` is `my_w If a module `other_module` depends on `my_module` and contains a target that depends on `@my_module//:some_tool`, then that target's repository mapping manifest would contain the following lines (among others): ``` -@my_module~1.2.3[ ]my_module[ ]@my_module~1.2.3 -@my_module~1.2.3[ ]my_protobuf[ ]@protobuf~3.19.2 +@my_module~1.2.3,my_module,@my_module~1.2.3 +@my_module~1.2.3,my_protobuf,@protobuf~3.19.2 ``` ### Implementation details From 87374d1dadae5b3c8877d364878430980385e8b5 Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Fri, 22 Jul 2022 16:23:36 +0200 Subject: [PATCH 03/10] Address comment on workspace name --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 4e07ee5..472a0b9 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -75,7 +75,7 @@ The repository mapping manifest would then look as follows: 1. The repository mapping manifest is an ASCII file named `.repo_mapping` and placed in the same directory as the target's executable. 2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by `,`. Since neither canonical nor apparent repository names can contain commas, this is unambiguous. - As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name instead. + As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name (either the value of the `name` attribute of the `workspace` function or `__main__`) instead. This is necessary since the main repository is stored under this name. 3. The lines of the manifest are sorted lexicographically. This ordering is equivalent to the lexicographical ordering of the triples comprising the repository mapping. @@ -127,8 +127,8 @@ cc_binary( With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: ``` -,my_module,@my_module -,my_workspace,@my_module +,my_module,my_workspace +,my_workspace,my_workspace ,my_protobuf,@protobuf~3.19.2 ``` From 24144cd490033afb08f0a1040e0ae0eaf8a73e5c Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Fri, 22 Jul 2022 17:03:47 +0200 Subject: [PATCH 04/10] Clarify that the main repo's canonical name does not start with @ --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 472a0b9..25e5172 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -201,7 +201,7 @@ This includes: - Passing in runfiles paths from Starlark via environment variables or arguments. - Generating code constants containing already repo-mapped runfiles paths (see [rules_runfiles](https://github.com/fmeum/rules_runfiles)). -This fallback behavior is not prone to ambiguity: With Bzlmod, canonical repository names always starts with an `@`, which is never contained in a valid apparent repository name. +This fallback behavior is not prone to ambiguity: With Bzlmod, canonical names of non-main repositories always starts with an `@`, which is never contained in a valid apparent repository name. ### Backwards compatibility From 5d0d963121dee31fdc40fe8a810a1c8a436450f7 Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Fri, 22 Jul 2022 17:04:10 +0200 Subject: [PATCH 05/10] Determine the repo context in Rlocation, not Create --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 25e5172..2fa5052 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -174,17 +174,13 @@ Concretely, runfiles libraries should modify their current lookup procedure desc b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest. -2. The runfiles library parses the repository mapping manifest and stores a map of `A` to `repo_mapping(C, A)` for the fixed canonical repository name `C` corresponding to the repository from which the `Create` method was called. +2. The runfiles library parses the repository mapping manifest and stores a map of `(C, A)` to `repo_mapping(C, A)` for all entries in the manifest. - The precise way in which `C` is determined depends on the nature of the language. - Suggestions are provided as part of Change 4 below and may involve additional steps performed by the end user, e.g., passing a certain variable as an argument to the `Create` method. +3. When `Rlocation` is called with a runfiles-root-relative path `rpath`, after validating `rpath`, the runfiles library extracts the first path segment, understood to be an apparent repository name `A`, of `rpath`. It also determines `C`, the canonical name of the repository from which it was called, in a language-specific way (see Change 4 below suggestions). If `C` can't be determined, the runfiles library sets `C` to the empty string, which corresponds to performing runfiles lookups using the mapping of the main repository. It then looks up the map entry for the key `(C, A)` created in Step 2. - If `C` can't be determined, the runfiles library sets `C` to the empty string, which corresponds to performing runfiles lookups from within the repository mapping context of the main repository. -4. When `Rlocation` is called with a runfiles-root-relative path `rpath`, after validating `rpath`, the runfiles library extracts the first path segment, understood to be an apparent repository name `A`, of `rpath` and looks it up in the map created in Step 2. + a. If `(C, A)` is contained in the map, the runfiles library continues the original lookup procedure for the modified path `repo_mapping(C, A) + rpath[rpath.find('/'):]`. - a. If `A` is contained in the map, the runfiles library continues the original lookup procedure for the modified path `repo_mapping(C, A) + rpath[rpath.find('/'):]`. - - b. If `A` is not contained in the map, the runfiles library continues the original lookup procedure for the unmodified path `rpath`. + b. If `(C, A)` is not contained in the map, the runfiles library continues the original lookup procedure for the unmodified path `rpath`. The precise way in which this procedure is implemented depends on the particular implementation of the runfiles library for each language. For example, the runfiles library for Bash doesn't offer a `Create` method and implicitly looks up the runfiles library or manifest on each invocation of the `rlocation` function. From 1924d3beb7afe7600acef133c110a96aa352179c Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Wed, 10 Aug 2022 15:15:35 +0200 Subject: [PATCH 06/10] Address @oquenchil's comments --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 2fa5052..fdbb52b 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -132,8 +132,10 @@ With `my_module` as the root module and Bzlmod enabled, the repository mapping m ,my_protobuf,@protobuf~3.19.2 ``` -Note that the repository mapping manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. -It also doesn't contain any entries referencing `rules_go`, even though the main module depends on it. +* The manifest contains the canonical repository name for the workspace and module name since `//:some_tool` has a data dependency on `//:data.txt` and a direct dependency on a runfiles library. +* The manifest contains the canonical repository name for protobuf since `//:some_tool` has a data dependency on `@my_protobuf//:protoc` and a direct dependency on a runfiles library. +* The manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. +* The manifest does *not* contain any entries referencing `rules_go`, even though the main module depends on it. `rules_go` neither contributes any runfiles to `//:some_tool` nor any target that depends on a runfiles library. The repository mapping manifest of `//:other_tool` would look as follows: @@ -152,7 +154,9 @@ If a module `other_module` depends on `my_module` and contains a target that dep ### Implementation details -- Add a new internal `RunfilesLibraryUsersProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. +- Add a new field to `RunfilesProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. + Alternatively, this information could be tracked by a new provider that is only added if needed. + The latter may perform better with respect to memory consumption as most targets in a build will not transitively depend on a runfiles library. - In [`RuleConfiguredTargetBuilder#build()`], collect the `RunfilesLibraryUsersProvider` of all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage without requiring every rule implementation to cooperate. - Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. From d72d749e1538f27c93d3c502077e6dbe1c23cf55 Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Wed, 10 Aug 2022 15:35:56 +0200 Subject: [PATCH 07/10] Clarify why RunfilesLibraryUsersProvider is needed --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index fdbb52b..1dcb4b9 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -154,9 +154,8 @@ If a module `other_module` depends on `my_module` and contains a target that dep ### Implementation details -- Add a new field to `RunfilesProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. - Alternatively, this information could be tracked by a new provider that is only added if needed. - The latter may perform better with respect to memory consumption as most targets in a build will not transitively depend on a runfiles library. +- Add a new internal `RunfilesLibraryUsersProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. + Adding a new provider is preferred over adding fields on e.g. the existing `RunfilesProvider` since the latter is controlled by rule implementations, but even non-cooperating rules need to forward and augment the repository mapping information. - In [`RuleConfiguredTargetBuilder#build()`], collect the `RunfilesLibraryUsersProvider` of all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage without requiring every rule implementation to cooperate. - Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. From 0952711cabd3bd17a5dc410804a7c32ff721f9da Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Wed, 10 Aug 2022 18:29:23 +0200 Subject: [PATCH 08/10] Add a further special case and fix the example --- ...022-07-21-locating-runfiles-with-bzlmod.md | 22 ++++--------------- 1 file changed, 4 insertions(+), 18 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 1dcb4b9..39bd340 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -76,7 +76,8 @@ The repository mapping manifest would then look as follows: 2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by `,`. Since neither canonical nor apparent repository names can contain commas, this is unambiguous. As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name (either the value of the `name` attribute of the `workspace` function or `__main__`) instead. - This is necessary since the main repository is stored under this name. + This is necessary since the main repository is stored under this name in the runfiles tree. + As a further special case, if `A` is the empty string, which happens only for the main workspace, the corresponding entry is skipped. 3. The lines of the manifest are sorted lexicographically. This ordering is equivalent to the lexicographical ordering of the triples comprising the repository mapping. 4. The repository mapping manifest for an executable target `T` consists only of those entries `(C, A, repo_mapping(C, A))` for which both of the following conditions are satisfied: @@ -114,36 +115,21 @@ cc_binary( ], deps = ["@bazel_tools//tools/cpp/runfiles"], ) -cc_binary( - name = "other_tool", - srcs = ["main.cpp"], - data = [ - "@my_protobuf//:protoc", - ], - deps = ["@bazel_tools//tools/cpp/runfiles"], -) ``` With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: ``` ,my_module,my_workspace -,my_workspace,my_workspace ,my_protobuf,@protobuf~3.19.2 +,my_workspace,my_workspace ``` * The manifest contains the canonical repository name for the workspace and module name since `//:some_tool` has a data dependency on `//:data.txt` and a direct dependency on a runfiles library. * The manifest contains the canonical repository name for protobuf since `//:some_tool` has a data dependency on `@my_protobuf//:protoc` and a direct dependency on a runfiles library. * The manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. * The manifest does *not* contain any entries referencing `rules_go`, even though the main module depends on it. `rules_go` neither contributes any runfiles to `//:some_tool` nor any target that depends on a runfiles library. - -The repository mapping manifest of `//:other_tool` would look as follows: - -``` -,my_protobuf,@protobuf~3.19.2 -``` - -This manifest does *not* contain any entries where `repo_mapping(C, A)` is `my_workspace` since the target does not include any runfiles from the main repository. +* The manifest does *not* contain any entries where `repo_mapping(C, A)` is `bazel_tool` since the target does not include any runfiles from that repository. If a module `other_module` depends on `my_module` and contains a target that depends on `@my_module//:some_tool`, then that target's repository mapping manifest would contain the following lines (among others): From e645fec144d8124bfae6660eed96cb211338ce0c Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Wed, 17 Aug 2022 18:05:58 +0200 Subject: [PATCH 09/10] Propose to reuse RunfilesProvider --- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 39bd340..38ad1d4 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -5,7 +5,7 @@ last updated: 2022-07-21 status: Draft reviewers: - @Wyverald - - @meteorcloudy + - @oquenchil - @phst title: Locating runfiles with Bzlmod authors: @@ -140,10 +140,11 @@ If a module `other_module` depends on `my_module` and contains a target that dep ### Implementation details -- Add a new internal `RunfilesLibraryUsersProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. - Adding a new provider is preferred over adding fields on e.g. the existing `RunfilesProvider` since the latter is controlled by rule implementations, but even non-cooperating rules need to forward and augment the repository mapping information. -- In [`RuleConfiguredTargetBuilder#build()`], collect the `RunfilesLibraryUsersProvider` of all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. - This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage without requiring every rule implementation to cooperate. +- Add a new field to `RunfilesProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. +- When creating an instance of `RunfilesProvider`, collect this information for all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. + This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage. + Note that this means that custom rules that never call `ctx.runfiles` but instead forward the runfiles from a dependency will not be accounted for if they also depend on a target advertising `RunfilesLibraryInfo`. + This is considered acceptable as the alternative would be to add an entirely new provider and forcibly populate it in `RuleConfiguredTargetBuilder` for something that is a very pathological edge case. - Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. This requires maintaining the inverse of the mapping `RepositoryMapping` in a `Multimap`: repository mappings are not necessarily injective. - Add the repository mapping manifest artifact to the runfiles middleman and the runfiles source manifest. From ae3f1fc55880806669641c5460d2696fc3c848e1 Mon Sep 17 00:00:00 2001 From: Fabian Meumertzheim Date: Wed, 17 Aug 2022 18:12:01 +0200 Subject: [PATCH 10/10] Mark as approved --- README.md | 2 +- designs/2022-07-21-locating-runfiles-with-bzlmod.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 285e3cb..3cf9e3e 100644 --- a/README.md +++ b/README.md @@ -44,12 +44,12 @@ list. | 2021-06-25 | [Better Starlark analysis unit tests](https://docs.google.com/document/d/12sC92mq7WasTvDWsm4U782oCegitfi1jbgOnMIGCe-A/edit?usp=sharing) | [@hvadehra](https://github.com/hvadehra) | Starlark | | 2022-04-25 | [Extend expand_template() API to reduce retained memory](https://docs.google.com/document/d/1Bjos2J0eLfxiy3_GU7LxOlciJAAqcbZCIIZ2VK0o5Wo/edit?usp=sharing) | [@hvadehra](https://github.com/hvadehra) | Starlark | | 2022-04-29 | [@since(version) annotations in documentation.](https://docs.google.com/document/d/17JBmzeY2NpCWbAGQjsqtmMRLOl0n8IcPYCMOYyFsG1k/edit#) | [@aiuto](https://github.com/aiuto) | Documentation | -| 2022-07-21 | [Locating runfiles with Bzlmod](designs/2022-07-21-locating-runfiles-with-bzlmod.md) | [@fmeum](https://github.com/fmeum) | External Repositories, Bazel | ### Approved | Last updated | Title | Author(s) alias | Category | | ------------ | ------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------- | +| 2022-08-17 | [Locating runfiles with Bzlmod](designs/2022-07-21-locating-runfiles-with-bzlmod.md) | [@fmeum](https://github.com/fmeum) | External Repositories, Bazel | | 2022-06-30 | [Credential Helpers for Bazel](designs/2022-06-07-bazel-credential-helpers.md) | [@Yannic](https://github.com/Yannic) | External Repositories | | 2022-05-25 | [Canonical label literals](https://docs.google.com/document/d/1N81qfCa8oskCk5LqTW-LNthy6EBrDot7bdUsjz6JFC4/edit#) | [@Wyverald](https://github/Wyverald) | External Repositories | | 2022-05-11 | [Deprecating managed_directories](https://docs.google.com/document/d/1u9V5RUc7i6Urh8gGfnSurxpWA7JMRtwCi1Pr5BHeE44/edit?usp=sharing) | [@lberki](https://github.com/lberki) | External Repositories | diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md index 38ad1d4..477a067 100644 --- a/designs/2022-07-21-locating-runfiles-with-bzlmod.md +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -2,7 +2,7 @@ --- created: 2022-07-21 last updated: 2022-07-21 -status: Draft +status: Approved reviewers: - @Wyverald - @oquenchil