diff --git a/README.md b/README.md index 5a85fcb..3cf9e3e 100644 --- a/README.md +++ b/README.md @@ -49,6 +49,7 @@ list. | Last updated | Title | Author(s) alias | Category | | ------------ | ------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------- | +| 2022-08-17 | [Locating runfiles with Bzlmod](designs/2022-07-21-locating-runfiles-with-bzlmod.md) | [@fmeum](https://github.com/fmeum) | External Repositories, Bazel | | 2022-06-30 | [Credential Helpers for Bazel](designs/2022-06-07-bazel-credential-helpers.md) | [@Yannic](https://github.com/Yannic) | External Repositories | | 2022-05-25 | [Canonical label literals](https://docs.google.com/document/d/1N81qfCa8oskCk5LqTW-LNthy6EBrDot7bdUsjz6JFC4/edit#) | [@Wyverald](https://github/Wyverald) | External Repositories | | 2022-05-11 | [Deprecating managed_directories](https://docs.google.com/document/d/1u9V5RUc7i6Urh8gGfnSurxpWA7JMRtwCi1Pr5BHeE44/edit?usp=sharing) | [@lberki](https://github.com/lberki) | External Repositories | diff --git a/designs/2022-07-21-locating-runfiles-with-bzlmod.md b/designs/2022-07-21-locating-runfiles-with-bzlmod.md new file mode 100644 index 0000000..477a067 --- /dev/null +++ b/designs/2022-07-21-locating-runfiles-with-bzlmod.md @@ -0,0 +1,235 @@ + +--- +created: 2022-07-21 +last updated: 2022-07-21 +status: Approved +reviewers: + - @Wyverald + - @oquenchil + - @phst +title: Locating runfiles with Bzlmod +authors: + - @fmeum +--- + +# Abstract + +With Bazel's new external dependency management system [Bzlmod](https://bazel.build/docs/bzlmod), the actual names of repositories under the output base are no longer static constants, but depend on the result of its dependency resolution algorithm. +As a consequence, functionality that relies on the static nature of these identifiers, most notably runfiles libraries and Stardoc, need to be made aware of the mapping between the names used to refer to repositories and their actual canonical name on disk. +This proposal suggests a way to serialize this information into a new type of manifest as well as changes to rulesets and runfiles libraries that allows end users to locate runfiles under Bzlmod with no or minimal changes to their existing code. + +# Background + +Bazel has had a concept of [repository mappings](https://bazel.build/docs/external#shadowing-dependencies) for a long time. +A repository can specify a mapping applied to the repository part of label strings used within this repository. +For example, a repository can specify that every label `@foo//:target` specified in it is actually interpreted as `@bar//:target`. +Here, `foo` is called the *apparent repository name* and `bar` is called the *canonical repository name*. +Repository mappings are rarely used in the `WORKSPACE` system for external dependency management and, crucially, all mapping entries are static. + +Bazel's new external dependency management system [Bzlmod](https://bazel.build/docs/bzlmod) implements "strict deps" checks for repositories by using repository mappings. +Every Bazel module depending on a Bazel module `foo` can specify its own apparent repository name for the repository generated by `foo`. +The canonical name of the repository corresponding to `foo` is an implementation detail of Bzlmod and typically of the form `@foo~1.0.3`, where `1.0.3` is the version of `foo`. +This version is determined by Bzlmod's dependency resolution algorithm and thus not known in advance by any of the modules participating in this resolution. +In contrast to the situation for `WORKSPACE` files, the canonical repository names thus aren't known by the Bazel modules themselves. + +The fact that canonical repository names have to be treated as dynamic with Bzlmod poses problems for at least two parts of Bazel that so far rely on repository names being statically known: + +1. If an executable target `@foo//:exe` in a Bazel module `foo` has a `data` dependency on a file target `@bar//pkg:runfile`, where `bar` is a Bazel module that is a direct dependency of `foo`, then the runfiles path of `@bar//:runfile` will be `/pkg/runfile`. + In order to locate the file at runtime, `@foo//:exe` needs to pass this path to one of the [runfiles libraries](https://docs.google.com/document/d/e/2PACX-1vSDIrFnFvEYhKsCMdGdD40wZRBX3m3aZ5HhVj4CtHPmiXKDCxioTUbYsDydjKtFDAzER5eg7OjJWs3V/pub). + This is a problem since the path contains the version of `bar`, which isn't static, and thus can't be referenced as a constant literal string anymore. + In fact, unless `foo` is the main module, the same problem already arises for files in `foo` itself. + This issue is tracked by [bazelbuild/bazel#15029](https://github.com/bazelbuild/bazel/issues/15029). +2. [Stardoc](https://github.com/bazelbuild/stardoc) generates documentation from `.bzl` files and for that has to resolve labels in `load` statements to `.bzl` files on disk. + Since it doesn't actually evaluate `WORKSPACE` or `MODULE.bazel` files, it has no way of discovering repository mappings in the same way as Bazel. + Thus, with Bzlmod, Stardoc fails to generate documentation for any `.bzl` file that loads `.bzl` files from other repositories. + This issue is tracked by [bazelbuild/bazel#14140](https://github.com/bazelbuild/bazel/issues/14140) and [bazelbuild/stardoc#117](https://github.com/bazelbuild/stardoc/issues/117). + +# Proposed changes + +## 1. Add a new provider that marks runfiles libraries + +Both Change 2 and 4 described in this proposal require Bazel targets that provide runfiles lookup functionality to be marked with a distinct provider `RunfilesLibraryInfo` exposed to Starlark. +The provider carries no data, it is purely used as a marker. + +A target that depends on a target advertising this provider will be given access to a serialized form of its repository mapping at runtime as described in Part 2. + +Furthermore, depending on the nature of the programming language of the target, the language-specific rules can use this information to generate code that allows the target compile-time access to its canonical repository name as described in Change 4. + +### Backwards compatibility + +Adding a new global symbol is a backwards-compatible change. + +In order to support Bzlmod, rulesets providing runfiles libraries will have to be updated to advertise the new provider. +Since a new global symbol can't be feature-detected in Starlark, this can only be done once the ruleset providing the runfiles libraries targets a Bazel version that offers the new provider as its minimum supported version. +For runfiles libraries that do not ship with Bazel, it would thus be helpful to cherry-pick this change into Bazel 5. + +## 2. Emit a repository mapping manifest for each executable target + +At runtime, an executable target has to be able to translate an apparent repository name to a canonical repository name. +This requires serializing the repository mapping of every transitive dependency of the target that may perform runfiles lookups to a file that can be accessed at runtime, similar to the existing runfiles manifest. + +In more formal terms, the complete repository mapping maintained by Bazel is a function `repo_mapping` that maps a pair `(C, A)` of a canonical repository name `C` and an apparent repository name `A` to the canonical repository name `repo_mapping(C, A)` to which `A` resolves when referred to from within `C`. + +The repository mapping manifest would then look as follows: + +1. The repository mapping manifest is an ASCII file named `.repo_mapping` and placed in the same directory as the target's executable. +2. Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by `,`. + Since neither canonical nor apparent repository names can contain commas, this is unambiguous. + As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name (either the value of the `name` attribute of the `workspace` function or `__main__`) instead. + This is necessary since the main repository is stored under this name in the runfiles tree. + As a further special case, if `A` is the empty string, which happens only for the main workspace, the corresponding entry is skipped. +3. The lines of the manifest are sorted lexicographically. + This ordering is equivalent to the lexicographical ordering of the triples comprising the repository mapping. +4. The repository mapping manifest for an executable target `T` consists only of those entries `(C, A, repo_mapping(C, A))` for which both of the following conditions are satisfied: + - `T` transitively depends on a target in `C` that directly depends on a target advertising `RunfilesLibraryInfo`; + - `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`. + This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. + If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on T would have to rerun if any repository containing a transitive dependency of T would declare a new dependency or change an apparent repository name, even if that repository neither contributes a runfile nor looks up runfiles at runtime. + +In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`. +Since canonical repository names do not start with `_`, there is no risk of this synthetic part being a prefix of an actual runfile. + +### Example + +Consider the following Bazel module: + +```starlark +# WORKSPACE +workspace(name = "my_workspace") + +# MODULE.bazel +module( + name = "my_module", + version = "1.2.3", +) +bazel_dep(name = "protobuf", version = "3.19.2", repo_name = "my_protobuf") +bazel_dep(name = "rules_go", version = "0.33.0") + +# BUILD.bazel +cc_binary( + name = "some_tool", + srcs = ["main.cpp"], + data = [ + "data.txt", + "@my_protobuf//:protoc", + ], + deps = ["@bazel_tools//tools/cpp/runfiles"], +) +``` + +With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows: + +``` +,my_module,my_workspace +,my_protobuf,@protobuf~3.19.2 +,my_workspace,my_workspace +``` + +* The manifest contains the canonical repository name for the workspace and module name since `//:some_tool` has a data dependency on `//:data.txt` and a direct dependency on a runfiles library. +* The manifest contains the canonical repository name for protobuf since `//:some_tool` has a data dependency on `@my_protobuf//:protoc` and a direct dependency on a runfiles library. +* The manifest does *not* contain any entries where `C` is `@protobuf~3.19.2` since the `//:protoc` target does not transitively depend on a runfiles library. +* The manifest does *not* contain any entries referencing `rules_go`, even though the main module depends on it. `rules_go` neither contributes any runfiles to `//:some_tool` nor any target that depends on a runfiles library. +* The manifest does *not* contain any entries where `repo_mapping(C, A)` is `bazel_tool` since the target does not include any runfiles from that repository. + +If a module `other_module` depends on `my_module` and contains a target that depends on `@my_module//:some_tool`, then that target's repository mapping manifest would contain the following lines (among others): + +``` +@my_module~1.2.3,my_module,@my_module~1.2.3 +@my_module~1.2.3,my_protobuf,@protobuf~3.19.2 +``` + +### Implementation details + +- Add a new field to `RunfilesProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target that directly depend on a runfiles library. +- When creating an instance of `RunfilesProvider`, collect this information for all non-implicit, non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target if any such dependency advertises `RunfilesLibraryInfo`. + This is very similar to the logic in [`InstrumentedFilesCollector#forwardAll`], which forwards information about source files instrumented for coverage. + Note that this means that custom rules that never call `ctx.runfiles` but instead forward the runfiles from a dependency will not be accounted for if they also depend on a target advertising `RunfilesLibraryInfo`. + This is considered acceptable as the alternative would be to add an entirely new provider and forcibly populate it in `RuleConfiguredTargetBuilder` for something that is a very pathological edge case. +- Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. + This requires maintaining the inverse of the mapping `RepositoryMapping` in a `Multimap`: repository mappings are not necessarily injective. +- Add the repository mapping manifest artifact to the runfiles middleman and the runfiles source manifest. + +### Backwards compatibility + +Emitting a new file into the runfiles tree is a backwards-compatible change. + +## 3. Extend the runfiles libraries to take repository mappings into account + +Based on Change 2, runfiles libraries can now apply the repository mapping at runtime. +Concretely, runfiles libraries should modify their current lookup procedure described in the [original design doc](https://docs.google.com/document/d/e/2PACX-1vSDIrFnFvEYhKsCMdGdD40wZRBX3m3aZ5HhVj4CtHPmiXKDCxioTUbYsDydjKtFDAzER5eg7OjJWs3V/pub) as follows: + +1. After the `Create` method of the runfiles library has located either the runfiles directory or the runfiles manifest, it uses either to look up the repository mapping manifest under the fixed runfiles path `_repo_mapping`. + + a. If the runfiles library is shipped with Bazel, it should fail with a descriptive error message if the manifest is not found. + + b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest. + +2. The runfiles library parses the repository mapping manifest and stores a map of `(C, A)` to `repo_mapping(C, A)` for all entries in the manifest. + +3. When `Rlocation` is called with a runfiles-root-relative path `rpath`, after validating `rpath`, the runfiles library extracts the first path segment, understood to be an apparent repository name `A`, of `rpath`. It also determines `C`, the canonical name of the repository from which it was called, in a language-specific way (see Change 4 below suggestions). If `C` can't be determined, the runfiles library sets `C` to the empty string, which corresponds to performing runfiles lookups using the mapping of the main repository. It then looks up the map entry for the key `(C, A)` created in Step 2. + + a. If `(C, A)` is contained in the map, the runfiles library continues the original lookup procedure for the modified path `repo_mapping(C, A) + rpath[rpath.find('/'):]`. + + b. If `(C, A)` is not contained in the map, the runfiles library continues the original lookup procedure for the unmodified path `rpath`. + +The precise way in which this procedure is implemented depends on the particular implementation of the runfiles library for each language. +For example, the runfiles library for Bash doesn't offer a `Create` method and implicitly looks up the runfiles library or manifest on each invocation of the `rlocation` function. +The implementation details will have to be handled by the individual libraries' maintainers. + +Step 2 falls back to using the empty string as the canonical repository name of the current repository so that existing code performing runfiles lookup can be kept working to the extent possible: +While this will not lead to successful runfiles lookups from modules that are used as dependencies of other modules, it will correctly resolve in the typical case of ruleset end users that only look up files in their main repository or direct dependencies of that repository. +It is possible that this behavior leads to cases where a runfiles lookup succeeds, but resolves to a different file than intended, for runfiles libraries usages by dependencies that haven't been updated. +The author believes that the risk of this happening in practice is outweighed by the chance to remain compatible with existing end user codebases. + +Step 3 a) falls back to looking up the unmodified runfiles path `rpath` so that alternative approaches for generating post-repo-mapping runfiles paths remain viable. +This includes: + +- Passing in runfiles paths from Starlark via environment variables or arguments. +- Generating code constants containing already repo-mapped runfiles paths (see [rules_runfiles](https://github.com/fmeum/rules_runfiles)). + +This fallback behavior is not prone to ambiguity: With Bzlmod, canonical names of non-main repositories always starts with an `@`, which is never contained in a valid apparent repository name. + +### Backwards compatibility + +End users that do not use Bzlmod will experience no changes when using updated runfiles libraries. + +End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern. + +# Suggested follow-up changes + +The following changes solve the two problems described in the background section of this document based on Change 1, 2, and 3 described above. +However, due to the federated nature of rulesets, which in many cases aren't even released with Bazel, and the wildly different natures of the languages they cover, these changes are at this point only to be treated as suggestions. +Further feedback from ruleset and Stardoc maintainers as well as language experts is required to ensure that these changes actually provide the best possible API and work correctly in all cases. + +## 4. Make the canonical repository name of the current repository available at runtime + +Using the new features described in Part 2 and 3, repository mapping aware runfiles lookups can be performed knowing the canonical repository name of the repository containing the code that performs the lookup. +The canonical repository name is not an absolute constant known to the end user --- it is an implementation detail of Bzlmod and e.g. contains module versions that may change depending on the outcome of the dependency resolution algorithm. + +For compiled languages, some amount of language-specific code generation will likely be needed to make that information available at runtime. +Ideally, this code generation would be performed by the language-specific rules themselves if and only if a direct dependency of a target advertises the `RunfilesLibraryInfo` provider. +For example, the following changes could be made to the rules for compiled languages shipped with Bazel: + +- **C/C++**: If a `cc_*` target has a direct dependency advertising `RunfilesLibraryInfo`, implicitly add a `local_define` of `BAZEL_CURRENT_REPOSITORY_NAME` with value the canonical repository name of the repository containing the target. Then, require users to pass `BAZEL_CURRENT_REPOSITORY_NAME` as an argument to the runfiles library's `Create` method. +- **Java**: If a `java_*` target has a direct dependency advertising `RunfilesLibraryInfo`, generate a source file providing a `static final String` constant `com.google.devtools.build.runfiles.RunfilesHelper.CURRENT_REPOSITORY_NAME` with value the canonical repository name of the repository containing the target. + Separately compile this source file with `neverlink = True` and implicitly add it as compile-time dependency of the target. + Since the Java Language Specification mandates that [`static` constants are inlined](https://docs.oracle.com/javase/specs/jls/se17/html/jls-14.html#d5e26341), this will ensure that no reference to the `RunfilesHelper` class is emitted into the compilation output of the target. + Each target can then reference the same `CURRENT_REPOSITORY_NAME` constant and pass it to the `Create` method of `Runfiles`. + +For other compiled as well as dynamic languages, the canonical repository name may instead be obtained directly by the runfiles library by parsing the path of the calling source file. For example, Go has [runtime.Caller](https://pkg.go.dev/runtime#Caller), Python has `inspect.getframeinfo(sys._getframe(1)).filename` and Bash has `caller`. + +## 5. Make Stardoc compatible with repository mappings + +Using the new functionality suggested by this proposal, Stardoc can be made aware of repository mappings as follows: + +1. Add an implicit dependency on a global helper target advertising `RunfilesLibraryInfo` to `bzl_library`. +2. Create a new `stardoc_runfiles` rule that consumes `bzl_library` targets and advertises a `DefaultInfo` with all transitive `.bzl` files in `runfiles` and a no-op executable in `executable`. +3. Replace the current `stardoc` rule with a macro that evaluates to both a `stardoc_runfiles` and a `stardoc` target, with the latter referencing the former and adding it to the `tools` of the actual Stardoc action. + This stages all `.bzl` files as runfiles and, due to 1., ensures that they are covered by the generated `.repo_mapping` manifest. +4. At runtime, parse the runfiles path of a root `.bzl` file to obtain its canonical repository name and use the Java runfiles library to look up the paths of dependencies according to this repository's repository mapping. + +Steps 2 and 3 could alternatively be replaced by implementing and using one of the new features proposed in [bazelbuild/bazel#15486](https://github.com/bazelbuild/bazel/issues/15486). + +[RunfilesHelper]: https://github.com/fmeum/bazel/commit/fdbc85448553ef46e620bbfc09cb3c11b7e05425#diff-75515109d2df88a368ed059a7ad73a424d0bd1db785d1939f41531266b7769b2 +[`InstrumentedFilesCollector#forwardAll`]: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/analysis/test/InstrumentedFilesCollector.java;l=65;drc=86b800e4066389d4535fbdf280cfe9447b06df15;bpv=1;bpt=1 +[`RuleConfiguredTargetBuilder#build()`]: https://cs.opensource.google/bazel/bazel/+/86b800e4066389d4535fbdf280cfe9447b06df15:src/main/java/com/google/devtools/build/lib/analysis/RuleConfiguredTargetBuilder.java;l=105;bpv=1;bpt=1