Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: pkg is a directory; dependency checking of directories is unsound #1408

Open
UebelAndre opened this issue Dec 14, 2023 · 17 comments
Open
Labels
documentation Improvements or additions to documentation help wanted Aspect isn't prioritizing this, but the community could

Comments

@UebelAndre
Copy link

What happened?

I have a dependencies on prettier@2.8.8 and after updating to the latest 1.34.1 and Bazel 7.0.0 I get the following error

WARNING: /home/user/code/BUILD.bazel:42:22 //:.aspects_rules_js/node_modules/prettier@2.2.8/pkg is a director; dependency checking of directories is unsound

Version

Development (host) and target OS/architectures:

Output of bazel --version: 7.0.0

How to reproduce

I'm not able to provide too much detail on my workspace but suspect anything which depends at least on `prettier==2.8.8` would repro this problem.

Any other information?

No response

@UebelAndre UebelAndre added the bug Something isn't working label Dec 14, 2023
@github-actions github-actions bot added the untriaged Requires traige label Dec 14, 2023
@IOrlandoni
Copy link

IOrlandoni commented Dec 17, 2023

You can also get the same warnings, on latest (1.34.1) and bazel 7.0.0, by using npm_link_all_packages.
That'll give you a warning per package.

@UebelAndre
Copy link
Author

Yup, I see that. Every package warns about this.

@alexeagle alexeagle changed the title [Bug]: pkg is a director; dependency checking of directories is unsound [Bug]: pkg is a directory; dependency checking of directories is unsound Dec 19, 2023
@alexeagle
Copy link
Member

alexeagle commented Dec 19, 2023

I believe Bazel 7 made this existing warning more noisy for some reason. It would be really useful if someone could bisect where this was introduced in Bazel, maybe that commit message will give us more clues about why it does this.

Adding this to .bazelrc makes the warning go away:

# Allow the Bazel server to check directory sources for changes. Ensures that the Bazel server
# notices when a directory changes, if you have a directory listed in the srcs of some target.
# Recommended when using
# [copy_directory](https://github.com/aspect-build/bazel-lib/blob/main/docs/copy_directory.md) and
# [rules_js](https://github.com/aspect-build/rules_js) since npm package are source directories
# inputs to copy_directory actions.
# Docs: https://bazel.build/reference/command-line-reference#flag--host_jvm_args
startup --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1

(from https://docs.aspect.build/guides/bazelrc#correctness-options )

@alexeagle alexeagle added documentation Improvements or additions to documentation help wanted Aspect isn't prioritizing this, but the community could and removed bug Something isn't working untriaged Requires traige labels Dec 19, 2023
@UebelAndre
Copy link
Author

Why not use globs instead? I’d rather not be relying on unsound behavior

@opicaud
Copy link

opicaud commented Dec 26, 2023

Hey,
There is an issue opened on this topic: bazelbuild/bazel#18646
I've tried with build --incompatible_disallow_unsound_directory_outputs=false but no success
The issue i have with this on my side, is that the cache is not hit anymore :-(

FYI i am using rules_js only to provide a macro to release component in my monorepo through semantic-release
I confirm that this behaviour was not present with bazel 6.4.0

@alexeagle
Copy link
Member

alexeagle commented Jan 18, 2024

@UebelAndre We don't want globs since that produces thousands of input files. We just want one input directory. It's Bazel's fault :)
@opicaud I think that's a different issue about directory outputs. This one is about directory inputs.

@UebelAndre
Copy link
Author

@UebelAndre We don't want globs since that produces thousands of input files. We just want one input directory. It's Bazel's fault :)

I just don't want this warning and don't want to be required to use some experimental behavior to make them go away :(

@alexeagle
Copy link
Member

It would be a massive performance regression to hand Bazel an input file for every file in the node_modules directory, so I don't think we'd consider turning that on for everyone. If you have a specific proposal of how to change it, I guess someone can make a PR on a fork of rules_js to bypass the warning.

@jesses-canva
Copy link

jesses-canva commented Jan 22, 2024

@alexeagle Have you considered downloading the archive with http_file/rctx.download and then doing the untar in a build action and consuming the untarred directory in downstream actions? So it stays as one input to downstream actions but isn't a source directory.

@SanjayVas
Copy link

SanjayVas commented Jan 31, 2024

The relevant commit to Bazel appears to be bazelbuild/bazel@2fe450f, referencing bazelbuild/bazel#18579

This does indeed appear to be also related to outputting directories.

Rules outputting directories (which is not sane, but unfortunately happens)

This implies that the copy_directory rule is itself not sane, as it is a rule that outputs a directory. The general pattern is to use archives (e.g. tar or jar) instead of directories, and extract as-needed downstream.

@jesses-canva
Copy link

@alexeagle Alternatively to avoid doing an untar in a build action, you could glob(["**/*"]) up all the inputs and use copy_to_directory to turn them into a single directory and expose that as the directory in node_modules.

@jesses-canva
Copy link

Rules outputting directories (which is not sane, but unfortunately happens)

@SanjayVas I believe this is referring to rules which do ctx.actions.declare_file during analysis but then create a directory during execution, such as the genrule in the example on that issue. See the later comments.

A directory properly declared with ctx.actions.declare_directory (as copy_to_directory does here) is well supported.

@Silic0nS0ldier
Copy link

@alexeagle Alternatively to avoid doing an untar in a build action, you could glob(["**/*"]) up all the inputs and use copy_to_directory to turn them into a single directory and expose that as the directory in node_modules.

If a performance hit is observed with this change, odds are they can be reclaimed by making changes to Bazel.

Analysis time should perform well (hashing of TreeArtifact instances should be cheaper vs. a plain list of files, it effectively caches a part of the computation).

"Preparation" tasks before execution such as runfiles creation might take a hit if Bazel is symlinking TreeArtifact files instead of the directory. If such a problem exists, it'll likely need an --incompatible_ change in Bazel.

@Silic0nS0ldier
Copy link

"Preparation" tasks before execution such as runfiles creation might take a hit if Bazel is symlinking TreeArtifact files instead of the directory. If such a problem exists, it'll likely need an --incompatible_ change in Bazel.

I tested this with an implementation of the suggested change. Bazel symlinks the TreeArtifact directory, so runtime performance seems to check out. Analysis should but will require benchmarking to confirm.

I'll get a PR for this.

@alexeagle
Copy link
Member

@jesses-canva that's a clever idea - but currently we need access to some files inside the package earlier (e.g. repository rule needs to read package.json) so it's not enough to delay extraction until an action runs. Perhaps we can extract just some of the files in a repository rule and ALSO extract them again in an action later.

Thanks for tracking down that change @SanjayVas - I'll reach out to @tjgq about our options here to keep the same performance but not trigger the warning.

@Silic0nS0ldier
Copy link

Perhaps we can extract just some of the files in a repository rule and ALSO extract them again in an action later.

Patches are applied before package.json is read (enables codegen for bin entries), so extracting just package.json wouldn't really work.

However it is possible to be more frugal. pnpm-lock.yaml will tell us if a package has a bin field.

# ...
packages:
  # ...
  /@aspect-test/a@5.0.2:
    resolution: {integrity: sha512-bURS+F0+tS2XPxUPbrqsTZxIre1U5ZglwzDqcOCrU7MbxuRrkO24hesgTMGJldCglwL/tiEGRlvdMndlPgRdNw==}
    hasBin: true
    # ^^^^^^^^^^
    dependencies:
      '@aspect-test/b': 5.0.2
      '@aspect-test/c': 2.0.2
      '@aspect-test/d': 2.0.0(@aspect-test/c@2.0.2)
    dev: true

This means for a typical scenario (out of all dependencies, few have hasBin: true).

  • Most packages don't need to be extracted during repository generation.
  • Extractions are cacheable.
    • Normal build: Cache pull and extraction should be able the same.
    • BWTB build: May only need to check cache if consumers are also cached (lots of factors to consider, but should be faster). In Bazel 7 BWTB is enabled by default when using remote caching/execution.
  • Packages with bin will need to be extracted twice, and have patches applied twice.

On paper performance should be close with potential for savings seen in certain scenarios (since extraction can potentially be skipped).

Stats for perspective (pulled from Rules JS pnpm-lock.yaml).

  • Packages with hasBin: true: 77
  • Total packages: 1014

If desiring to removal all extraction from repository generation, it may be doable with an API change (breaking change).

The generated lambdas and other symbols do not appear to need much information about the package. It may be possible to generalise them such that the user is responsible for specifying which binary to use (information which must already be known by users).

# From Rules JS, //e2e/bzlmod:BUILD.bazel
load("@npm//:jasmine/package_json.bzl", jasmine_bin = "bin")

jasmine_bin.test(
    name = "jasmine_test",
    bin = "jasmine",
    # ^^^^^^^^^^^^^^
    args = ["*.spec.js"],
    data = ["test.spec.js"],
    # jasmine doesn't know to run without runfiles
    target_compatible_with = not_windows,
)

With the user provided bin attribute (normally present in the macro name, e.g. jasmine_bin.jasmine_test) it can be used to construct the same values currently eagerly computed.

  • bin_name - No longer needed, is a santized version of the bin name that can be used for things like Starlark macro names.
  • bin_mnemonic - Capitalising each word, splitting on _.
  • bin_path - The value of package.json#bin[bin], normalised into a dict. No reason this cannot be resolved at runtime.

@gregmagolan
Copy link
Member

gregmagolan commented Apr 9, 2024

This has been "mostly" mitigated in the latest rules_js release now that for most npm package the input to the graph is the .tgz file and it is extracted into the package store.

See the PR summary of #1538 and this comment on #1412 for more info.

Since there are still cases where a source directory input is required for packages with lifecycle hooks and/or patches this issue should stay open. The lifecycle and patches cases will be fixed in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation help wanted Aspect isn't prioritizing this, but the community could
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

8 participants