Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

moduleFileHash in MODULE.bazel.lock causes frequent Git merge conflicts #20369

Closed
linzhp opened this issue Nov 29, 2023 · 5 comments
Closed

moduleFileHash in MODULE.bazel.lock causes frequent Git merge conflicts #20369

linzhp opened this issue Nov 29, 2023 · 5 comments
Assignees
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@linzhp
Copy link
Contributor

linzhp commented Nov 29, 2023

Description of the bug:

The value of moduleFileHash changes every time any dependency in the whole repo is changed. This is not scalable in a monorepo with thousands of developers committing concurrently. Imagine the scenario:

From base revision with "moduleFileHash": "x"

Developer 1 upgrades library a, which produced this as part of the diff:

-   "moduleFileHash": "x",
+   "moduleFileHash": "y",

Meanwhile, developer 2 upgrades another library b, which is totally unrelated to a, and produce this:

-   "moduleFileHash": "x",
+   "moduleFileHash": "z",

If developer 1 landed her diff first, developer 2 won't be able to land hers due to Git merge conflict. So developer 2 will have to rebase onto the main branch and update her diff to be

-   "moduleFileHash": "y",
+   "moduleFileHash": "aa",

While she went for a coffee break, developer 3 landed yet another diff that upgrade another unrelated library, causing moduleFileHash to change:

-   "moduleFileHash": "y",
+   "moduleFileHash": "bb",

Now developer 2 has to rebase again... In the end, developer 2 has to yell to all developers in this big corporation: STOP UPGRADING ANY DEPENDENCIES BEFORE I LAND MY CHANGE

Which category does this issue belong to?

External Dependency

@iancha1992 iancha1992 added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. area-Bzlmod Bzlmod-specific PRs, issues, and feature requests labels Nov 29, 2023
@Wyverald
Copy link
Member

This does sound like a very probable scenario, though I'm drawing a blank on how it should be resolved. We need the module file hash in the lockfile to know whether the locked stuff is up-to-date.

Maybe if we had segmented MODULE.bazel files (#17880) and recorded the hash of each segment individually? Do you reckon that's enough?

@fmeum
Copy link
Collaborator

fmeum commented Nov 29, 2023

Options I see:

  1. Replace the hash with a canonicalized JSON description of the module file contents that are relevant for module resolution (roughly bazel_deps but not use_repos). While that does make conflicts more friendly as they aren't concentrated on a single line, it's still hard for humans to correctly resolve the conflict and not introduce subtle mistakes.

  2. Provide tooling that can serve as a custom git merge driver and automatically resolves these conflicts. I don't know whether git allows the driver for a particular file to use contents from another file (MODULE.bazel) though, which would be required to do this fully automatically. If that's not possible, it would require a custom step (e.g. a CI action that triggers on rebases).

  3. The most drastic one: Replace the module resolution part of the lockfile with a go.sum-style append-only lock mechanism. Simply record the hash of the module, source and patch files of each module involved during resolution and rely on the repository cache to provide offline access to these files given their hashes. Since module resolution is deterministic, this is sufficient to reconstruct the entire dependency tree including module extension usages. Resolving a conflict by keeping all the lines will always be safe and a command such as bazel mod tidy could be used to clean up unused hashes.

In fact, 3 is the only option I can think of that realizes "trust on first use" semantics for module files: With the current format, every merge conflict or change to the module file followed by a server restart will fetch module files again, relying on our promise of their immutability to ensure nobody has tampered with them.

@linzhp
Copy link
Contributor Author

linzhp commented Nov 29, 2023

I like 3 best. We rarely run into merge conflicts in go.sum in Uber's Go monorepo because there is no check sum for the whole file and because it is sorted

@fmeum
Copy link
Collaborator

fmeum commented Nov 30, 2023

An example of the format I envision, borrowing heavily from go.sum:

{
  "lockfileVersion": 5,
  "moduleHashes": {
    "rules_bar@2.3.4/MODULE.bazel": "sha256-...",
    "rules_bar@2.3.4/source.json": "sha256-...",
    "rules_foo@1.2.3/MODULE.bazel": "sha256-...",
    "rules_foo@1.2.3/source.json": "sha256-...",
    "rules_foo@1.2.4/MODULE.bazel": "sha256-...",
    "rules_foo@1.2.4/source.json": "sha256-..."
  },
  "extensionResults": {
    "@@rules_foo~1.2.3//extensions:my_ext.bzl%my_ext": {
      "sha256-ABC..": {
        "fileDeps": [
          "@@//:foo.lock"
        ],
        "generatedRepoSpecs": {
          ...
        }
      },
      "sha256-CDE..": {
        "fileDeps": [],
        "generatedRepoSpecs": {
          ...
        }
      }
    }
  }
}

The hashes in the keys of the inner extensionResults map are opaque hashes of:

  • the transitive .bzl files
  • the statically declared os/arch requirements
  • the statically declared environment variables
  • the accumulated extension usages
  • the accumulated file deps and their hashes

This format has a few new properties:

  • Merge conflicts should be more rare since they can only occur if extensions or module dependencies that are next to each other in sorted order are affected by the change.
    Isolated extension usages can help make this even less likely.
  • Merge conflicts can always be resolved by an appropriate deep JSON merge that is shallow on generatedRepoSpecs and fileDeps.
    If an extension produces different results with the same inputs on different branches due to non-hermeticity, choosing either of those should be fine.
  • When a dependency is updated, the lockfile still provides useful information to speed up the resolution step (by using locally cached registry files) and to ensure that modules don't change after they are first encountered (trust on first use).
  • We don't have to include hashes for bazel_tools or local overrides.
  • This performs worse on CI runners with no repository cache. Alternatively, we could maintain a directory in the source tree that stores the module and source.json files.
  • Due to the "append-only" storage for extension results, this can mitigate the problem of unstable extension hashes: Each of them will get its own separate entry.
  • Due to the use of opaque hashes for extension results, error messages on out-of-date lockfiles with --lockfile_mode=error can be less helpful. We could fix this by adding more clear-text information to the results.
  • Lockfiles can grow over time (similar to go.sum), which also slows down the linear search over the fileDeps, so we need some kind of clean up logic.
    This is straightforward for the module part, but less so for the extension results.

I haven't fully thought this through yet and may have missed crucial cons.

@meteorcloudy meteorcloudy added type: feature request P2 We'll consider working on this in future. (Assignee optional) and removed type: bug untriaged labels Dec 5, 2023
aignas added a commit to aignas/rules_python that referenced this issue Jan 10, 2024
github-merge-queue bot pushed a commit to bazelbuild/rules_python that referenced this issue Jan 13, 2024
When we use the latest `bazel` release in the `rules_python`
module, then the lock file gets created, but I am not sure if
we should commit it in. Since the lock file is very useful
for local development as it speeds up dependency refetching,
this PR disables it as advised in the issue below. Whilst at
it, the `bzlmod` example also removes it for smaller diffs
when we develop extensions.

See bazelbuild/bazel#20369
fmeum added a commit to fmeum/bazel that referenced this issue Feb 19, 2024
fmeum added a commit to fmeum/bazel that referenced this issue Feb 19, 2024
fmeum added a commit to fmeum/bazel that referenced this issue Feb 19, 2024
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Apr 29, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Apr 30, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
copybara-service bot pushed a commit that referenced this issue May 6, 2024
`MODULE.bazel`, `bazel_registry.json` and `source.json` files obtained from remote registries are stored in the repository cache and their hashes are collected in the lockfile. This speeds up incremental module resolutions, such as after adding a new `bazel_dep`.

Yanked versions are not stored in the lockfile. Their handling will be part of a follow-up PR.

Implements part of https://docs.google.com/document/d/1TjA7-M5njkI1F38IC0pm305S9EOmxcUwaCIvaSmansg/edit
Work towards #20369

Closes #21901.

PiperOrigin-RevId: 631195852
Change-Id: I35c30af7f9c3626bdbcb04c85b8c2502eeaafd3e
bazel-io pushed a commit to bazel-io/bazel that referenced this issue May 6, 2024
This effectively reverts most of 2a2a474: Module files of yanked versions are evaluated just like those of any other versions and "yankedness" is only checked for the final dep graph after selection.

This greatly simplifies incremental fetching of (inherently mutable) yanked version information with the new lockfile format.

Work towards bazelbuild#20369

RELNOTES: `print` statements in module files are now only executed for the root module and modules subject to non-registry overrides (e.g. `local_path_override`).

Closes bazelbuild#22083.

PiperOrigin-RevId: 627953972
Change-Id: Ie0aba02d187e000450a89ad2cd281c173582880a
Wyverald pushed a commit that referenced this issue May 6, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards #20369

Closes #22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
github-merge-queue bot pushed a commit that referenced this issue May 7, 2024
This effectively reverts most of
2a2a474: Module files of yanked
versions are evaluated just like those of any other versions and
"yankedness" is only checked for the final dep graph after selection.

This greatly simplifies incremental fetching of (inherently mutable)
yanked version information with the new lockfile format.

Work towards #20369

RELNOTES: `print` statements in module files are now only executed for
the root module and modules subject to non-registry overrides (e.g.
`local_path_override`).

Closes #22083.

PiperOrigin-RevId: 627953972
Change-Id: Ie0aba02d187e000450a89ad2cd281c173582880a

Commit
45982b4

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
github-merge-queue bot pushed a commit that referenced this issue May 7, 2024
Retaining `Postable`s across the graph increases the memory usage of
every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done"
Skyframe nodes for extension evaluation in
`BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in
`SingleExtensionEvalValue`. The validation of repos imported from an
extension is moved into a separate `SkyFunction`. This simplifies `bazel
mod tidy`, which no longer needs to swallow certain exceptions, and
allows cache hits when only the (invalid) imports of an extension are
modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become
obsolete with the new lockfile format.

Work towards #20369

Closes #22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272

---------

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
Wyverald pushed a commit that referenced this issue May 8, 2024
`MODULE.bazel`, `bazel_registry.json` and `source.json` files obtained from remote registries are stored in the repository cache and their hashes are collected in the lockfile. This speeds up incremental module resolutions, such as after adding a new `bazel_dep`.

Yanked versions are not stored in the lockfile. Their handling will be part of a follow-up PR.

Implements part of https://docs.google.com/document/d/1TjA7-M5njkI1F38IC0pm305S9EOmxcUwaCIvaSmansg/edit
Work towards #20369

Closes #21901.

PiperOrigin-RevId: 631195852
Change-Id: I35c30af7f9c3626bdbcb04c85b8c2502eeaafd3e
github-merge-queue bot pushed a commit that referenced this issue May 8, 2024
`MODULE.bazel`, `bazel_registry.json` and `source.json` files obtained
from remote registries are stored in the repository cache and their
hashes are collected in the lockfile. This speeds up incremental module
resolutions, such as after adding a new `bazel_dep`.

Yanked versions are not stored in the lockfile. Their handling will be
part of a follow-up PR.

Implements part of
https://docs.google.com/document/d/1TjA7-M5njkI1F38IC0pm305S9EOmxcUwaCIvaSmansg/edit
Work towards #20369

Closes #21901.

PiperOrigin-RevId: 631195852
Change-Id: I35c30af7f9c3626bdbcb04c85b8c2502eeaafd3e

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
This is in preparation for tracking the hashes of remote (non-`file:` URL) files in the lockfile. If the tests use local registries, they wouldn't be representative of the default situation.

Work towards bazelbuild#20369

Closes bazelbuild#21906.

PiperOrigin-RevId: 622290283
Change-Id: Ibe825d2bede84c1b0672dbb699aaf3ee5168a813
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
Ensures that yanked version information is only downloaded once per module name, which reduces the number of files fetched during module resolution. On my machine, this reduces `bazel mod deps` runtime on Bazel itself by ~15%.

This also prepares for storing yanked version information in the lockfile.

Work towards bazelbuild#20369

Closes bazelbuild#21909.

PiperOrigin-RevId: 625617556
Change-Id: Ie4184def3b868470f93d584bbbbd7f704e1e9a82
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
This gets rid of an ad-hoc cache maintained in `RegistryFactoryImpl` and prepares for making registries aware of hashes stored in the lockfile.

Work towards bazelbuild#20369

Closes bazelbuild#22040.

PiperOrigin-RevId: 626307340
Change-Id: I34b428553f7169c72ed7dddd2fe3ea5e6dca2a97
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
This effectively reverts most of 2a2a474: Module files of yanked versions are evaluated just like those of any other versions and "yankedness" is only checked for the final dep graph after selection.

This greatly simplifies incremental fetching of (inherently mutable) yanked version information with the new lockfile format.

Work towards bazelbuild#20369

RELNOTES: `print` statements in module files are now only executed for the root module and modules subject to non-registry overrides (e.g. `local_path_override`).

Closes bazelbuild#22083.

PiperOrigin-RevId: 627953972
Change-Id: Ie0aba02d187e000450a89ad2cd281c173582880a
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
Retaining `Postable`s across the graph increases the memory usage of every dependent Skyframe node.

`ModuleExtensionResolutionEvent` is replaced with a lookup of the "done" Skyframe nodes for extension evaluation in `BazelLockFileModule#afterCommand`.

`RootModuleFileFixupEvent` is replaced with direct storage in `SingleExtensionEvalValue`. The validation of repos imported from an extension is moved into a separate `SkyFunction`. This simplifies `bazel mod tidy`, which no longer needs to swallow certain exceptions, and allows cache hits when only the (invalid) imports of an extension are modified even without the lockfile.

The `BazelModuleResolutionEvent` is kept for now, but will become obsolete with the new lockfile format.

Work towards bazelbuild#20369

Closes bazelbuild#22058.

PiperOrigin-RevId: 628213907
Change-Id: I8ba19f5151a8183be5051c8d9280f93476db2272
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
`MODULE.bazel`, `bazel_registry.json` and `source.json` files obtained from remote registries are stored in the repository cache and their hashes are collected in the lockfile. This speeds up incremental module resolutions, such as after adding a new `bazel_dep`.

Yanked versions are not stored in the lockfile. Their handling will be part of a follow-up PR.

Implements part of https://docs.google.com/document/d/1TjA7-M5njkI1F38IC0pm305S9EOmxcUwaCIvaSmansg/edit
Work towards bazelbuild#20369

Closes bazelbuild#21901.

PiperOrigin-RevId: 631195852
Change-Id: I35c30af7f9c3626bdbcb04c85b8c2502eeaafd3e
Wyverald pushed a commit that referenced this issue May 13, 2024
The old lockfile fields and all code related to it as well as the workarounds for local path inclusions are removed.

The distribution archive lockfile needs to be checked in temporarily as the main `MODULE.bazel.lock` file does not contain the hashes for the registry files.

Fixes #19621
Fixes #20369

RELNOTES: The format for MODULE.bazel.lock is now less likely to result in merge conflicts and is updated incrementally, with only new files downloaded from registries and existing ones taken from the repository cache (if configured).

Closes #22154.

PiperOrigin-RevId: 633233519
Change-Id: Ie2c3042e4141a36e472b2c25cbd67be4aad096a1
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.2.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.2.0rc1. Thanks!

malt3 added a commit to edgelesssys/constellation that referenced this issue May 22, 2024
The lockfile in it's current form is prone to generate complex merge conflicts.
See bazelbuild/bazel#20369
Also, the bazel central registry never removes or alters existing versions.
The lockfile can probably be added to Git in Bazel >= 7.2
malt3 added a commit to edgelesssys/constellation that referenced this issue May 22, 2024
The lockfile in it's current form is prone to generate complex merge conflicts.
See bazelbuild/bazel#20369
Also, the bazel central registry never removes or alters existing versions.
The lockfile can probably be added to Git in Bazel >= 7.2
malt3 added a commit to edgelesssys/constellation that referenced this issue May 22, 2024
The lockfile in it's current form is prone to generate complex merge conflicts.
See bazelbuild/bazel#20369
Also, the bazel central registry never removes or alters existing versions.
The lockfile can probably be added to Git in Bazel >= 7.2
malt3 added a commit to edgelesssys/constellation that referenced this issue May 22, 2024
The lockfile in it's current form is prone to generate complex merge conflicts.
See bazelbuild/bazel#20369
Also, the bazel central registry never removes or alters existing versions.
The lockfile can probably be added to Git in Bazel >= 7.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

8 participants