Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repo_mapping file is generated by conflicting actions #18666

Closed
keith opened this issue Jun 13, 2023 · 16 comments
Closed

repo_mapping file is generated by conflicting actions #18666

keith opened this issue Jun 13, 2023 · 16 comments
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug untriaged

Comments

@keith
Copy link
Member

keith commented Jun 13, 2023

Description of the bug:

ERROR: file 'app.release_deps_lint_test_script.repo_mapping' is generated by these conflicting actions:
Label: //:app.release_deps_lint
RuleClass: apple_verification_test rule
JavaActionClass: class com.google.devtools.build.lib.analysis.RepoMappingManifestAction
Configuration: b20c4f686aacafa27460b3c60d2f9d88d1c7b116c35cf69d3e32d36fe70be31f, 61221742bd234fe9c9154be231196cc1037b1df27cd5fe89863d6fc3faf3b1c7
Mnemonic: RepoMappingManifest
Action key: 6506cd03fbf9ab0f9aa39694cd03e9d4aba8780eabd335df9ad2fb02890234ef, a27b7329291cba2e8dac45a3e4f78183e40a5bbc6a632d209e31e5e4971b629a
Progress message: Writing repo mapping manifest for //:app.release_deps_lint
Action describeKey: (null)
PrimaryInput: (null)
PrimaryOutput: File:[[<execution_root>]bazel-out/darwin_arm64-fastbuild/bin]app.release_deps_lint_test_script.repo_mapping
Owner information: RealConfiguredTargetKey{label=//:app.release_deps_lint, config=BuildConfigurationKey[b20c4f686aacafa27460b3c60d2f9d88d1c7b116c35cf69d3e32d36fe70be31f]}, RealConfiguredTargetKey{label=//:app.release_deps_lint, config=BuildConfigurationKey[61221742bd234fe9c9154be231196cc1037b1df27cd5fe89863d6fc3faf3b1c7]}
MandatoryInputs: are equal
Outputs: are equal
ERROR: com.google.devtools.build.lib.actions.MutableActionGraph$ActionConflictException: for app.release_deps_lint_test_script.repo_mapping, previous action: action 'Writing repo mapping manifest for //:app.release_deps_lint', attempted action: action 'Writing repo mapping manifest for //:app.release_deps_lint'

This error is produced intermittently in our project. In the failure case the diff of the configurations mentioned in above is:

INFO: Displaying diff between configs 1aaa43b86dc7ca2975461a460eea09f521c55088f3bcf51acc405cff20cc7078 and df2789a14be991566c2136d5d6ed0f14452c03a0f9e777baf1b4948388ada537
Displaying diff between configs 1aaa43b86dc7ca2975461a460eea09f521c55088f3bcf51acc405cff20cc7078 and df2789a14be991566c2136d5d6ed0f14452c03a0f9e777baf1b4948388ada537
FragmentOptions com.google.devtools.build.lib.analysis.test.TestConfiguration$TestOptions {
  cache_test_results: NO, null
  coverage_support: @bazel_tools//tools/test:coverage_support, null
  experimental_cancel_concurrent_tests: false, null
  experimental_fetch_all_coverage_outputs: false, null
  experimental_retain_test_configuration_across_testonly: false, null
  experimental_split_coverage_postprocessing: false, null
  incompatible_check_sharding_support: false, null
  incompatible_exclusive_test_sandboxed: true, null
  runs_per_test: [(?:(?>.*)) Options: [1]], null
  runs_per_test_detects_flakes: false, null
  test_arg: [], null
  test_filter: null, null
  test_result_expiration: -1, null
  test_runner_fail_fast: false, null
  test_sharding_strategy: EXPLICIT, null
  test_timeout: {short=PT1M, moderate=PT5M, long=PT15M, eternal=PT1H}, null
  trim_test_configuration: true, null
  use_target_platform_for_tests: false, null
  zip_undeclared_test_outputs: true, null
}

It seems like passing --trim_test_configuration=false works around this issue. I believe this was introduced in 698a514 but it still repros on HEAD today.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This can be reproduced on macOS with this project: repro.zip by running ./repro.sh. It cleans and builds multiple times since the issue is intermittent.

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

698a514

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

bazelisk

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@keith
Copy link
Member Author

keith commented Jun 13, 2023

cc @sdtwigg

@Pavank1992 Pavank1992 added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Jun 14, 2023
@fmeum
Copy link
Collaborator

fmeum commented Jun 14, 2023

I added an implementation of describeKey after applying the fix in #18668. I attached the result:
first_action.txt
second_action.txt

You can see that the order of elements in the transitivePackages set differs between the two instances of the action. This didn't matter before 698a514 as we used to precompute and sort the repo mapping manifest contents for the action key computation.

@lberki Are you aware of any cases in which the order of elements in transitivePackages could be unstable, in particular related to --trim_test_configuration? If I rerun the repo multiple times, sometimes the effective order of set elements is even the same, but the nested set structure still differs.

Although the question remains why there are two separate actions registered in the first place.

@lberki
Copy link
Contributor

lberki commented Jun 14, 2023

Ouch, this is not great. My guess is that the non-determinism in transitivePackages is because it's computed from ConfiguredTargetAndDataProducer.acceptValueOrException2(), and which in turn is called from the complicated state machine in ConfiguredTargetFunction and friends, and I would not be surprised at all if the order it's called on direct dependencies depended on the exact order the dependent SkyValues are produced. I don't have all that machinery in my head at the moment, though. @greatfilter, WDYT?

@fmeum
Copy link
Collaborator

fmeum commented Jun 14, 2023

I can confirm that the issue no longer reproduces for me if I enforce deterministic sorting of transitivePackages with this (hackish) patch:

--- a/src/main/java/com/google/devtools/build/lib/analysis/producers/TransitiveDependencyState.java
+++ b/src/main/java/com/google/devtools/build/lib/analysis/producers/TransitiveDependencyState.java
@@ -13,17 +13,27 @@
 // limitations under the License.
 package com.google.devtools.build.lib.analysis.producers;
 
+import static java.util.Comparator.comparing;
+
 import com.google.devtools.build.lib.analysis.ConfiguredTargetValue;
 import com.google.devtools.build.lib.causes.Cause;
 import com.google.devtools.build.lib.cmdline.PackageIdentifier;
 import com.google.devtools.build.lib.collect.nestedset.NestedSet;
 import com.google.devtools.build.lib.collect.nestedset.NestedSetBuilder;
 import com.google.devtools.build.lib.packages.Package;
+import java.util.ArrayList;
+import java.util.Comparator;
 import java.util.concurrent.ConcurrentHashMap;
 import javax.annotation.Nullable;
 
 /** Tuple storing state associated with transitive dependencies. */
 public final class TransitiveDependencyState {
+  private static final Comparator<Package> PACKAGE_COMPARATOR =
+      comparing(Package::getPackageIdentifier);
+  private static final Comparator<ConfiguredTargetValue> CONFIGURED_TARGET_VALUE_COMPARATOR =
+      comparing((ConfiguredTargetValue value) -> value.getConfiguredTarget().getLabel())
+          .thenComparing(value -> value.getConfiguredTarget().getConfigurationChecksum());
+
   private final NestedSetBuilder<Cause> transitiveRootCauses;
 
   /**
@@ -36,6 +46,7 @@ public final class TransitiveDependencyState {
    * com.google.devtools.build.lib.skyframe.SkyframeExecutor#shouldStoreTransitivePackagesInLoadingAndAnalysis}.
    */
   @Nullable private final NestedSetBuilder<Package> transitivePackages;
+  private final ArrayList<Object> depsToBeSorted = new ArrayList<>();
 
   /**
    * Contains packages that were previously computed.
@@ -75,6 +86,21 @@ public final class TransitiveDependencyState {
 
   @Nullable
   public NestedSetBuilder<Package> transitivePackages() {
+    if (transitivePackages == null) {
+      return null;
+    }
+    depsToBeSorted.stream()
+        .filter(dep -> dep instanceof Package)
+        .map(dep -> (Package) dep)
+        .sorted(PACKAGE_COMPARATOR)
+        .forEachOrdered(transitivePackages::add);
+    depsToBeSorted.stream()
+        .filter(dep -> dep instanceof ConfiguredTargetValue)
+        .map(dep -> (ConfiguredTargetValue) dep)
+        .sorted(CONFIGURED_TARGET_VALUE_COMPARATOR)
+        .map(ConfiguredTargetValue::getTransitivePackages)
+        .forEachOrdered(transitivePackages::addTransitive);
+    depsToBeSorted.clear();
     return transitivePackages;
   }
 
@@ -95,7 +121,7 @@ public final class TransitiveDependencyState {
     if (transitivePackages == null) {
       return;
     }
-    transitivePackages.addTransitive(configuredTarget.getTransitivePackages());
+    depsToBeSorted.add(configuredTarget);
   }
 
   /**
@@ -107,7 +133,7 @@ public final class TransitiveDependencyState {
     if (transitivePackages == null) {
       return;
     }
-    transitivePackages.add(pkg);
+    depsToBeSorted.add(pkg);
   }
 
   @Nullable

copybara-service bot pushed a commit that referenced this issue Jun 14, 2023
`describedNestedSetFingerprint` appended a `StringBuilder` to itself in a loop instead of the actual item from the nested set, resulting in OOMs in `Runfiles#describeKey` and `RepoMappingManifestAction#describeKey`.

Work towards #18666

Closes #18668.

PiperOrigin-RevId: 540270874
Change-Id: Id408ab4c2438bea264b586f8ae5567dc41260242
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Jun 14, 2023
`describedNestedSetFingerprint` appended a `StringBuilder` to itself in a loop instead of the actual item from the nested set, resulting in OOMs in `Runfiles#describeKey` and `RepoMappingManifestAction#describeKey`.

Work towards bazelbuild#18666

Closes bazelbuild#18668.

PiperOrigin-RevId: 540270874
Change-Id: Id408ab4c2438bea264b586f8ae5567dc41260242
@sdtwigg
Copy link
Contributor

sdtwigg commented Jun 16, 2023

From the perspective of trim_test_configuration issues, it is especially peculiar that the generated actions are different (presumably due to the stated non-determinism above). Usually, when this sort of issue is triggered, the action keys are actually the same (even if they are two unsharable cc-related actions).

Addressing the non-determinism should let action sharing work again and side-step the trim_test_configuration issue while also being better for build incrementality anyways.

@greatfilter
Copy link

Sorry for the late response. I don't check the email associated with this account frequently.

@lberki - yes, the results are streamed in as they arrive so the ordering is nondeterministic.

If sorting suffices here, @fmeum's approach looks good with some slight modifications. I'll draft a change.

copybara-service bot pushed a commit that referenced this issue Jun 19, 2023
PiperOrigin-RevId: 541656611
Change-Id: Ie6c365275003343dc2bf0520a305e005ec48550b
@fmeum
Copy link
Collaborator

fmeum commented Jun 21, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 21, 2023
@fmeum
Copy link
Collaborator

fmeum commented Jun 21, 2023

Fixed in bd9eb4f

@keith Could you test the fix?

@iancha1992
Copy link
Member

@bazel-io fork 6.3.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Jun 21, 2023
@keith
Copy link
Member Author

keith commented Jun 21, 2023

Hard to say 100% since it was pretty flaky, but it looks like in 20 runs in my repro case I could never get the failure! Thanks folks!

@keith keith closed this as completed Jun 21, 2023
@iancha1992
Copy link
Member

@lberki @fmeum
In order to cherry-pick bd9eb4f, looks like we'll need one or more commits pushed before we can do it. From the merge conflicts, we found that several files are missing in the release-6.3.0. We'd appreciate if you let us know what commits/PR's are needed to avoid these merge conflicts. Thanks

FYI, below is the list of the missing files:
src/main/java/com/google/devtools/build/lib/analysis/producers/BUILD
src/main/java/com/google/devtools/build/lib/analysis/producers/ConfigConditionsProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/ConfiguredAspectProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/ConfiguredTargetAndDataProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/DependencyContextProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/DependencyContextProducerWithCompatibilityCheck.java
src/main/java/com/google/devtools/build/lib/analysis/producers/PrerequisiteParameters.java
src/main/java/com/google/devtools/build/lib/analysis/producers/PrerequisitesProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/TargetAndConfigurationProducer.java
src/main/java/com/google/devtools/build/lib/analysis/producers/TargetProducer.java
src/main/java/com/google/devtools/build/lib/skyframe/PrerequisiteProducer.java

@fmeum
Copy link
Collaborator

fmeum commented Jun 22, 2023

@iancha1992 Looks like we can't cherry-pick this as the affected files don't even exist in 6.3.0.

@Wyverald
Copy link
Member

Thank you Ian for looking into this. Before we call off the cherry-picks, let's investigate whether it's possible to backport the prerequisite changes. Let's discuss this (and other related cherry-pick PRs) in #18419.

traversaro pushed a commit to traversaro/bazel that referenced this issue Jun 24, 2023
`describedNestedSetFingerprint` appended a `StringBuilder` to itself in a loop instead of the actual item from the nested set, resulting in OOMs in `Runfiles#describeKey` and `RepoMappingManifestAction#describeKey`.

Work towards bazelbuild#18666

Closes bazelbuild#18668.

PiperOrigin-RevId: 540270874
Change-Id: Id408ab4c2438bea264b586f8ae5567dc41260242
@Colecf
Copy link

Colecf commented Jun 28, 2023

We're also hitting this issue in android (b/288969037) even after bd9eb4f

@keith
Copy link
Member Author

keith commented Aug 23, 2023

@Colecf is your case fixed or should we re-open this one?

@Colecf
Copy link

Colecf commented Aug 23, 2023

Sorry, we were able to fix it by adding transitive_configs = ["//command_line_option/fragment:test"] to some of our macros.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug untriaged
Projects
None yet
Development

No branches or pull requests