Rework exclude rule merging #9307

melix · 2019-04-27T20:15:54Z

Context

As a follow-up to #9197, this commit properly fixes the
exclude rule merging algorithm, by completely rewriting
it. The new merging algorithm works by implementing the
minimal set of algebra operations that make sense to
minimize computation durations. In order to do this,
this commit introduces a number of exclude specs
(found in their own package) and factories to create
actual implementation of those specs.

Specs represent the different kind of excludes we can
find:

excluding a group
excluding a module (no group defined)
excluding a group+module
excluding an artifact of a group+module
pattern-matching excludes
unions of excludes
intersections of excludes

With all those minimal bricks, factories are responsible
of generating consistent specs. The dumbest factory
will just generate new instances for everything. This
is the default factory.

Minimally, this factory has to be backed by an optimizing
factory, which will take care of handling special cases:

union or intersection of a single spec
union or intersection of 2 specs
- when one of them is null
- when both are equal

Then we have a factory which performs the minimal algebra
to minimize specs:

unions of unions
intersections of intersections
union of a union and individual specs
insection of an intersection and individual spec
...

This factory can be as smart as it can, but one must be
careful that it's worth it: some previously implemented
optimizations (like (A+B).A = A turned out to be costly
to detect, and didn't make it the final cut.

Yet another factory is there to reduce the memory footprint
and, as a side effect, make things faster by interning
the specs: equivalent specs are interned and indexed, which
allows us to optimize unions and intersections of specs.

Last but not least, a caching factory is there to avoid
recomputing the same intersections and unions of specs
when we have already done the job. This is efficient if
the underlying (delegate) specs are easily compared,
which is the case thanks to the interning factory.

All in all, the delegation chain allows us to make
the algorithm fast and hopefully reliable, while
making it easier to debug.

Hopefully this should fix the inconsistent classpath ordering
issues some customers have reported.

ljacomet

Looks good overall!

A couple comments and still need to take a deeper look at the indexed / interned part.

ljacomet · 2019-04-28T09:58:54Z

...nternal/artifacts/ivyservice/resolveengine/excludes/factories/NormalizingExcludeFactory.java

+                }
+                simpleExcludes.add((ModuleIdExclude) spec);
+            }
+            flattened.add(spec);


Pretty sure that's a no-op, and if it was not you would get a ConcurrentModificationException

ljacomet · 2019-04-28T10:00:45Z

...nternal/artifacts/ivyservice/resolveengine/excludes/factories/NormalizingExcludeFactory.java

+                flattened.removeAll(simpleExcludes);
+                flattened.add(delegate.moduleSet(simpleExcludes.stream().map(ModuleIdExclude::getModuleId).collect(Collectors.toSet())));
+            } else {
+                flattened.add(simpleExcludes.iterator().next());


AFAICT that spec is already in flattened so this line is a no-op.

ljacomet · 2019-04-28T10:20:38Z

...l/artifacts/ivyservice/resolveengine/excludes/factories/NormalizingExcludeFactoryTest.groovy

+
+        where:
+        left                               | right                | expected
+/*        everything()                       | nothing()            | everything()


Forgot to revert the commenting?

ljacomet

Minor review comments

ljacomet · 2019-04-30T08:34:26Z

buildSrc/subprojects/packaging/src/main/kotlin/org/gradle/gradlebuild/packaging/MinifyPlugin.kt

@@ -38,6 +38,7 @@ open class MinifyPlugin : Plugin<Project> {
        val keepPatterns = mapOf(
            "fastutil" to setOf(
                "it.unimi.dsi.fastutil.ints.IntOpenHashSet",
+                "it.unimi.dsi.fastutil.ints.IntRBTreeSet",


This change is no longer needed AFAICT

Indeed, will remove.

ljacomet · 2019-04-30T08:52:54Z

...nternal/artifacts/ivyservice/resolveengine/excludes/factories/NormalizingExcludeFactory.java

+        List<ModuleSetExclude> moduleSetExcludes = UnionOf.MODULE_SET.fromMap(byType);
+        List<ExcludeSpec> other = UnionOf.NOT_JOINABLE.fromMap(byType);
+        if (!moduleIdExcludes.isEmpty()) {
+            if (moduleIdExcludes.size() > 1 || !moduleIdSetsExcludes.isEmpty()) {


A small comment to express intent might help there, or extract to a method.

// More than one ModuleIdExclude -> make it a ModuleIdSetExclude

Also, with the flattening done below, there might be a way to optimize when the moduleIdSetsExcludes.isEmpty() is true as you now create an immutable list that will be flattened again.

This code path is not called often and doesn't show up as a hotspot, so I didn't bother optimizing yet.

ljacomet · 2019-05-03T09:47:45Z

...integTest/groovy/org/gradle/integtests/resolve/alignment/ForcingPlatformAlignmentTest.groovy

@@ -992,7 +992,8 @@ include 'other'

    @RequiredFeatures([
            @RequiredFeature(feature = GradleMetadataResolveRunner.REPOSITORY_TYPE, value = "maven"),
-            @RequiredFeature(feature = GradleMetadataResolveRunner.EXPERIMENTAL_RESOLVE_BEHAVIOR, value = "true")
+            @RequiredFeature(feature = GradleMetadataResolveRunner.EXPERIMENTAL_RESOLVE_BEHAVIOR, value = "true"),


The changes in this test look like leftovers from trying things locally and should be reverted AFAICT

As a follow-up to #9197, this commit properly fixes the exclude rule merging algorithm, by completely rewriting it. The new merging algorithm works by implementing the minimal set of algebra operations that make sense to minimize computation durations. In order to do this, this commit introduces a number of exclude specs (found in their own package) and factories to create actual implementation of those specs. Specs represent the different kind of excludes we can find: - excluding a group - excluding a module (no group defined) - excluding a group+module - excluding an artifact of a group+module - pattern-matching excludes - unions of excludes - intersections of excludes With all those minimal bricks, factories are responsible of generating consistent specs. The dumbest factory will just generate new instances for everything. This is the default factory. Minimally, this factory has to be backed by an optimizing factory, which will take care of handling special cases: - union or intersection of a single spec - union or intersection of 2 specs - when one of them is null - when both are equal Then we have a factory which performs the minimal algebra to minimize specs: - unions of unions - intersections of intersections - union of a union and individual specs - insection of an intersection and individual spec - ... This factory can be as smart as it can, but one must be careful that it's worth it: some previously implemented optimizations (like (A+B).A = A turned out to be costly to detect, and didn't make it the final cut. Last but not least, a caching factory is there to avoid recomputing the same intersections and unions of specs when we have already done the job. This is efficient if the underlying (delegate) specs are easily compared, which is the case thanks to the interning factory. All in all, the delegation chain allows us to make the algorithm fast and hopefully reliable, while making it easier to debug.

It didn't prove as fast as it was intended to be. Instead, we performed the same optimization for single groups/modules as we did for module sets.

The merge cache can be reused at different stages. In particular, when the normalizing factory normalizes unions/intersections, it is possible that the normalized union is found in cache.

This commit optimizes flattening by avoiding the creation of intermediate data structures. In particular using lists we were converting from and to sets unnecessarily.

Measurements show it's significantly faster than using a read/write lock.

When sorting selectors, we will only give priority to a "latest" selector if it's a require selector.

Sorting can be expensive, especially when the list is almost always sorted. This commit changes the implementation to use on the fly sorting, as we add selectors.

This involves the creation of an array, which starts becoming relevant when we create lots of those objects.

While it's better to use ImmutableList for consistency, it is surprisingly expensive to have the answer to the question `isEmpty` on an empty immutable list. This is done only where it showed up as a hotspot in profiling: getting artifacts.

Deduplication is done when adding a reason. The rationale is that in most cases, we have a single reason, and it's extremely unlikely to have a large set of reasons.

This is an attempt to optimize performance: instead of iterating over all dependencies, findind constraints, then checking if they belong to the list of newly activated constraints, which can be expensive in case of dependency locking, we now reverse the lookup and for activated constraints, get the list of dependency states which are effectively affected.

This should reduce the amount of garbage created during selection of artifact transforms.

Serializing attributes (and reading them) can be quite expensive. In project like Android projects, there are many attributes serialized and they often have the same values. This prevents overhead by keeping the serialized values de-duplicated.

Desugaring can be expensive, and this commit reduces the cost by making sure we don't desugar the same instance of attributes twice. It makes use of the fact we have an immutable attributes factory which returns always the same instance of attributes.

Component attribute matching is the most expensive part of attribute selection whenever there's an ambiguity. In Android, that's very often the case. This commit changes the caching code to use more than one (actually cache all) query cache in order to try to reduce selection duration.

Instead of checking for **every** file entry if the file is new, and then copy it if it's missing, we only need to do this once, at the top-level exploded directory: this saves a lot of "file.exist" calls, which can be quite expensive.

This fixes the Gradleception build.

melix added in:dependency-management DO NOT USE from:member labels Apr 27, 2019

melix added this to the 5.5 RC1 milestone Apr 27, 2019

melix requested review from ljacomet and bigdaz April 27, 2019 20:15

melix self-assigned this Apr 27, 2019

melix force-pushed the cc/dm/rework-exclude-merging branch 2 times, most recently from 0598b6c to 9b3d1c7 Compare April 27, 2019 22:21

melix removed request for ljacomet and bigdaz April 27, 2019 22:24

melix changed the title ~~Rework exclude rule merging~~ WIP - Rework exclude rule merging Apr 27, 2019

melix force-pushed the cc/dm/rework-exclude-merging branch from 9b3d1c7 to f0d843a Compare April 28, 2019 07:00

ljacomet reviewed Apr 28, 2019

View reviewed changes

melix force-pushed the cc/dm/rework-exclude-merging branch 6 times, most recently from 8b8a96a to 0d78e67 Compare April 29, 2019 13:47

ljacomet reviewed Apr 30, 2019

View reviewed changes

melix force-pushed the cc/dm/rework-exclude-merging branch 2 times, most recently from 533e90d to ec085cf Compare May 2, 2019 19:48

ljacomet reviewed May 3, 2019

View reviewed changes

eskatos and others added 7 commits May 3, 2019 14:31

Remove indexed exclude factory

2b35007

It didn't prove as fast as it was intended to be. Instead, we performed the same optimization for single groups/modules as we did for module sets.

Optimize ExcludePair for comparisons

b33a6eb

Reuse caching at lower level

67e1e79

The merge cache can be reused at different stages. In particular, when the normalizing factory normalizes unions/intersections, it is possible that the normalized union is found in cache.

Optimize flattening

0d168e6

This commit optimizes flattening by avoiding the creation of intermediate data structures. In particular using lists we were converting from and to sets unnecessarily.

Use coarce grained locking

5e59d33

Measurements show it's significantly faster than using a read/write lock.

Remove redundant optimization

ee367ab

melix added 4 commits May 6, 2019 10:01

Simplify VersionParser

84a6b68

Only consider "required" latest selector

b0ca0a6

When sorting selectors, we will only give priority to a "latest" selector if it's a require selector.

Make selector sorting faster

e231c77

Order selectors on the fly

dbb2ca6

Sorting can be expensive, especially when the list is almost always sorted. This commit changes the implementation to use on the fly sorting, as we add selectors.

melix force-pushed the cc/dm/rework-exclude-merging branch from 233d744 to dbb2ca6 Compare May 6, 2019 10:24

melix added 7 commits May 6, 2019 14:23

Avoid the use of Objects.hashCode

5fcbee8

This involves the creation of an array, which starts becoming relevant when we create lots of those objects.

Use emptyList instead of ImmutableList.of()

b4d8d0c

While it's better to use ImmutableList for consistency, it is surprisingly expensive to have the answer to the question `isEmpty` on an empty immutable list. This is done only where it showed up as a hotspot in profiling: getting artifacts.

Use a list instead of a set for dependency reasons

f702011

Deduplication is done when adding a reason. The rationale is that in most cases, we have a single reason, and it's extremely unlikely to have a large set of reasons.

Fix NPE: version of capability may be null

fbf3905

Fix duplicate rule descriptors

0f5a865

Use cached dependency states for additional constraints

1cc7ed0

melix force-pushed the cc/dm/rework-exclude-merging branch from 5030414 to be092aa Compare May 8, 2019 14:21

Fix constraint not being reactivated

59d2b1f

melix force-pushed the cc/dm/rework-exclude-merging branch from cc97554 to 59d2b1f Compare May 9, 2019 16:24

Optimize variant transform selection

a54fceb

This should reduce the amount of garbage created during selection of artifact transforms.

melix force-pushed the cc/dm/rework-exclude-merging branch 3 times, most recently from 14185bc to a54fceb Compare May 13, 2019 05:38

melix added 4 commits May 14, 2019 10:20

Cache desugaring of attributes

5a1ba04

Desugaring can be expensive, and this commit reduces the cost by making sure we don't desugar the same instance of attributes twice. It makes use of the fact we have an immutable attributes factory which returns always the same instance of attributes.

Optimize zip/tar visit

867de47

Instead of checking for **every** file entry if the file is new, and then copy it if it's missing, we only need to do this once, at the top-level exploded directory: this saves a lot of "file.exist" calls, which can be quite expensive.

melix force-pushed the cc/dm/rework-exclude-merging branch from f2cf15d to 8c227ac Compare May 14, 2019 21:08

Always add constraints to potentially activating list

a6577d6

This fixes the Gradleception build.

melix force-pushed the cc/dm/rework-exclude-merging branch from 8c227ac to a6577d6 Compare May 14, 2019 21:19

melix changed the title ~~WIP - Rework exclude rule merging~~ Rework exclude rule merging May 15, 2019

melix merged commit 9e8afca into master May 15, 2019

melix deleted the cc/dm/rework-exclude-merging branch May 15, 2019 06:44

jjohannes mentioned this pull request Oct 9, 2019

Fix discovery of components without metadata in repositories #10996

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework exclude rule merging #9307

Rework exclude rule merging #9307

melix commented Apr 27, 2019

ljacomet left a comment

ljacomet Apr 28, 2019

ljacomet Apr 28, 2019

ljacomet Apr 28, 2019

ljacomet left a comment

ljacomet Apr 30, 2019

melix Apr 30, 2019

ljacomet Apr 30, 2019

melix Apr 30, 2019

ljacomet May 3, 2019

Rework exclude rule merging #9307

Rework exclude rule merging #9307

Conversation

melix commented Apr 27, 2019

Context

ljacomet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ljacomet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment