
Revert #14452 and make compile-time operations on stable arguments stable #15268

Merged: 5 commits into scala:main on Jun 15, 2022

Conversation

@mbovel (Member) commented May 23, 2022

Revert #14452

#14452 made compile-time operations covariant. The use case was mainly to allow the compiler to remove skolems around operation arguments. That, for example, enables the following:

(?1 : x.type * x.type) * x.type <: (x.type * x.type) * x.type

Unfortunately, we realized afterwards that this adds significant constraints on the normalization of these compile-time operations. To preserve transitivity of algorithmic subtyping, we need to make sure that ∀ T, U. T <: U → Norm[T] <: Norm[U], which is non-trivial to achieve when compile-time operations are covariant.

For instance, consider the following two situations (an arrow from A to B means A <: B):

(image: two diagrams of subtyping edges, with the problematic edge in red)

In both cases, the problem is the red arrow:

  • On the left: if we normalize 1 + 1 to 2, then we should make sure that the result is still a subtype of 1 + (2 | 3) (see the sketch after this list). This could be achieved by distributing additions and multiplications over unions, but we might not want to do that because it would make the runtime and space of normalization exponential.
  • On the right, we see a more general case: if 1 + (a : A) is normalized to (a : A) + 1, how do we ensure that the latter is also a subtype of 1 + A? We thought about a few possible ways to do that, for example by approximating non-singleton types by their known atoms, but it seems that if we want to handle the general case, we can't avoid brute-forcing different orders of operands when comparing operations, which is also exponential in the worst case.
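
To make the left-hand case concrete, here is a minimal self-contained sketch using scala.compiletime.ops (the object name is made up; the commented-out line is precisely the subtyping edge in question, so whether it holds depends on the variance and normalization rules discussed here):

import scala.compiletime.ops.int.+

object LeftCaseSketch:
  // 1 + 1 is constant-folded (normalized) to the literal type 2.
  val normalizes = summon[(1 + 1) =:= 2]

  // The red edge: after normalizing 1 + 1 to 2, transitivity would require
  // 2 <: 1 + (2 | 3), which in general needs + to distribute over the union.
  // summon[2 <:< (1 + (2 | 3))]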

Conclusion: making compile-time operations covariant complicates the implementation of normalization because it creates new subtyping edges that we would need to preserve when normalizing. As we do not want to handle all these cases now, we revert #14452 and make these operations invariant again (reverted in 12a10f2).

Make compile-time operations on stable arguments stable

So we don't want compile-time operations to be covariant, but we still need:

(?1 : x.type * x.type) * x.type <: (x.type * x.type) * x.type

In order to achieve this:

  1. 16fa957 extends Type.isStable so that compile-time operations with stable arguments are considered stable,
  2. 7c8667e extends the fixForEvaluation method inside AppliedType.tryCompiletimeConstantFold so that arguments of compile-time operations are dereferenced to their deepest underlying stable type.

See tests/neg/singleton-ops-int.scala for the new test cases this enables.
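
To make the two steps concrete, here is a small self-contained toy model (hypothetical names, not the actual dotc code): an applied compile-time operation is stable when all of its arguments are stable, and evaluation first dereferences each argument to its deepest underlying stable type.

sealed trait Ty:
  def isStable: Boolean
  def underlying: Option[Ty] = None

// A constant type such as the literal type 1.
final case class ConstTy(value: Int) extends Ty:
  def isStable: Boolean = true

// A stable reference such as one.type, with a known underlying type.
final case class RefTy(name: String, under: Ty) extends Ty:
  def isStable: Boolean = true
  override def underlying: Option[Ty] = Some(under)

// A compile-time operation such as X + Y.
final case class OpTy(op: String, args: List[Ty]) extends Ty:
  // Step 1 (16fa957): an operation on stable arguments is itself stable.
  def isStable: Boolean = args.forall(_.isStable)

// Step 2 (7c8667e): dereference to the deepest underlying stable type before
// trying to constant-fold.
def fixForEvaluation(tp: Ty): Ty = tp.underlying match
  case Some(u) if u.isStable => fixForEvaluation(u)
  case _                     => tp

@main def stableOpsDemo(): Unit =
  val one = RefTy("one.type", ConstTy(1))
  println(OpTy("+", List(one, one)).isStable)  // true
  println(fixForEvaluation(one))               // ConstTy(1)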

Add micro benchmarks

To check the performance of common operations on types, 823d710 adds a new JMH suite that benchmarks the runtime of the common type operations isStable, normalized, simplified, dealias, widen, atoms, isProvisional on some simple types (AppliedType, TermRef, etc.).

The suite can be run with:

sbt "scala3-bench-micro / jmh:run"

The use case of these benchmarks for this PR was mainly to test different caching strategies for Type#isStable and to make sure that they indeed result in a performance gain. See the results in the comments below for more details.
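
For reference, such a JMH micro benchmark has roughly the following shape (an illustrative skeleton only: class, parameter, and field names are made up, and the measured operation is a stand-in for calls like tp.isStable, not the actual code of 823d710):

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.*

// The value under test is built once per trial in @Setup so that only the
// measured operation runs inside the @Benchmark method.
@State(Scope.Benchmark)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(3)
@OutputTimeUnit(TimeUnit.SECONDS)
class TypeOperationsBenchmark:
  @Param(Array("int", "deepSumStable", "deepSumUnstable"))
  var valName: String = ""

  // Stand-in for the Type that the real suite would construct from valName.
  var subject: List[Int] = Nil

  @Setup(Level.Trial)
  def setup(): Unit =
    subject = List.tabulate(valName.length)(identity)

  @Benchmark
  def isStableBenchmark(): Boolean =
    subject.forall(_ >= 0)  // stand-in for subject.isStable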

@mbovel mbovel added this to the 3.2.0-RC1 milestone May 23, 2022
@mbovel mbovel marked this pull request as ready for review May 23, 2022 13:58
@mbovel mbovel requested review from odersky and soronpo May 23, 2022 13:58
@@ -165,6 +165,8 @@ object Types {
case _: SingletonType | NoPrefix => true
case tp: RefinedOrRecType => tp.parent.isStable
case tp: ExprType => tp.resultType.isStable
case AppliedType(tycon: TypeRef, args: List[Type])
A contributor commented on this diff:

Very nervous about performance implications here. isStable is a hotspot and what you do here seems to be about an order of magnitude more expensive than the rest. If we want to go down that route we probably should cache the value in AppliedType.

@mbovel (Member, Author) replied May 31, 2022:

b6704d7 caches isStable in AppliedType. As isStable does not depend on type variables, can we just cache it once and for all? Or should it still depend on the period? All current tests pass with the first alternative.
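
For illustration, the caching idea amounts to something like the following self-contained sketch (class and field names are made up, not the actual fields added in b6704d7):

// Memoize stability in the node itself: compute it on first access and reuse
// it afterwards, assuming the answer never changes.
final class CachedAppliedNode(val args: List[CachedAppliedNode]):
  private var stableKnown: Boolean = false
  private var stable: Boolean = false

  def isStable: Boolean =
    if !stableKnown then
      stable = args.forall(_.isStable)
      stableKnown = true
    stable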

In terms of performance, I couldn't see any difference with our current benchmarks, so I added a micro benchmark specifically measuring the performance of isStable: 9ca8891.

Before any change (main):

Benchmark                         (valName)   Mode  Cnt         Score        Error  Units
IsStable.isStableBenchmark              int  thrpt   30  31573088.895 ± 446326.483  ops/s
IsStable.isStableBenchmark  deepSumUnstable  thrpt   30  31905715.432 ± 403879.002  ops/s
IsStable.isStableBenchmark    deepSumStable  thrpt   30  31632042.261 ± 329863.487  ops/s

With applied-types-aware isStable but without cache (9ca8891):

Benchmark                         (valName)   Mode  Cnt         Score        Error  Units
IsStable.isStableBenchmark              int  thrpt   30  27618898.774 ± 351504.087  ops/s
IsStable.isStableBenchmark  deepSumUnstable  thrpt   30   2702127.175 ±  99159.508  ops/s
IsStable.isStableBenchmark    deepSumStable  thrpt   30    815868.091 ±  27206.935  ops/s

With applied-types-aware isStable cached (b6704d7):

Benchmark                         (valName)   Mode  Cnt         Score        Error  Units
IsStable.isStableBenchmark              int  thrpt   30  31601394.759 ± 228033.535  ops/s
IsStable.isStableBenchmark  deepSumUnstable  thrpt   30  31393180.896 ± 336974.793  ops/s
IsStable.isStableBenchmark    deepSumStable  thrpt   30  31814767.854 ± 531967.869  ops/s

This confirms that the performance of isStable on deep type applications (a balanced tree of depth 4) was indeed one to two orders of magnitude slower without the cache. By the way, don't we have exactly the same problem with intersections and unions? Why is it particularly a problem with applied types?


@dottybot commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15268/ to see the changes.

Benchmarks are based on merging with main (6783853)

@soronpo (Contributor) left a comment:

LGTM

@dottybot commented:

performance test scheduled: 6 job(s) in queue, 1 running.

(1 similar comment)

@mbovel (Member, Author) commented May 31, 2022:

Note that an important consequence of making compile-time operations on stable arguments stable is that it makes it possible to use operation types where singletons are expected, as demonstrated by this test:

summon[t85.type + t86.type <:< Singleton]

For now, this however does not work:

summon[t85.type + t86.type <:< Singleton & Int]

I plan to fix it in a separate PR.

@dottybot commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15268/ to see the changes.

Benchmarks are based on merging with main (78dbc59)

Two outdated review threads on compiler/src/dotty/tools/dotc/core/Types.scala were resolved.
@mbovel mbovel force-pushed the mb/revert-compiletime-ops-covariance branch 2 times, most recently from 80d5136 to b762b35 Compare June 7, 2022 19:46

@mbovel (Member, Author) commented Jun 7, 2022:

test performance please

@dottybot commented Jun 7, 2022:

performance test scheduled: 1 job(s) in queue, 1 running.

@dottybot commented Jun 7, 2022:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15268/ to see the changes.

Benchmarks are based on merging with main (76a0b29)

@mbovel (Member, Author) commented Jun 8, 2022:

The previous run of the compilation benchmarks shows no significant differences.

Below are some more detailed micro benchmark results.

isStable

  • In green, we can see as expected that extending isStable to consider compile-time operations on stable arguments stable without caching makes its runtime complexity linear: we get a ~3x slowdown for singletonsSum and intsSum, and 50x and 14x slowdowns for their deep counterparts.
  • In purple, we see that caching in AppliedType solves the problem.
  • In yellow, we see that caching in Type not only solves the problem, but drastically improves the runtime for all tested types. The speedup is 27x to 109x.

(graph: isStable benchmark results)

Each point is the throughput for a JMH iteration, relative to the median of the corresponding "Baseline" iterations. Note the logarithmic scale for the y-axis. Higher is better.

normalized

  • In red, green, and purple, we see that checking isStable in tryCompiletimeConstantFold significantly slows it down. Caching only AppliedType#isStable does not solve the problem.
  • However, in yellow, we see that caching isStable for all types solves the problem.

(graph: normalized benchmark results)

See legends in first graph. Linear scales. Higher is better.

simplified

Similar observations as for normalized.

(graph: simplified benchmark results)

See legends in first graph. Linear scales. Higher is better.

@mbovel mbovel force-pushed the mb/revert-compiletime-ops-covariance branch from b762b35 to 8b08215 Compare June 8, 2022 08:59
@@ -4242,7 +4253,7 @@ object Types {
// final val one = 1
// type Two = one.type + one.type
// ```
case tp: TermRef => tp.underlying
case tp: TypeProxy if tp.underlying.isStable => tp.underlying.fixForEvaluation
@mbovel (Member, Author) commented on this diff:

Reasoning here: no need to check that tp is stable because it will be if its underlying type is stable. Could someone confirm if this is correct?

Otherwise, we might want:

case tp: TypeProxy if tp.isStable && tp.underlying.isStable => tp.underlying.fixForEvaluation

or

case tp: SingletonType if tp.isStable && tp.underlying.isStable => tp.underlying.fixForEvaluation

A contributor replied:

I think at least morally this is correct.

@mbovel mbovel requested a review from smarter June 9, 2022 12:59
@mbovel mbovel force-pushed the mb/revert-compiletime-ops-covariance branch 2 times, most recently from d3a71ed to 9bfc2f5 Compare June 13, 2022 10:51
@mbovel (Member, Author) commented Jun 13, 2022:

After caching TermRef#isStable and noticing that it doesn't impact the benchmark results, I dug further and noticed that the time difference is spent in pattern matching in isStable. Here are some results showing the difference between pattern matching and method overriding on 3 different JVMs:

(graph: pattern matching vs. method overriding throughput on 3 JVMs)

The y-axis unit is operations per second. Note the logarithmic scale. Higher is better.

I am not sure if these results are generally relevant or are only artifacts of these micro benchmarks.
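
For context, the two dispatch strategies being compared look like this in a self-contained toy (made-up names, not the dotc hierarchy): one method that pattern matches on the node class versus a method overridden in each subclass.

sealed trait Node:
  // Overriding-based dispatch: each subclass provides its own implementation.
  def isStable: Boolean

final class SingletonNode extends Node:
  def isStable: Boolean = true

final class OpNode(val args: List[Node]) extends Node:
  def isStable: Boolean = args.forall(_.isStable)

// Pattern-matching-based dispatch over the same hierarchy.
def isStableMatch(n: Node): Boolean = n match
  case _: SingletonNode => true
  case a: OpNode        => a.args.forall(isStableMatch)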

@mbovel (Member, Author) commented Jun 13, 2022:

test performance with #compiletime please

@dottybot commented:

performance test scheduled: 3 job(s) in queue, 1 running.

@dottybot commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/15268/ to see the changes.

Benchmarks are based on merging with main (836ed97)

@smarter (Member) commented Jun 13, 2022:

I think you mean overriding, not overloading. Also, it's not clear from the graph whether bigger numbers are better or worse, but usually overriding is faster.

@mbovel (Member, Author) commented Jun 13, 2022:

I think you mean overriding, not overloading.

Indeed, fixed.

Also, it's not clear from the graph whether bigger numbers are better or worse, but usually overriding is faster.

Legend added. The unit of the y-axis is ops/sec; higher is better.

@mbovel (Member, Author) commented Jun 13, 2022:

As discussed during today's dotty meeting:

@mbovel mbovel force-pushed the mb/revert-compiletime-ops-covariance branch from 9bfc2f5 to cba629b Compare June 13, 2022 15:00
@mbovel mbovel force-pushed the mb/revert-compiletime-ops-covariance branch from cba629b to 9205cc6 Compare June 14, 2022 12:02
@mbovel mbovel requested a review from odersky June 14, 2022 12:09
@odersky (Contributor) left a comment:

LGTM


@mbovel mbovel merged commit 80a1e95 into scala:main Jun 15, 2022
@mbovel mbovel deleted the mb/revert-compiletime-ops-covariance branch June 15, 2022 06:42
@mbovel (Member, Author) commented Jun 27, 2022:

For now, this however does not work:

summon[t85.type + t86.type <:< Singleton & Int]

It actually does work; I just didn't realize that & has higher precedence than <:< 😅

summon[t85.type + t86.type <:< (Singleton & Int)] // compiles

@Kordyjan Kordyjan modified the milestones: 3.2.0-RC1, 3.2.0 Aug 1, 2023