Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matching strings makes switches in bytecode #8451

Merged
merged 1 commit into from Nov 19, 2019
Merged

Conversation

hrhino
Copy link
Member

@hrhino hrhino commented Oct 1, 2019

Switchable matches with string-typed scrutinee survive the pattern matcher in the same way as those on integer types do: as a series of CaseDefs with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that. No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of throw new MatchError.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740.

@scala-jenkins scala-jenkins added this to the 2.13.2 milestone Oct 1, 2019
@hrhino hrhino added the WIP label Oct 1, 2019
@hrhino
Copy link
Member Author

hrhino commented Oct 1, 2019

Darn, I missed the chance to use the shiny "WIP" feature.

@sjrd: the positions test is implicitly a scala.js-compatibility test. The trees coming out of patmat are hopefully not too difficult for the SJS backend to handle. Maybe this is a good time for me to get started over there.

@sjrd
Copy link
Member

sjrd commented Oct 1, 2019

Cool :) Thanks for the separation in cleanup.

Maybe this is a good time for me to get started over there.

Sure, why not? :) If you do, you'll want to work in the 0.6.x branch. We forward-merge 0.6.x into master every now and then.

@retronym
Copy link
Member

retronym commented Oct 4, 2019

The tests fail when the compiler has been bootstrapped through this change. A pattern match in ScalaVersion$.<init> now throws a MatchError.

Minimized:

import annotation.switch

object Test {
  def test(s: String): Int = {
    (s : @switch) match {
      case null | "" => 0
      case _ => s.toInt
    }
  }

  def main(args: Array[String]): Unit = {
    println(test("2"))
  }
}
% qscalac sandbox/test.scala && qscala Test
scala.MatchError: 2 (of class java.lang.String)
	at Test$.test(test.scala:5)
      case <synthetic> val x1: String = (s: String);
      {
        matchEnd1(if (x1.eq(null))
          0
        else
          x1.hashCode() match {
          case 0 => if ("".equals(x1))
            0
          else
            matchEnd2()
          case 3392903 => if ("null".equals(x1))
            0
          else
            matchEnd2()
          case _ => matchEnd2()
        });
        matchEnd2(){
          throw new MatchError(x1)
        };
        matchEnd1(x$1: Int){
          x$1
        }
      }
    };

@retronym
Copy link
Member

retronym commented Oct 4, 2019

There appear to be two separate problems.

First, if we want to make this work for null patterns, we need special care in Cleanup to avoid the fact that Constant(null).stringValue == "null" (rather than == null). We also need to avoid a NPE by using null eq value rather than null.equals(value).

The MatchError above does not rely on null, though, as

import annotation.switch

object Test {
  def test(s: String): Int = {
    (s : @switch) match {
      case "" => 0
      case _ => s.toInt
    }
  }

  def main(args: Array[String]): Unit = {
    println(test("2"))
  }
}

Also fails.

@retronym
Copy link
Member

retronym commented Oct 4, 2019

I push-f-ed fixes for these problems.

@hrhino
Copy link
Member Author

hrhino commented Oct 4, 2019

Thanks! I'm also not correct in my handling of Alternative... Right now it copies the body of the case for each pattern. I'll push a fix for that.

*/

val stats = mutable.ListBuffer.empty[Tree]
var failureBody = Throw(New(definitions.MatchErrorClass.tpe_*, REF(sel))) : Tree
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why I thought I couldn't use the synth default case from patmat. This should never be used (the Ident(WILDCARD, _, body) case should always be hit but it seems safer.

case cd@CaseDef(StringsPattern(strs), _, body) =>
val jump = currentOwner.newLabel(unit.freshTermName("case"), swPos).setInfo(MethodType(Nil, restpe))
stats += LabelDef(jump, Nil, succeed(body))
strs.map((_, jump, cd.pat.pos))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to copy the body of a case if there were alternatives, so

x match { case "a" | "b" => body }

would introduce tree sharing.

@@ -498,6 +595,9 @@ abstract class CleanUp extends Statics with Transform with ast.TreeDSL {
super.transform(localTyper.typedPos(tree.pos)(consed))
}

case switch: Match =>
super.transform(transformSwitch(switch))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks... I can't believe I forgot this

@hrhino hrhino requested a review from retronym October 4, 2019 11:31
@retronym
Copy link
Member

retronym commented Oct 7, 2019

Probably worth a test to show that the scrutinee is only evaluated once. Cleanup.transformSwitch relies on the fact that patmat stabilizes the scrutinee into temp val.

I've manually tested with the following.

object Test {

  def test(s: String, expected: Int): Unit = {
    val it = Iterator(s)
    val r = (it.next(): @annotation.switch) match {
        // collision
        case "AaAa" => 1
        case "BBBB" => 2
      
        // collision with "FB"
        case "Ea"   => 3

        case _      => 4
     }
     assert(expected == r, (expected, r))
  }

  def main(args: Array[String]): Unit = {
    test("AaAa", 1)
    test("BBBB", 2)
    test("Ea", 3)
    test("FB", 4)
    test("XXX", 4)
    test(null, 4)
  }
}

/* From this:
* string match { case "AaAa" => 1 case "BBBB" | "c" => 2 case _ => 3}
* Generate this:
* goto matchSuccess (string.## match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "goto matchSuccess" here? All cases of the match branch so it seems like a no-op to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I had it before I realized the problem with Alternatives

@retronym
Copy link
Member

retronym commented Oct 8, 2019

Comparing javac and scalac:

retronym/string-switch@c2820bb

class Test {
    int test(String s) {
        return switch(s) {
            case "AaAa" -> 1;
            case "BBBB" -> 2;
            case "Ea"   -> 3;
            case "xx"   -> 4;
            default     -> 5;
        };
    }
}
class Test {

  def test(s: String, expected: Int) = {
     s match {
        case "AaAa" => 1
        case "BBBB" => 2
        case "Ea"   => 3
        case "xx"   => 4
        case _      => 5
     }
  }
}

@hrhino
Copy link
Member Author

hrhino commented Oct 8, 2019

That's interesting. Seems like Java emits two switch instructions: one to get a number representing what code block to run, and one to run it. I'm not sure I see a benefit there compared to this.

@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Oct 9, 2019
@hrhino
Copy link
Member Author

hrhino commented Oct 9, 2019

I'm honored as always, @SethTisue

@martijnhoekstra
Copy link
Contributor

There is a comment in the javac source that they do it to keep fall through simpler, so that doesn't apply to match.

@hrhino
Copy link
Member Author

hrhino commented Oct 10, 2019

Nice find; thanks!

@retronym
Copy link
Member

retronym commented Oct 11, 2019

Another advantage the double-lookup would be the opportunity to use an invokedynamic based implementation like is being added to JDK 14 (https://github.com/openjdk/amber/blob/pattern-runtime/src/java.base/share/classes/java/lang/runtime/PatternHandles.java). That's not something we're targetting right now as we obviously can't depend on JDK14, but something for later on.

@retronym
Copy link
Member

I guess the cleanup phase is de-facto part of the JVM backend as ScalaJS doesn't run it, but it does seem a little strange (but not a blocker for the change) to desugar match-on-int and match-on-string in different phases.

@retronym retronym requested a review from lrytz October 11, 2019 04:36
@sjrd
Copy link
Member

sjrd commented Oct 11, 2019

it does seem a little strange (but not a blocker for the change) to desugar match-on-int and match-on-string in different phases.

The way I see it, match-on-string is desugared into match-on-int, but the latter is compiled as is to a JVM instruction. Match-on-int is never really desugared.

Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

test/files/jvm/string-switch/Test.scala Outdated Show resolved Hide resolved
@sjrd
Copy link
Member

sjrd commented Oct 11, 2019

@hrhino Are you interested in trying to make the required changes in the Scala.js compiler backend, or should I do them?

@hrhino
Copy link
Member Author

hrhino commented Oct 11, 2019

@sjrd I'll give it a go.

Switchable matches with string-typed scrutinee survive the pattern
matcher in the same way as those on integer types do: as a series of
`CaseDef`s with empty guard and literal pattern.

Cleanup collates them by hash code and emits a switch on that.
No sooner, so scala.js can emit a more JS-friendly implementation.

Labels were used to avoid a proliferation of `throw new MatchError`.

Works with nulls. Works with Unit.

Enclosed "pos" test stands for positions, not positivity.

Fixes scala/bug#11740

Co-Authored-By: "Jason Zaugg" <jzaugg@gmail.com>
Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! @sjrd should we wait with the merge until Scala.js is ready?

@sjrd
Copy link
Member

sjrd commented Oct 14, 2019

should we wait with the merge until Scala.js is ready?

Yes please, otherwise the community build will break. @hrhino told me he intended to do it, but he needs some approval.

@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Oct 14, 2019
@sjrd
Copy link
Member

sjrd commented Nov 11, 2019

So ... it turns out that in Scala.js 0.6.x, this actually just works out of the box 😆 The thing is that in 0.6.x, our IR supports any kind of literal in our Match nodes (equivalent to JVM switches), and our entire pipeline is ready to accept constant strings and nulls in there, and emit them as is in JS, which is valid JS. Even though none of it was ever exercised nor tested before, it just worked 😎

Now, ironically, Scala.js 1.x crashes and burns with this change, because in 1.x we restricted our Match nodes to support only Ints (like the JVM). So I'll have to work on that.

sjrd added a commit to sjrd/scala-js that referenced this pull request Nov 11, 2019
Since Scala 2.13.2, the pattern matcher will keep `Match` nodes
that match on `String`s and `null`s as is, to be desugared later
in `cleanup`. This was implemented upstream in
scala/scala#8451

We implement a more general translation that will accept any kind
of scalac `Literal`. If the scrutinee is an integer and all cases
are int literals, we emit a `js.Match` as before. Otherwise, we
emit an `if..else` chain with `===` comparisons.
@sjrd
Copy link
Member

sjrd commented Nov 11, 2019

scala-js/scala-js#3842

@sjrd
Copy link
Member

sjrd commented Nov 13, 2019

OK we've got a PR for Scala.js with green CI and review, ready to be merged. So feel free to merge this PR in Scala, and we'll follow suit in Scala.js.

@lrytz lrytz merged commit 48f73bc into scala:2.13.x Nov 19, 2019
@SethTisue SethTisue changed the title Matching strings makes switches Matching strings makes switches in bytecode Feb 11, 2020
harpocrates added a commit to harpocrates/dotty that referenced this pull request Mar 29, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this pull request Mar 29, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this pull request Aug 18, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.
harpocrates added a commit to harpocrates/dotty that referenced this pull request Aug 18, 2021
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.

Fixes scala#11923
olsdavis pushed a commit to olsdavis/dotty that referenced this pull request Apr 4, 2022
The pattern matcher will now emit `Match` with `String` scrutinee as
well as the existing `Int` scrutinee. The JVM backend handles this case
by emitting bytecode that switches on the String's `hashCode` (this
matches what Java does). The SJS already handles `String` matches.

The approach is similar to scala/scala#8451 (see scala/bug#11740 too),
except that instead of doing a transformation on the AST, we just emit the
right bytecode straight away. This is desirable since it means that
Scala.js (and any other backend) can choose their own optimised strategy
for compiling a match on strings.

Fixes scala#11923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance the need for speed. usually compiler performance, sometimes runtime performance. release-notes worth highlighting in next release notes
Projects
None yet
7 participants