Unicode escapes are ordinary escape sequences #8480

martijnhoekstra · 2020-03-09T10:31:06Z

s/replaceAllLiterally/replace because it's deprecated in scala2

The implementation is a bit simpler overall because it no longer needs to allow unicode escapes in triple-quoted strings and raw interpolations, which scala2 had to grandfather in for a deprecation cycle.
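
To illustrate the behavior described above, here is a minimal sketch. It assumes a Scala 3 compiler that includes this change, and the object and value names are made up for the example. A unicode escape in an ordinary string literal is processed like any other escape sequence, while a triple-quoted string is assumed to keep the characters as written.

```scala
object UnicodeEscapeDemo {
  // Ordinary string literal: the escape is processed, giving the single character 'A'.
  val regular: String = "\u0041"

  // Triple-quoted string: assumed (Scala 3 with this change) to keep the
  // backslash, the 'u' and the four hex digits as six separate characters.
  val tripleQuoted: String = """\u0041"""

  def main(args: Array[String]): Unit = {
    println(regular.length)
    println(tripleQuoted.length)
  }
}
```

Under Scala 2.13.2 and later the triple-quoted form is, as far as the description says, still accepted for a deprecation cycle, so the second value may differ there.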

dottybot

Hello, and thank you for opening this PR! 🎉

All contributors have signed the CLA, thank you! ❤️

Have an awesome day! ☀️

martijnhoekstra · 2020-03-09T10:32:54Z

compiler/src/dotty/tools/dotc/transform/localopt/StringInterpolatorOpt.scala

              }
              Some(escapedStrs, elems)
            } catch {
-              case _: StringContext.InvalidEscapeException => None
+              case iee: StringContext.InvalidEscapeException => {
+                ctx.error(iee.getMessage() + "\n", stringPosition) //should be positioned at str.po


Before, this didn't emit an error at compile time, only at runtime.
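
For context, the runtime failure mentioned here can be reproduced directly against the standard library. This is only a sketch, assuming a Scala 2.13 standard library (which Scala 3 also uses); `RuntimeEscapeDemo` and `process` are illustrative names, not compiler code.

```scala
object RuntimeEscapeDemo {
  // Runs the library's escape processing on one string part.
  // InvalidEscapeException extends IllegalArgumentException, so catching the
  // supertype also covers other escape-related failures.
  def process(part: String): Either[String, String] =
    try Right(StringContext.processEscapes(part))
    catch { case e: IllegalArgumentException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    println(process("a\\tb")) // valid escape: backslash followed by 't'
    println(process("a\\qb")) // invalid escape: backslash followed by 'q'
  }
}
```

The point of the change in the diff is that the second case should be reported by the compiler rather than surfacing only when the program runs.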

I could use a hand in positioning this properly.

> I could use a hand in positioning this properly.

Are you looking for tree.sourcePos or something more specific?

I'd like to narrow it to a position of the actual error within the sourcePos.

Also, the transcript looks off (with the position too far "left") here, but I'm not sure anymore whether it just looks wrong or is wrong.

This proved a challenge: the position of the first part of the interpolation was off, and in the test output the second line of the error is indented differently from the first, which gives the impression that the caret is in a different position than it really is. I think I got it, though.

martijnhoekstra · 2020-03-09T10:34:31Z

compiler/src/dotty/tools/dotc/transform/localopt/StringInterpolatorOpt.scala

+                ctx.error(iee.getMessage() + "\n", stringPosition) //should be positioned at str.po
+                None
+              }
+              case iuee: IllegalArgumentException => {


StringContext.InvalidUnicodeEscapeException has access to the positioning stuff, but it's not publicly accessible for bincompat reasons. What's the best course of action?

> What's the best course of action?

Good question, this is tricky. I guess we could hack the Scala2Unpickler to disregard the protected[scala] on this definition but that's pretty ugly.

Other workarounds are also possible: Java reflection, going through Java and extending StringContext$ (or is it final?), parsing the error message (the index is indicated in there), or not refining the position at all.

> parsing the error message

I suggest doing that for now if it's not too much trouble, and we should leave a TODO for whenever we get to break forward compatibility of scala-library.
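
A rough sketch of the message-parsing workaround suggested above. It assumes the exception message contains the text `index <n>`, which is an assumption about the library's message format rather than a documented contract. `EscapeErrorPosition`, `indexFromMessage` and `errorOffset` are hypothetical names, and the offset arithmetic assumes one source character per character of the string part.

```scala
object EscapeErrorPosition {
  // Assumed message shape: "... index <n> ..."; not a documented contract.
  private val IndexPattern = """index (\d+)""".r

  // Extracts the first "index <n>" occurrence from an error message, if any.
  def indexFromMessage(message: String): Option[Int] =
    IndexPattern.findFirstMatchIn(message).flatMap(_.group(1).toIntOption)

  // Offset of the error in the source file: start of the string part plus the
  // index inside it. Falls back to the start of the part when parsing fails.
  // TODO: read the index from the exception itself once that is accessible.
  def errorOffset(partStart: Int, message: String): Int =
    partStart + indexFromMessage(message).getOrElse(0)
}
```

Falling back to the start of the part keeps the error reported even if the message format changes, at the cost of a less precise position.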

som-snytt · 2020-03-09T16:27:36Z

Not sure offhand how this change interacts with https://github.com/lampepfl/dotty/pull/8282/files which includes handling \u000a in char literal.

martijnhoekstra · 2020-03-09T16:40:20Z

From the looks of it, it might not interact, or it might become redundant. I'll double-check if that's correct (and which it is). EDIT: I should look more closely.

martijnhoekstra · 2020-03-09T21:19:35Z

@som-snytt merging that patch into this branch keeps the test passing. If I understand that patch correctly, it primarily forbids \r, \n and FF char literals, and the test covers some other situations where those characters, or their escape sequences, are accepted or not, right?

A strict reading of the scala2 spec allows only printable characters in char and string literals. That rules out tab. I suspect that's not intended. What the intent is exactly I don't know.

I never understood why we shouldn't have form feed char literals.

compiler/src/dotty/tools/dotc/parsing/Scanners.scala

som-snytt · 2020-03-09T21:39:24Z

One purpose of the linked PR is to disallow the control char between single quotes, as shown in the test; not to disallow the escape '\n'; for example,

val c = '
'

but to allow the unicode escape in that position, which historically was translated early to the control char. I guess getLitChar will just do the right thing now.
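
As an illustration of the distinction drawn here, the following sketch assumes a Scala 3 compiler with this change, where a unicode escape inside a char literal is handled like any other escape instead of being translated to a raw line break before parsing. The names are made up for the example.

```scala
object CharLiteralDemo {
  // The ordinary escape for a line feed.
  val viaEscape: Char = '\n'

  // The unicode escape for the same code point (U+000A), written inside the literal.
  val viaUnicode: Char = '\u000a'

  def main(args: Array[String]): Unit =
    println(viaEscape == viaUnicode)
}
```

The literal with an actual line break between the quotes, as in the snippet above, is the form that stays rejected.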

odersky · 2020-03-31T13:42:40Z

CI fails because of missing credentials. @anatoliykmetyuk can you take a look?

project/Build.scala

martijnhoekstra · 2020-04-23T13:01:14Z

rebased with 2.13.2 lib

compiler/src/dotty/tools/dotc/transform/localopt/StringInterpolatorOpt.scala

smarter

LGTM, thanks a lot!

som-snytt · 2020-04-24T02:29:13Z

tests/run/literals.scala

+      } catch {
+        case exception: Throwable => Some(s" raised exception $exception")
+      }
+    for (e <- res) println(s"test $name $e")


I was following up on the unicode escapes on line 11 having been edited out when I noticed that this (edit: partially) reverts my last change.

The test has to assert because there is no longer a check file -- vulpix only checks the output if there is a check file.
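
A minimal sketch of what an asserting run test could look like, given that printed output is only compared when a check file exists. The `check` helper and the test cases are hypothetical and not taken from tests/run/literals.scala.

```scala
import scala.util.control.NonFatal

object Test {
  // Evaluates `actual` and fails the run with an assertion error on mismatch,
  // instead of printing a line that nothing would compare.
  def check(name: String)(actual: => Any, expected: Any): Unit = {
    val result =
      try actual
      catch { case NonFatal(e) => s"raised exception $e" }
    assert(result == expected, s"test $name: expected $expected but got $result")
  }

  def main(args: Array[String]): Unit = {
    check("tab escape")('\t'.toInt, 9)
    check("unicode escape in string")("\u0041", "A")
  }
}
```

With this shape a regression makes `main` throw, so the failure does not depend on comparing printed output.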

I'll see if I can restore those changes. Does vulpix detect a test failing an assertion? I guess I'll jiggle with it a bit and let you know.

As for the identifier on line 11, the variant that is still supported is in tests/run/unicodeEscapes.scala line 13-14, which makes good on the promise of temporarily.

NBD, I PR'd it, thanks. I was about to futz with more parsing. It's amazing how quickly I forgot like how do I even run a test?

Scala 3 changes compared to the existing Scala 2 spec:
- Reusing alphaid in the definition of plainid (this does not change its meaning)
- Addition of quoteId and spliceId
- Correctly specifying the use of _ in numeric literals.
- Dropping symbolLiteral

Scala 2 changes compared to the existing Scala 3 spec:
- Various refactorings
- Specifying the new Unicode escape handling stuff, this was already implemented in Scala 3 but not part of syntax.md (see scala#8480).

dottybot reviewed Mar 9, 2020

martijnhoekstra commented Mar 9, 2020

som-snytt reviewed Mar 9, 2020

OlivierBlanvillain assigned martijnhoekstra Mar 26, 2020

martijnhoekstra force-pushed the uni branch 3 times, most recently from d5cabee to dd41a2f Compare March 30, 2020 16:56

anatoliykmetyuk self-assigned this Mar 31, 2020

anatoliykmetyuk reviewed Apr 1, 2020

anatoliykmetyuk removed their assignment Apr 1, 2020

smarter mentioned this pull request Apr 16, 2020

Upgrade Mill to 213 com-lihaoyi/mill#723

Merged

martijnhoekstra force-pushed the uni branch from dd41a2f to d06e6c2 Compare April 23, 2020 12:59

Unicode escapes are ordinary escape sequences

006b647

martijnhoekstra force-pushed the uni branch from d06e6c2 to 006b647 Compare April 23, 2020 13:00

smarter requested changes Apr 23, 2020

smarter added this to the 0.24.0-RC1 milestone Apr 23, 2020

remove unused variable

c64f004

smarter approved these changes Apr 23, 2020

smarter merged commit d010ef7 into scala:master Apr 23, 2020

som-snytt reviewed Apr 24, 2020

martijnhoekstra mentioned this pull request Mar 7, 2021

Incorrect handle of unicode escapes in triple-quoted string #11640

Closed

smarter mentioned this pull request May 1, 2023

Specification: Various integrations from the reference #17383

Merged
