New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode escapes are now ordinary escape sequences (not processed early) #8282
Conversation
220fc45
to
8b06af7
Compare
Regarding The setting was passed to Removing that parameter outright is both source and binary breaking though, and probably not worth the deprecation cycle (though I'm happy to accommodate whatever if greater minds think unlike) |
Linking scala/bug#3220 which this closes. @SethTisue could you kick off a community build for this branch? |
5a76a6f
to
0bbdab7
Compare
testing 2.13.1-bin-19737c1-SNAPSHOT: https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/2432/ |
|
f5c2800
to
dbb8483
Compare
@SethTisue could you start another community build? The parser combinators failures are hopefully fixed now (and the other one was spurious) |
I'm seeing a lot of community build stuff going on, but I can't figure out where, if anywhere, this branch is running now. Could you take a look and squeeze one in if you're not too swamped (or the servers) with the build itself? |
oops, I shouldn't have canceled 2439, I got confused as well. (Jenkins doesn't let you attach a description to a run until the run actually starts) queued: https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/2449/ |
consider that a green run — the handful of failures are unrelated to your changes |
No longer WIP. Not sure whom to request for review, there is a bit of spec, a bit of parser, a bit of interpolation. |
711a638
to
cf783ec
Compare
It should. It's a good test case to add.
…On Sat, Aug 17, 2019, 12:45 volth ***@***.***> wrote:
So, s"\u0022$a\u0022" is now valid ?
After deprecation s"\042$a\042" in 2.13 it is great news.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#8282?email_source=notifications&email_token=AAGXOEPCDVXOV7RA2BMGSOTQE7JEHA5CNFSM4IH6G5OKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4QIXHA#issuecomment-522226588>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAGXOEKWWQ52XQFBRBFPTM3QE7JEHANCNFSM4IH6G5OA>
.
|
ee96400
to
a83a517
Compare
/rebuild |
a83a517
to
36a5aee
Compare
rebased again for mima exceptions. |
What's the state of this in Dotty? I'd almost call this an uncontroversial bug fix that could probably be done without a SIP but we should at least have Dotty on board with this change. Rescheduling for 2.13.2 for now. |
I've put this back on the 2.13.2 milestone, because it seems nearly (entirely?) ready for merge, and based on Lukas's feedback so far. (it will need squashing before final merge, of course) |
I support merging this for 2.13.2. To summarize the breaking changes:
There's currently no migration tool (?) - would it be hard to write a scalafix rule? Can we include a one-liner in the release notes that fixes arbitrary code bases? |
There is a third and somewhat insidious (source, but not binary) breaking change: If you have a custom interpolator, unicode is no longer escaped in the interpolator. I don't know about a scalafix. Looking at https://astexplorer.net/#/gist/ec56167ffafb20cbd8d68f24a37043a9/a34662ddd426356d811d29103e557e608c0e9f14 it is plausible it's not so easy to detect with scalameta. |
as I said at the team meeting today, I'm comfortable with this change for 2.13.2 despite the string interpolator thing, because:
|
🎉 epochal, scala/bug#3220 was submitted on Mar 27, 2010. |
Just catching up, thank god sbt is 2.12 only 😄 https://github.com/sbt/sbt/blob/0b12862caf63d1c23449391be6329d947b474157/main-settings/src/main/scala/sbt/std/InputWrapper.scala#L26-L31 |
The lines you link wouldn't be a problem (they're in strings), but the methods themselves should then be literal or backquoted in this PR -- either
or
That wouldn't be that terrible a migration now, right? |
BTW, you could also already open a PR against dotty that uses a nightly build of scala-library, so we can review it in time for 2.13.2. |
Strictly speaking, isn't Scala epoch-major-point? |
Good question! If you mean the standard library and the ABI and nothing else, then yes. But considering the releases as wholes, no. We often add features in minor releases. So that already makes them "minor" and not "point". Admittedly, it is rarer that we break backwards source compatibility, as we are doing here. It's not something we should be cavalier about, for sure, but I've already listed some mitigating factors, and I'll add one more: there is no longer any 2.14 release to defer these sorts of changes to, so I think that argues for opening the door a little wider on allowable 2.13.x changes. This series will be with us for a long time. (If this is turning into a broader discussion about the 2.13.x release philosophy, we should probably move to contributors.scala-lang.org. But if there are further comments that still relate directly to this change in particular, that's fine here.) |
@martijnhoekstra can I ask you to revise the PR description so it contains all of the need-to-know information for users in one place? in particular, the info on breaking changes that came out in later comments |
Sure! |
The regular expressions for `CodeBlockStartRegex` and `CodeBlockEndRegex` both contain two instances of the Unicode escape `\u000E`. This is the "Shift In" character. I expect that it was inserted as part of a copy/paste error. Unicode escapes in triple quote strings are deprecated as of 2.13.2 (scala/scala#8282). Further, this character actually makes the regular expression invalid if it is interpreted. This isn't a big deal right now, as it appears to be ignored on Scala 2.12.x, but on Scala 2.13.x this will cause the regular expressions to fail for Scaladoc using the `<pre>` tag. For example, ```scala import scala.util.matching._ object Main { val doc0: String = """ | /** A foo is a bar, for example. | * | * {{{ | * val foo: String = "bar" | * }}} | * | * <pre> | * val bar: String = "baz | * </pre> | */""".stripMargin val CodeBlockStartRegex = new Regex("""(.*?)((?:\{\{\{)|(?:\u000E<pre(?: [^>]*)?>\u000E))(.*)""") val CodeBlockStartRegex0 = new Regex("""(.*?)((?:\{\{\{)|(?:<pre(?: [^>]*)?>))(.*)""") def matchInfo(regex: Regex, value: CharSequence): Unit = { println(s"\nTarget: ${value}") println(s"Regex: ${regex}") val matches: List[Regex.Match] = regex.findAllMatchIn(value).toList println(s"Match Count: ${matches.size}") println(s"Matches: ${matches}") } def main(args: Array[String]): Unit = { matchInfo(CodeBlockStartRegex, doc0) matchInfo(CodeBlockStartRegex0, doc0) } } ``` When run with 2.13.4 yields this result, ```shell warning: 1 deprecation (since 2.13.2); re-run with -deprecation for details 1 warning Picked up JAVA_TOOL_OPTIONS: -Dsbt.supershell=false Target: /** A foo is a bar, for example. * * {{{ * val foo: String = "bar" * }}} * * <pre> * val bar: String = "baz * </pre> */ Regex: (.*?)((?:\{\{\{)|(?:�<pre(?: [^>]*)?>�))(.*) Match Count: 1 Matches: List( * {{{) Target: /** A foo is a bar, for example. * * {{{ * val foo: String = "bar" * }}} * * <pre> * val bar: String = "baz * </pre> */ Regex: (.*?)((?:\{\{\{)|(?:<pre(?: [^>]*)?>))(.*) Match Count: 2 Matches: List( * {{{, * <pre>) ``` Note how the first output only found one match, the `{{{` based one, but the second one found both. Finally, a small test was added to ensure that the change does not break comment parsing.
The regular expressions for `CodeBlockStartRegex` and `CodeBlockEndRegex` both contain two instances of the Unicode escape `\u000E`. This is the "Shift In" character. I expect that it was inserted as part of a copy/paste error. Unicode escapes in triple quote strings are deprecated as of 2.13.2 (scala/scala#8282). Further, this character actually makes the regular expression invalid if it is interpreted. This isn't a big deal right now, as it appears to be ignored on Scala 2.12.x, but on Scala 2.13.x this will cause the regular expressions to fail for Scaladoc using the `<pre>` tag. For example, ```scala import scala.util.matching._ object Main { val doc0: String = """ | /** A foo is a bar, for example. | * | * {{{ | * val foo: String = "bar" | * }}} | * | * <pre> | * val bar: String = "baz | * </pre> | */""".stripMargin val CodeBlockStartRegex = new Regex("""(.*?)((?:\{\{\{)|(?:\u000E<pre(?: [^>]*)?>\u000E))(.*)""") val CodeBlockStartRegex0 = new Regex("""(.*?)((?:\{\{\{)|(?:<pre(?: [^>]*)?>))(.*)""") def matchInfo(regex: Regex, value: CharSequence): Unit = { println(s"\nTarget: ${value}") println(s"Regex: ${regex}") val matches: List[Regex.Match] = regex.findAllMatchIn(value).toList println(s"Match Count: ${matches.size}") println(s"Matches: ${matches}") } def main(args: Array[String]): Unit = { matchInfo(CodeBlockStartRegex, doc0) matchInfo(CodeBlockStartRegex0, doc0) } } ``` When run with 2.13.4 yields this result, ```shell warning: 1 deprecation (since 2.13.2); re-run with -deprecation for details 1 warning Picked up JAVA_TOOL_OPTIONS: -Dsbt.supershell=false Target: /** A foo is a bar, for example. * * {{{ * val foo: String = "bar" * }}} * * <pre> * val bar: String = "baz * </pre> */ Regex: (.*?)((?:\{\{\{)|(?:�<pre(?: [^>]*)?>�))(.*) Match Count: 1 Matches: List( * {{{) Target: /** A foo is a bar, for example. * * {{{ * val foo: String = "bar" * }}} * * <pre> * val bar: String = "baz * </pre> */ Regex: (.*?)((?:\{\{\{)|(?:<pre(?: [^>]*)?>))(.*) Match Count: 2 Matches: List( * {{{, * <pre>) ``` Note how the first output only found one match, the `{{{` based one, but the second one found both. Finally, a small test was added to ensure that the change does not break comment parsing.
Unicode escapes are handled like any other escape, in any context a normal escape could also work.
That means Unicode escapes get expanded in
Examples:
You can now use a Unicode escape sequence to represent a single quote in a single quoted s or f interpolation, in other words, s"\u0022foo\u0022" is now a valid way to represent
"\"foo\""
Triple quoted strings and raw interpolations don't allow normal escapes, but have allowed unicode escapes, since those were handled earlier, in the scanner. In this PR doing so emits a deprecation warning, unless under -xSource:2.14, where the Unicode escapes are simply unprocessed
Examples:
Using Unicode escapes anywhere else will no longer cause them to be processed. In all those cases the alternative is to use the literal character represented by the escape sequence.
This includes interpolations. If your custom interpolation needs to process unicode escapes, it should do so from within the interpolator rather than relying on the parser to process these escapes.
example
Examples that no longer compile (with no deprecation):
Fixes scala/bug#3220
(Curious why it ever worked the old way? Because Java did it, as explained at https://www.sitepoint.com/java-unicode-mysterious-compile-error/)