unicode escape sequence for backslash symbol (`\u005c') is treated differently since 2.13.2 #12293

Closed
unkarjedy opened this issue Dec 22, 2020 · 8 comments

Comments

@unkarjedy

reproduction steps

Since 2.13.2, the Unicode escape sequence for a backslash (\u005c) is treated as an escaped backslash (\\) at the source level.
In 2.12.12 and 2.11.12 it is treated as a single \ at the source level (except in raw interpolated string literals and non-interpolated multiline string literals).

I couldn't find any related ticket, PR or discussion.
Is it as designed or a regression?

"\u005c\u005c"
"""\u005c\u005c"""
s"\u005c\u005c"
s"""\u005c\u005c"""
raw"\u005c\u005c"
raw"""\u005c\u005c"""
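To make the difference concrete for the first literal above, here is a minimal sketch, assuming Scala 2.13.2 or later, where each \u005c in an ordinary (non-raw, single-line) string literal is an escape that yields one backslash character. The object and value names are hypothetical, chosen only for illustration. On 2.12.x the same source is expected not to compile, because \u005c is turned into a bare backslash before the literal is scanned.

```scala
// Sketch only: assumes Scala 2.13.2+ semantics for Unicode escapes.
object BackslashEscapeDemo {
  // One Unicode escape yields exactly one backslash character.
  val single: String = "\u005c"
  // Two Unicode escapes yield two backslash characters.
  val pair: String = "\u005c\u005c"

  def main(args: Array[String]): Unit = {
    println(single.length)   // number of characters in `single`
    println(pair.length)     // number of characters in `pair`
    println(pair == "\\\\")  // compares against two escaped backslashes
  }
}
```

Only the plain literal is covered here; the interpolated, raw, and triple-quoted forms follow different rules and are left out of the sketch.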


@unkarjedy
Author

@martijnhoekstra could the change in #11966 somehow have led to this?

@martijnhoekstra

martijnhoekstra commented Dec 22, 2020

This is scala/scala#8282, listed in the 2.13.2 release notes at https://github.com/scala/scala/releases/tag/v2.13.2 as "Unicode escapes are now ordinary escape sequences (not processed early)" (8282), and it is indeed intended.

Instead of "\u005c\u005c", use "\\". Instead of, for example, "\u005ct", use "\\t". That was, at least by me, seen as an improvement. Are there any other issues I'm not thinking of that you're running into?
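A minimal sketch of what the suggested replacements evaluate to. It uses only ordinary backslash escapes, so it should read the same before and after 2.13.2. The object and value names are hypothetical.

```scala
// Sketch only: the names below are illustrative, not from the compiler or the PR.
object BackslashReplacementDemo {
  // U+005C built from its code point, so no escape is involved at all.
  val backslash: Char = 92.toChar

  // The suggested replacement for the old "\u005c\u005c" spelling.
  val one: String = "\\"
  // The suggested replacement for the old "\u005ct" spelling.
  val backslashT: String = "\\t"

  def main(args: Array[String]): Unit = {
    // A single backslash character.
    println(one.length == 1 && one.charAt(0) == backslash)
    // A backslash followed by the letter t ...
    println(backslashT.length == 2 && backslashT.charAt(1) == 't')
    // ... which is not the same string as a tab character.
    println(backslashT != "\t")
  }
}
```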

@unkarjedy
Author

unkarjedy commented Dec 22, 2020

Oh, sorry, that was so dumb of me...
I checked the 2.13.4 notes, but after I discovered that the issue appeared in 2.13.2, I didn't recheck its release notes.

That was, at least by me, seen as an improvement

I also think this is an improvement. Less "WTF" cases. (though technically it was a regression)

Are there any other issues I'm not thinking of that you're running into?

No, I am fixing various edge cases with string literals in the IntelliJ Scala Plugin (in different subsystems), and I noticed this difference.
I wanted to confirm which behaviour is expected. Sorry for bothering.

@martijnhoekstra

This will definitely cut down on your string literal edge cases.

@unkarjedy
Author

@martijnhoekstra
Not sure whether to create a new issue.
Is it expected that the output for the input:

scala.util.Properties.versionString

"""\"""
"""\\"""
"""\\\"""
"""\u0025"""
"""\\u0025"""
"""\\\u0025"""
"""\\\\u0025"""

will be
[screenshot of the actual output]

I understand why it works this way, but it looks like a bug to me.
I would expect the output to be:

val res1: String = \
val res2: String = \\
val res3: String = \\\
val res4: String = %
val res5: String = \%
val res6: String = \\%
val res7: String = \\\%

WDYT?

@martijnhoekstra

martijnhoekstra commented Jan 14, 2021

That would totally make sense, but it isn't done, in order to maintain backwards compatibility. The PR mentions it:

That last condition is really weird since the backslash itself isn't escaped or anything, but it's a carryover of how the scanner used to determine whether to process the escape or not.

There is also a comment in the source where it happens:

If a backslash is followed by one or more u characters and there is
an odd number of backslashes immediately preceding the u, processing
the escape is attempted and an invalid escape is an error.
The odd backslashes rule is, well, odd, but is grandfathered in from
pre-2.13.2 times, when this same rule existed in the scanner, and was also
odd. Since escape handling here is for backwards compatibility only, that
backwards compatibility is also retained.

There is code out in the wild that breaks on the change.
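The rule quoted from the source comment can be modelled with a small standalone function. This is a hedged sketch of the rule as described, not the compiler's actual implementation: `OddBackslashRule` and `process` are hypothetical names, and, unlike the real rule, a malformed escape is left untouched here instead of being reported as an error. The run of backslashes is counted including the one that starts the escape.

```scala
// Sketch only: a model of the "odd backslashes" rule, not scalac's own code.
object OddBackslashRule {
  private val Backslash = '\\'

  /** Replaces a Unicode escape (backslash, one or more u, four hex digits)
    * only when the run of backslashes ending right before the u is odd. */
  def process(raw: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < raw.length) {
      if (raw.charAt(i) == Backslash) {
        // Measure the run of consecutive backslashes starting at i.
        var end = i
        while (end < raw.length && raw.charAt(end) == Backslash) end += 1
        val run = end - i
        // Skip one or more 'u' characters following the run.
        var h = end
        while (h < raw.length && raw.charAt(h) == 'u') h += 1
        val hasU = h > end
        val hex = if (hasU && h + 4 <= raw.length) raw.substring(h, h + 4) else ""
        val isHex = hex.length == 4 && hex.forall(c => Character.digit(c, 16) >= 0)
        if (hasU && run % 2 == 1 && isHex) {
          // Odd run: the last backslash starts the escape, the rest are kept.
          out.append(Backslash.toString * (run - 1))
          out.append(Integer.parseInt(hex, 16).toChar)
          i = h + 4
        } else {
          // Even run, no u, or malformed digits: copy the backslashes as they are.
          out.append(Backslash.toString * run)
          i = end
        }
      } else {
        out.append(raw.charAt(i))
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    val bs = "\\" // a single backslash, built without any Unicode escape
    // One to four backslashes before u0025, plus a multiple-u form.
    val inputs = List(bs, bs * 2, bs * 3, bs * 4).map(_ + "u0025") :+ (bs + "uuu0025")
    inputs.foreach(in => println(in + " -> " + process(in)))
  }
}
```

Under this model, one or three leading backslashes cause the escape to be replaced, while two or four leave the text unchanged.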

@som-snytt

The odd rule is inherited from Java (JLS 3.3), as is allowing multiple u characters. That has to do with round-tripping to ASCII.

@unkarjedy
Author

OK, I guess we will just live with it until everyone migrates to Scala 3, where Unicode escapes in raw literals are removed.

No questions regarding \uuu0025, because AFAIR it has existed for ages in Scala and was inherited from Java.
