New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scalac should remove comments before translating unicode escapes #3220
Comments
Imported From: https://issues.scala-lang.org/browse/SI-3220?orig=1 |
@ingoem said: |
@soc said: sorry for reopening, but after re-reading the spec, I'm quite sure that the procedure suggested is well within the bounds of the specification. This transformed version is equally acceptable to a compiler for the Java programming language ("Java compiler") and represents the exact same program. The second half of [JLS 2.2] reinforces my impression that it is not intended that parts of the comments should survive their removal before the program source enters the next phase, where only syntactic grammar is allowed: These input elements, with white space (�3.6) and comments (�3.7) discarded, form the terminal symbols for the syntactic grammar for the Java programming language and are called tokens (�3.5). To wrap up my arguments, I think that is a good idea to remove comments before entering the next phase of compilation because:
An additional problem with the current situation is that Scala will not even be syntax-compatible between the JVM- and the .NET-platform. If, in your opinion, the current behavior is the only valid one according to the JLS, it won't be possible to both fulfill the JLS and the ECMA standards. Having to choose between which spec to slightly 'adjust', I would suggest choosing the one which is more sensible (the ECMA one imho, which discards the comments). Eventually I want to point out that this is not a VM spec which we closely have to follow, but the spec of the Java Language, so we should have a bit more freedom to improve upon it in the Scala language. I would be really happy if you would kindly reconsider your decision based on the additional arguments and thoughts I tried to provide on that matter. Thanks! |
@soc said: But the fundamental problem persists: As Scala wants to support multiple platforms (JVM, .NET) it might be necessary to abstract a tiny bit from the specific platform behavior. It would be wise to choose a model which makes sure that a JVM-compatible source file is a .NET-compatible source file, too. In my opinion it is more consistent and understandable to remove the comments before converting the the Unicode escapes to raw Unicode. Personal opinion: I think that Java's behavior is an oversight, I can't imagine anyone saying "let's to it this way" on purpose. :-) |
@dubochet said: |
@som-snytt said: scala> "\u000A" // \u000A is LF Somehow, Scala progressed over Java. |
this has been raised again at https://ashannon.us/articles/scala-comment-exec.html I think if we went around on this again in 2018, the outcome might be different than the 2010 outcome. personally, I'd support a deprecate-then-remove on this. |
I didn't read the long parts, but
|
What's the process here @SethTisue? SIP? It needs a spec change and I guess a deprecation cycle. |
I think it should just be done away with, with no deprecation, and only a migration tool. As if comments weren't enough of a problem, there are a variety of other surprising behaviors and undesirable possibilities. Problems include: First, the feature enables difficult-to-understand variants of common code patterns: val sq = (x: Int) \u21D2 x * x
val i\u006ec = (x: Int) => x + 1
va\u0072 x = 7 // Am I mutable? None of these add any value at all. If they are there, it's probably a mistake. Secondly, we have the comment issue, as previously addressed: // Maybe you should try \u000a println("Ha ha, you already tried it!") Thirdly, string literals have rather non-intuitive behavior (meaning that they're not really literal): scala> """\n"""
res11: String = \n
scala> """\u000A"""
res12: String =
"
"scala> """\\u000A"""
res13: String = \\u000A
scala> """\\\u000A"""
res14: String =
"\\
" Note that this behavior means you can't actually write the text of a unicode escape into a raw string in any straightforward way, which is especially weird since you'd expect it to be easiest there. Fourth, unlike decades ago, UTF-8 is ubiquitous. You can just insert the character you care about in the code if you need it, greatly assisting readability. // Which is better?
val jérôme = 7
val j\u00e9r\u00f4me = 7 Fifth, there is no reason why late interpolation can't insert as many non-ASCII unicode characters as your heart desires into strings, characters, and backticked identifier names, and any existing instances should be able to be automatically translated. So I don't see any reason to keep it as it is for any cycles at all. It facilitates unclear code, presents difficulties for catching malicious code, creates irregularities in raw string processing, is useless in its full generality, and can be automatically converted anywhere UTF-8 or UTF-16 are allowed (which is approximately everywhere--and a compiler flag could be used to rescue people who are irreversibly attached to Windows-1252). |
When something is a lot of work to implement, then involving the SIP committee first can save work, if something is just going to end up being rejected. But waiting for the committee also costs time. Anyway, this is really minor stuff around the edges of the language. This one probably isn't much work to implement, so I'd suggest that someone simply submit a pull request (and yes, best if it includes a spec change). If the pull request becomes contentious, a SIP might eventually be asked for. Often that doesn't happen and the PR can just be merged. Even if a SIP does end up being asked for, it's still good to have the pull request in hand, because every SIP ultimately needs an implementation, and anyway having a green pull request to refer to makes any discussion so much more grounded and concrete. |
Calling attention to the issue on https://contributors.scala-lang.org can help rally support. Or unexpected opposition, but if that's coming, you might as well find out :-) |
fyi @adamdecaf |
Thanks for bumping the bug related to this. I couldn't quite find one. As @Ichoran pointed out unicode escapes often cause more problems than they fix and as we've seen can lead to problematic code. I'll help where I can to get this pushed through. Just let me know! |
|
@martijnhoekstra I think this can go forward if you're still interested (if you don't plan to tackle it, also let us know, since perhaps someone else would step up) |
I ran in to this and added some tests to document the status quo scala/scala#6501 I didn't know about this ticket. I guess the files should be renamed to |
Oh dear, I missed the ping on this one. I'll get back on this |
Status update: Needs to be ported to dotty too, to prevent progression regressing again when dotty becomes scala 3. |
It's I'll also look for any old tests named |
There's isn't a broader initiative to update Dotty with Scala 2.13 tests? I know they just updated the collections in Dotty with what's in 2.13.0. Hopefully, the tests will get refreshed as well at some point. |
I agree. This is what the people want. |
@ashawley it's the fix that needs to be ported to dotty, not the test. Well, also the tests, I suppose. |
smarter recently made a plea for ongoing syntax tweaks in Scala 2 to get fixes in dotty. We might consider an umbrella ticket in dotty to track these differences? |
Tracking them is nice, but someone still needs to implement. A situation where migrating to scala 3 means regressing in recent changes in scala 2 makes for a terrible migration story. Being able to point at an umbrella ticket doesn't change that. Dotty being the ruggedly handsome swashbuckling research compiler making bold forays into uncharted territory may need a lot of convincing that these kinds of boring syntax changes are worthy of it's time. |
Did we make scala> def \u0061 = "foo"
a: String illegal in a patch release? Until 2.13.2, this was the second paragraph of the Scala Language Spec:
|
@eed3si9n yes, scala/scala#8282 landed in 2.13.2 |
Consider this code:
This code does not compile, giving the following error method:
The problem is that the compiler transforms unicode escapes before removing the comments, leading to the following code given to the next compile phase:
This feature/bug is described in the Java book "Java Puzzlers" (see the sampler, it is Puzzle 2) as "thoroughly confusing". Corresponding Java spec is [JLS 3.4].
I would like to suggest removing lines of comments before converting escapes, because how it works now is highly surprising and unexpected.
The text was updated successfully, but these errors were encountered: