Avoid using non-ASCII Unicode characters outside of comments and literals #3092

codefish1 · 2022-04-08T17:18:38Z

In error-prone 2.11.0 I've started getting the following error when building within IntelliJ:

Foo.java:17:2
java: [UnicodeInCode] Avoid using non-ASCII Unicode characters outside of comments and literals, as they can be confusing.
    (see https://errorprone.info/bugpattern/UnicodeInCode)

When I view the file in Vim or a hex dump, I can't see any non-ASCII characters.

Line 17 is the end of the file. I can't supply the whole file due to work constraints, but below is a screenshot of the end of the file from hexedit.

Within IntelliJ the formatter is doing

If I downgrade error-prone to 2.10.0, it works fine on the offending file.

cushon · 2022-04-08T17:29:30Z

I think I've seen this a couple of times and hadn't got to the bottom of it yet.

To make it easier to debug, maybe we should improve the diagnostic to mention which non-ASCII characters it thinks it's seeing.

codefish1 · 2022-04-08T18:58:10Z

While playing with the existing test to add an assertion on the error, I noticed it already outputs the offending line along with a ^ pointing at the character in error. But I don't get that in these cases.

tbroyer · 2022-04-08T19:49:46Z

AFAICT, because 99.9% of Java code is plain ASCII, the check is rather "dumb" and doesn't try to only flag problematic chars.

codefish1 · 2022-04-08T21:28:53Z

I think it's a bug which appears when running in IntelliJ.

Using a file which fails in IntelliJ (2021.3.2 (Ultimate Edition)), the following test using the command line from the installation docs works. In addition, a mvn compile on the command line works.

javac \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core-2.11.0-with-dependencies.jar:dataflow-errorprone-3.15.0.jar \
  '-Xplugin:ErrorProne -XepDisableWarningsInGeneratedCode -XepExcludedPaths:.*/target/generated-sources/.*' \
  filename.java

I've also copied the failing file to one side and run a diff to confirm it's the same as the failing one. I played about with the file a few times (adding and removing the last line) until it worked, then ran a diff again. The diff shows no difference between the files.

cushon · 2022-04-13T20:39:04Z

I wonder if IntelliJ is adding a unicode character to the buffer for some reason.

I'm going to update the diagnostic message to print the character it's seeing, which might help debug this.
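As a rough illustration of the kind of diagnostic described above, here is a minimal sketch that names a character by its code point, so that invisible characters still show up in a message. The names `CharDiagnostic` and `describe` are hypothetical and are not taken from Error Prone's source.

```java
public class CharDiagnostic {
  // Hypothetical helper: renders a character as U+XXXX so that characters
  // an editor does not display are still visible in a diagnostic.
  static String describe(char c) {
    return String.format("U+%04X", (int) c);
  }

  public static void main(String[] args) {
    // A zero-width space is invisible in most editors.
    System.out.println("Unexpected character: " + describe('\u200B'));
    // A control character is equally hard to spot by eye.
    System.out.println("Unexpected character: " + describe((char) 0x07));
  }
}
```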

elefeint · 2022-05-20T15:11:37Z

FYI, there is an issue filed on the IntelliJ side, too -- https://youtrack.jetbrains.com/issue/IDEA-288257

chashnikov · 2022-08-10T16:42:53Z

I've found the cause: Javac modifies the content of the file passed to it as char[] (see UnicodeReader.java:103) by replacing the last character with 0x1a. If this array is cached (the original implementation in Javac also caches it, but the code in IntelliJ does this in a different way to improve performance), Error Prone may get this modified content and report an error. Note that this code in Javac was rewritten as part of JDK-8224225, so the problem shouldn't appear in Java 16 and newer versions.
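The caching interaction described above can be imitated with a small sketch. It does not call Javac: `simulateReader` is a hypothetical stand-in for a reader that overwrites the last element of the array it is handed, and the "cache" here is just a shared `char[]`.

```java
import java.nio.CharBuffer;

public class SharedBufferDemo {
  static final char ASCII_SUB = 0x1a;

  // Hypothetical stand-in for a reader that marks end-of-input by overwriting
  // the last element of the array backing the buffer it receives.
  static void simulateReader(CharBuffer content) {
    char[] backing = content.array(); // same array as the cache, not a copy
    backing[backing.length - 1] = ASCII_SUB;
  }

  public static void main(String[] args) {
    char[] cached = "class Foo {}\n".toCharArray();

    // The first consumer gets a buffer that wraps the cached array directly.
    simulateReader(CharBuffer.wrap(cached));

    // A later consumer reading from the cache now sees the sentinel.
    char last = cached[cached.length - 1];
    System.out.printf("last char: U+%04X%n", (int) last);
  }
}
```

Handing out a defensive copy, for example `CharBuffer.wrap(cached.clone())`, would keep the cached array intact in this sketch, at the cost of an extra allocation per read.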

chashnikov · 2022-08-10T16:48:04Z

I'm not sure how we can fix this on the IntelliJ side. We implement javax.tools.FileObject#getCharContent and cache the content of the returned CharSequence; it's really unexpected that code in Javac casts the returned value to CharBuffer and modifies its content. Maybe this can be fixed in Error Prone? I think ignoring the 0x1a character if it's the last character in the file text is a good workaround; I doubt that any real problems will be masked by such a change.
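A minimal sketch of that workaround follows. It assumes the check accepts printable ASCII plus tab, newline, and carriage return, which is an assumption for illustration rather than Error Prone's confirmed rule. `TrailingSubFilter` and `findViolations` are hypothetical names, and unlike the real check this sketch scans the whole text without skipping comments and literals.

```java
import java.util.ArrayList;
import java.util.List;

public class TrailingSubFilter {
  static final char ASCII_SUB = 0x1a;

  // Assumed acceptance rule: printable ASCII plus common whitespace.
  static boolean isAcceptable(char c) {
    return (c >= 0x20 && c <= 0x7e) || c == '\n' || c == '\r' || c == '\t';
  }

  // Returns the offsets of characters that would be flagged, skipping a
  // SUB character only when it is the very last character of the source.
  static List<Integer> findViolations(CharSequence source) {
    List<Integer> violations = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
      char c = source.charAt(i);
      boolean trailingSub = c == ASCII_SUB && i == source.length() - 1;
      if (!isAcceptable(c) && !trailingSub) {
        violations.add(i);
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    // Trailing SUB is ignored.
    System.out.println(findViolations("class Foo {}" + ASCII_SUB));
    // SUB in the middle of the text is still flagged.
    System.out.println(findViolations("class F" + ASCII_SUB + "oo {}"));
  }
}
```

Restricting the exemption to the final position keeps the check strict everywhere else, so a stray control character in the middle of a file is still reported.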

lwhite1 · 2022-09-30T14:57:27Z

@chashnikov FWIW, I still have this issue with Java 18 (Zulu) in IntelliJ.

lwhite1 · 2022-10-13T18:26:08Z

Since this has been merged but is still open, can someone update this with the version where the fix will appear?

cushon · 2022-10-13T18:31:15Z

This should have been included in the recent 2.16.0 release.

kenfreeman · 2022-11-08T17:19:16Z

FYI, I still see this on occasion in 2.16. Seems to be less common.

copybara-service bot pushed a commit that referenced this issue Apr 15, 2022

Include the unicode character in the diagnostic message

c17cee3

To help debug #3092 PiperOrigin-RevId: 441567288

copybara-service bot mentioned this issue Apr 15, 2022

Include the unicode character in the diagnostic message #3119

Merged

copybara-service bot pushed a commit that referenced this issue Apr 15, 2022

Include the unicode character in the diagnostic message

726d179

To help debug #3092 PiperOrigin-RevId: 442071506

copybara-service bot pushed a commit that referenced this issue Aug 15, 2022

In UnicodeInCode, allow ASCII_SUB as the last character in the buffer

d5acbf3

#3092 PiperOrigin-RevId: 467537488

copybara-service bot mentioned this issue Aug 15, 2022

In UnicodeInCode, allow ASCII_SUB as the last character in the buffer #3392

Merged

copybara-service bot pushed a commit that referenced this issue Aug 15, 2022

In UnicodeInCode, allow ASCII_SUB as the last character in the buffer

c3d267e

#3092 PiperOrigin-RevId: 467666418

danhermann mentioned this issue Oct 24, 2022

Upgrade errorprone to 2.16 slackhq/astra#386

Merged
