Avoid using non-ASCII Unicode characters outside of comments and literals #3092

Open
codefish1 opened this issue Apr 8, 2022 · 12 comments

@codefish1

codefish1 commented Apr 8, 2022

In Error Prone 2.11.0 I've started getting the following error when building within IntelliJ:

Foo.java:17:2
java: [UnicodeInCode] Avoid using non-ASCII Unicode characters outside of comments and literals, as they can be confusing.
    (see https://errorprone.info/bugpattern/UnicodeInCode)

When I view the file in Vim or a hex dump, I can't see any non-ASCII characters.

Line 17 is the end of the file. I can't supply the whole file due to work constraints, but below is a screenshot of the end of the file from hexedit:
[image]

Within IntelliJ, the formatter is doing this:
[image]

If I downgrade Error Prone to 2.10.0, it works fine on the offending file.

@cushon
Collaborator

cushon commented Apr 8, 2022

I think I've seen this a couple of times and hadn't got to the bottom of it yet.

To make it easier to debug, maybe we should improve the diagnostic to mention which non-ASCII characters it thinks it's seeing.

@codefish1
Author

While playing with the existing test to add an assertion on the error, I noticed that it already outputs the offending line along with a ^ pointing at the offending character. But I don't get that output in these cases:
[image: Screenshot from 2022-04-08 19-56-01]

@tbroyer
Contributor

tbroyer commented Apr 8, 2022

AFAICT, because 99.9% of Java code is plain ASCII, the check is rather "dumb" and doesn't try to only flag problematic chars.

@codefish1
Author

I think it's a bug which appears only when running in IntelliJ.

Taking a file which fails in IntelliJ (2021.3.2 (Ultimate Edition)) and compiling it with the command line from the installation docs, shown below, works. In addition, a mvn compile on the command line works.

javac \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
  -J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED \
  -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core-2.11.0-with-dependencies.jar:dataflow-errorprone-3.15.0.jar \
  '-Xplugin:ErrorProne -XepDisableWarningsInGeneratedCode -XepExcludedPaths:.*/target/generated-sources/.*' \
  filename.java

I've also copied the failing file to one side and run a diff to confirm the copy is identical to the failing one. I then played about with the file a few times (adding and removing the last line) until it compiled, and ran the diff again. The diff shows no difference between the files.

@cushon
Collaborator

cushon commented Apr 13, 2022

I wonder if IntelliJ is adding a Unicode character to the buffer for some reason.

I'm going to update the diagnostic message to print the character it's seeing, which might help debug this.
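A minimal sketch of what such a diagnostic could look like. It assumes the check treats anything outside printable ASCII plus tab, LF, and CR as a violation; that rule, the class name `UnicodeDiagnosticSketch`, and the method names are assumptions for illustration, not Error Prone's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: builds a diagnostic that names the offending characters. */
public class UnicodeDiagnosticSketch {

  /** Assumption: printable ASCII plus tab, LF, and CR count as acceptable. */
  static boolean isAcceptable(char c) {
    return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
  }

  /** Returns a message listing each offending char as U+XXXX with its offset, or "" if none. */
  static String describe(CharSequence source) {
    List<String> hits = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
      char c = source.charAt(i);
      if (!isAcceptable(c)) {
        hits.add(String.format(Locale.ROOT, "U+%04X at offset %d", (int) c, i));
      }
    }
    return hits.isEmpty() ? "" : "Unexpected characters: " + String.join(", ", hits);
  }

  public static void main(String[] args) {
    // Simulates a buffer whose trailing character is 0x1A, which no editor would display.
    String source = "class Foo {}" + (char) 0x1A;
    System.out.println(describe(source));
    // prints "Unexpected characters: U+001A at offset 12"
  }
}
```

Printing the code point rather than the raw character matters here, because control characters are invisible in most consoles and IDE build logs.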

copybara-service bot pushed a commit that referenced this issue Apr 15, 2022
To help debug #3092

PiperOrigin-RevId: 441567288

copybara-service bot pushed a commit that referenced this issue Apr 15, 2022
To help debug #3092

PiperOrigin-RevId: 442071506

@elefeint

FYI, there is an issue filed on the IntelliJ side, too -- https://youtrack.jetbrains.com/issue/IDEA-288257

@chashnikov

I've found the cause: javac modifies the content of the file passed to it as a char[] (see UnicodeReader.java:103), replacing the last character with 0x1a. If this array is cached (the original javac implementation also caches it, but the code in IntelliJ does so differently to improve performance), Error Prone may get this modified content and report an error. Note that this code in javac was rewritten as part of JDK-8224225, so the problem shouldn't appear in Java 16 and newer versions.
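The aliasing described here can be modeled in a few lines. This is a simplified sketch, not javac's or IntelliJ's actual code: `CachingFileObject` and `sentinelWritingReader` are hypothetical names, and the reader only imitates the described behavior of overwriting the last character with 0x1A.

```java
import java.nio.CharBuffer;

/** Simplified model of the aliasing problem; all names here are hypothetical. */
public class SharedBufferSketch {

  /** Stands in for a file object that caches the CharBuffer it hands out. */
  static final class CachingFileObject {
    private final CharBuffer cached;

    CachingFileObject(String content) {
      // CharBuffer.wrap(char[]) is backed by the array, so writes to the array show through.
      this.cached = CharBuffer.wrap(content.toCharArray());
    }

    CharSequence getCharContent() {
      return cached;
    }
  }

  /** Imitates a reader that reuses the caller's array and overwrites its last slot. */
  static void sentinelWritingReader(CharSequence content) {
    if (content instanceof CharBuffer && ((CharBuffer) content).hasArray()) {
      char[] backing = ((CharBuffer) content).array();
      if (backing.length > 0) {
        backing[backing.length - 1] = (char) 0x1A;
      }
    }
  }

  /** Returns the last char a second consumer sees after the first one has read the file. */
  static char lastCharSeenByLaterReader(String content) {
    CachingFileObject file = new CachingFileObject(content);
    sentinelWritingReader(file.getCharContent()); // first consumer mutates the shared array
    CharSequence seenLater = file.getCharContent(); // second consumer reads the cache
    return seenLater.charAt(seenLater.length() - 1);
  }

  public static void main(String[] args) {
    char last = lastCharSeenByLaterReader("class Foo {}\n");
    System.out.printf("last char: U+%04X%n", (int) last);
    // prints "last char: U+001A"
  }
}
```

In this model, returning `cached.asReadOnlyBuffer()` from `getCharContent` would keep the reader away from the array, since `hasArray()` is false for read-only buffers. Whether that trade-off is acceptable in a real file manager is a separate question.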

@chashnikov

I'm not sure how we can fix this on the IntelliJ side. We implement javax.tools.FileObject#getCharContent and cache the content of the returned CharSequence; it's really unexpected that code in javac casts the returned value to CharBuffer and modifies its content. Maybe this can be fixed in Error Prone? I think ignoring the 0x1a character when it's the last character in the file text is a good workaround; I doubt that any real problems would be masked by such a change.
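The suggested workaround could be sketched roughly as follows. This is an illustration under assumptions, not Error Prone's actual implementation: the class `TrailingSentinelSketch`, its methods, and the "printable ASCII plus tab, LF, CR" acceptance rule are all hypothetical.

```java
/** Hypothetical sketch of the suggested workaround: tolerate 0x1A only at the very end. */
public class TrailingSentinelSketch {

  static final char SENTINEL = (char) 0x1A;

  /** Assumption: printable ASCII plus tab, LF, and CR count as acceptable. */
  static boolean isAcceptable(char c) {
    return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
  }

  /** Returns the offset of the first violation, or -1 if there is none. */
  static int firstViolation(CharSequence source) {
    int last = source.length() - 1;
    for (int i = 0; i <= last; i++) {
      char c = source.charAt(i);
      if (c == SENTINEL && i == last) {
        continue; // a 0x1A in the final position is treated as a reader artifact
      }
      if (!isAcceptable(c)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(firstViolation("class Foo {}" + SENTINEL)); // prints -1
    System.out.println(firstViolation("class" + SENTINEL + " Foo {}")); // prints 5
  }
}
```

A 0x1A anywhere other than the final position is still reported, so the exemption stays as narrow as the comment proposes.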

@lwhite1

lwhite1 commented Sep 30, 2022

@chashnikov FWIW, I still have this issue in Java 18 (Zulu) in IntelliJ.

@lwhite1

lwhite1 commented Oct 13, 2022

Since this has been merged but is still open, can someone update this with the version where the fix will appear?

@cushon
Collaborator

cushon commented Oct 13, 2022

This should have been included in the recent 2.16.0 release.

@kenfreeman

FYI, I still see this on occasion in 2.16. Seems to be less common.
