Support full unicode in parser #2404

dondonz · 2021-06-28T01:20:22Z

This PR implements the RFC to support full Unicode in the parser.

Key spec changes

GraphQL now supports a wider range of Unicode characters. SourceCharacter was expanded to include any Unicode code point that is neither a leading nor trailing surrogate. Previously, only code points up to U+FFFF were included.
The spec now includes guidance on Unicode surrogate pairs
(minor) GraphQL now allows certain control characters
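
As a rough illustration of the expanded SourceCharacter definition, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from this PR; the check simply follows the wording above (any code point in the Unicode range that is not a leading or trailing surrogate).

```java
/** Hypothetical helper for illustration only; not part of graphql-java. */
final class SourceCharacterSketch {
    private SourceCharacterSketch() {}

    /**
     * Returns true if codePoint lies in the Unicode range (U+0000..U+10FFFF)
     * and is not a leading or trailing surrogate (U+D800..U+DFFF).
     */
    static boolean isSourceCharacter(int codePoint) {
        boolean isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
        return Character.isValidCodePoint(codePoint) && !isSurrogate;
    }
}
```

Under this sketch, U+1F4A9 is accepted even though it lies above U+FFFF, while a lone surrogate such as U+D83D is rejected.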

Key changes in this PR

This PR has two halves:

UnicodeUtil, used by StringValueParsing, to handle braced escapes (for example `\u{1F4A9}`) and escaped surrogate pairs (for example `\uD83D\uDCA9`)
ANTLR grammar changes, which expand the definition of a SourceCharacter to mean any Unicode code point that is neither a leading nor trailing surrogate
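
To make the two escape forms concrete, here is a minimal Java sketch of how a braced escape and an escaped surrogate pair could each be decoded to the same text. This is not the UnicodeUtil code from this PR: the class name, method names, and the use of IllegalArgumentException are assumptions chosen to keep the example self-contained (the PR itself raises InvalidSyntaxException for invalid Unicode).

```java
/** Hypothetical decoder sketch; names and exception type are assumptions. */
final class UnicodeEscapeSketch {
    private UnicodeEscapeSketch() {}

    /** Decodes the hex digits found between the braces, e.g. "1F4A9". */
    static String decodeBraced(String hexDigits) {
        if (hexDigits.isEmpty()) {
            throw new IllegalArgumentException("Empty braced escape");
        }
        int codePoint = 0;
        for (int i = 0; i < hexDigits.length(); i++) {
            char c = hexDigits.charAt(i);
            int digit;
            if (c >= '0' && c <= '9') {
                digit = c - '0';
            } else if (c >= 'a' && c <= 'f') {
                digit = c - 'a' + 10;
            } else if (c >= 'A' && c <= 'F') {
                digit = c - 'A' + 10;
            } else {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            codePoint = codePoint * 16 + digit;
            // Maximum value check: stop as soon as the value passes U+10FFFF.
            if (codePoint > Character.MAX_CODE_POINT) {
                throw new IllegalArgumentException("Value exceeds U+10FFFF");
            }
        }
        // A braced escape must not name a lone surrogate.
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
            throw new IllegalArgumentException("Surrogate in braced escape");
        }
        return new String(Character.toChars(codePoint));
    }

    /** Combines two four-digit escape values into one code point. */
    static String decodeSurrogatePair(char leading, char trailing) {
        if (!Character.isHighSurrogate(leading) || !Character.isLowSurrogate(trailing)) {
            throw new IllegalArgumentException("Invalid surrogate pair");
        }
        int codePoint = Character.toCodePoint(leading, trailing);
        return new String(Character.toChars(codePoint));
    }
}
```

In this sketch, `decodeBraced("1F4A9")` and `decodeSurrogatePair((char) 0xD83D, (char) 0xDCA9)` yield the same string, while a reversed pair or a value above U+10FFFF is rejected.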

References

RFC GitHub issue: graphql/graphql-spec#687
RFC spec text: graphql/graphql-spec#849
RFC JS implementation: graphql/graphql-js#3117
Previous PR: #2335

Want a Unicode fun fact? Groovy fails to compile if the source contains a Unicode escape (`\u`) that is not followed by exactly four hex digits. You'll even encounter this compilation problem in comments.

For example, this comment containing RFC text will cause a compilation error:

For example the input `"\uD83D\uDCA9"` is a valid {StringValue} which represents the same Unicode text as `"\u{1F4A9}"`.

The fix is to add an extra backslash:

... which represents the same Unicode text as `"\\u{1F4A9}"`.
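
The same doubled-backslash technique can be sketched in Java, whose compiler also requires exactly four hex digits after a `\u` in source text. The class and constant names below are hypothetical; the point is only that the doubled backslash keeps the braced form as ordinary characters, whereas the four-digit escapes are translated by the compiler.

```java
/** Hypothetical constants for illustration only. */
final class EscapedBackslashSketch {
    private EscapedBackslashSketch() {}

    // Doubled backslash: the compiler reads an escaped backslash followed by
    // ordinary characters, so no Unicode escape processing takes place.
    static final String BRACED_ESCAPE_TEXT = "\\u{1F4A9}";

    // Two well-formed four-digit escapes: a leading and a trailing surrogate
    // that together encode the single code point U+1F4A9.
    static final String SURROGATE_PAIR = "\uD83D\uDCA9";
}
```

Here `BRACED_ESCAPE_TEXT` holds the literal escape text that would be handed to the GraphQL parser, while `SURROGATE_PAIR` already holds the decoded character.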

src/main/java/graphql/parser/UnicodeUtil.java

src/test/groovy/graphql/parser/UnicodeUtilParserTest.groovy

src/main/antlr/GraphqlCommon.g4

dondonz added 5 commits June 28, 2021 11:14

Add tests to support full Unicode parser

a0785c6

Fix Groovy Unicode parser issue

5594036

Add full Unicode to parser, the happy path

5b12cbe

Add maximum unicode value check

dc349a0

Fix typo

c719779

andimarek added this to the 17.0 milestone Jul 5, 2021

bbakerman reviewed Jul 5, 2021

src/main/java/graphql/parser/UnicodeUtil.java (outdated, resolved)

bbakerman reviewed Jul 5, 2021

src/main/java/graphql/parser/UnicodeUtil.java (outdated, resolved)

bbakerman reviewed Jul 5, 2021

src/test/groovy/graphql/parser/UnicodeUtilParserTest.groovy (outdated, resolved)

bbakerman reviewed Jul 5, 2021

src/test/groovy/graphql/parser/UnicodeUtilParserTest.groovy (outdated, resolved)

dondonz added 5 commits July 8, 2021 14:46

Add tests

ecfffc4

Add more end of string edge cases

7f75c6c

Add surrogate pair validation

06d334f

Merge branch 'master' into unicode-full-range

a907a49

Raise InvalidSyntaxException for invalid Unicode

5e788a4

bbakerman approved these changes Jul 9, 2021

src/main/java/graphql/parser/UnicodeUtil.java (outdated, resolved)

src/main/java/graphql/parser/UnicodeUtil.java (outdated, resolved)

src/test/groovy/graphql/parser/UnicodeUtilParserTest.groovy (outdated, resolved)

andimarek reviewed Jul 12, 2021

src/main/antlr/GraphqlCommon.g4 (outdated, resolved)

andimarek mentioned this pull request Jul 13, 2021

adds unicode braced escaping and tests #2335

Closed

dondonz added 6 commits July 14, 2021 11:54

Update ANTLR grammar with new SourceCharacter definition

dd290ea

Tidy test name and location

2287201

Add source location to Unicode error messages

c93fd5a

Add String parser overload when SourceLocation is not available

66c0f97

Add Parser tests with SourceLocation in exception message

85a5234

Add full invalid Unicode surrogate test

b60d28a

dondonz changed the title from "WIP: Support full unicode in parser" to "Support full unicode in parser" Jul 14, 2021

andimarek approved these changes Jul 14, 2021

andimarek merged commit 357c9bb into graphql-java:master Jul 14, 2021

bbakerman approved these changes Jul 14, 2021

This was referenced Aug 3, 2021

Update dependency com.graphql-java:graphql-java to v17 graphql-java-kickstart/graphql-java-tools#560

Merged

chore(deps): update dependency com.graphql-java:graphql-java to v17 graphql-java-kickstart/graphql-java-servlet#368

Merged

This was referenced Aug 4, 2021

chore(deps): update graphql java (ignoring snapshot builds) (major) graphql-java-kickstart/graphql-spring-boot#679

Merged

Update dependency com.graphql-java:graphql-java to v17 MarquezProject/marquez#1566

Merged
