Support full unicode in parser #2404
Merged
bbakerman reviewed on Jul 5, 2021 (four reviews)
bbakerman approved these changes on Jul 9, 2021
andimarek reviewed on Jul 12, 2021
dondonz changed the title from "WIP: Support full unicode in parser" to "Support full unicode in parser" on Jul 14, 2021
andimarek approved these changes on Jul 14, 2021
bbakerman approved these changes on Jul 14, 2021
This PR implements the RFC to support full Unicode in the parser.

Key spec changes
- `SourceCharacter` was expanded to include any Unicode code point that is neither a leading nor trailing surrogate. Previously, only code points up to U+FFFF were included.

Key changes in this PR
This PR has two halves:
- `UnicodeUtil`, used by `StringValueParsing`, which handles braced escapes and escaped surrogate pairs.
- `SourceCharacter`, redefined to mean any Unicode code point that is neither a leading nor trailing surrogate.

References
RFC GitHub issue: graphql/graphql-spec#687
RFC spec text: graphql/graphql-spec#849
RFC JS implementation: graphql/graphql-js#3117
Previous PR: #2335
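
To make the two escape forms named under "Key changes in this PR" concrete, here is a minimal sketch of how braced escapes and escaped surrogate pairs might be decoded. It is not the `UnicodeUtil` code from this PR: the class `UnicodeEscapeSketch` and its methods are hypothetical names chosen for illustration. It handles only Unicode escapes and ignores every other escape sequence, including an escaped backslash.

```java
/** Hypothetical illustration only; not the UnicodeUtil class from this PR. */
final class UnicodeEscapeSketch {
    private UnicodeEscapeSketch() {}

    /** True for any code point that is neither a leading nor trailing surrogate. */
    static boolean isSourceCharacter(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                && (codePoint < Character.MIN_SURROGATE || codePoint > Character.MAX_SURROGATE);
    }

    /** Decodes four-digit, braced, and surrogate-pair escapes; copies all other text unchanged. */
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (!input.startsWith("\\u", i)) {
                out.append(input.charAt(i));
                i++;
                continue;
            }
            i += 2; // skip the backslash and the 'u'
            int codePoint;
            if (i < input.length() && input.charAt(i) == '{') {
                // Braced form: a variable number of hex digits between braces.
                int close = input.indexOf('}', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated braced escape");
                }
                codePoint = parseHex(input, i + 1, close);
                i = close + 1;
            } else {
                // Fixed-width form: exactly four hex digits.
                codePoint = parseHex(input, i, i + 4);
                i += 4;
                if (Character.isHighSurrogate((char) codePoint)) {
                    // A leading surrogate must be followed by an escaped trailing surrogate.
                    if (!input.startsWith("\\u", i)) {
                        throw new IllegalArgumentException("leading surrogate without a trailing surrogate");
                    }
                    int low = parseHex(input, i + 2, i + 6);
                    if (!Character.isLowSurrogate((char) low)) {
                        throw new IllegalArgumentException("leading surrogate without a trailing surrogate");
                    }
                    codePoint = Character.toCodePoint((char) codePoint, (char) low);
                    i += 6;
                }
            }
            if (!isSourceCharacter(codePoint)) {
                throw new IllegalArgumentException("escape does not denote a valid code point");
            }
            out.appendCodePoint(codePoint);
        }
        return out.toString();
    }

    /** Parses ASCII hex digits in [start, end); rejects values above U+10FFFF. */
    private static int parseHex(String s, int start, int end) {
        if (start >= end || end > s.length()) {
            throw new IllegalArgumentException("incomplete escape");
        }
        int value = 0;
        for (int k = start; k < end; k++) {
            char c = s.charAt(k);
            int digit;
            if (c >= '0' && c <= '9') {
                digit = c - '0';
            } else if (c >= 'a' && c <= 'f') {
                digit = c - 'a' + 10;
            } else if (c >= 'A' && c <= 'F') {
                digit = c - 'A' + 10;
            } else {
                throw new IllegalArgumentException("not a hex digit: " + c);
            }
            value = value * 16 + digit;
            if (value > Character.MAX_CODE_POINT) {
                throw new IllegalArgumentException("value exceeds U+10FFFF");
            }
        }
        return value;
    }
}
```

Under these assumptions, `UnicodeEscapeSketch.decode` applied to the text `\u{1F37A}` and to the text `\uD83C\uDF7A` yields the same single code point, U+1F37A, while a lone escaped surrogate such as `\uDC00` is rejected.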
Want a Unicode fun fact? Groovy fails to compile if the source contains a `\u` escape that is not followed by exactly four hex digits. You'll hit this compilation error even in COMMENTS.
For example: this comment containing RFC text will cause a compilation error
The fix is to add an extra backslash
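
The same fix can be sketched in Java, used here in place of Groovy because the Java compiler also translates Unicode escapes before it recognizes comments or string literals. The class and constant names below are hypothetical and do not come from this PR.

```java
/** Hypothetical illustration of the extra-backslash fix, written in Java rather than Groovy. */
final class EscapedBackslashSketch {
    private EscapedBackslashSketch() {}

    // The doubled backslash stops the compiler from reading what follows as a
    // Unicode escape, so the constant holds the raw nine-character GraphQL text:
    // one backslash, the letter u, and the braced hex digits.
    static final String BRACED_ESCAPE_TEXT = "\\u{1F37A}";
}
```

With a single backslash, the braced form would be rejected at compile time, whether it appeared in a string literal or in a comment.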