Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform ES2015 Unicode Escapes to ES5 #11377

Merged
merged 12 commits into from May 24, 2020

Conversation

jridgewell
Copy link
Member

@jridgewell jridgewell commented Apr 4, 2020

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link babel/website#2261
Any Dependency Changes?
License MIT

Inspired by the ASCIIfier conversation at TC39, I decided to find out how Babel handled unicode surrogates. And apparently we just didn't ever write a transform for them. Maybe using non-ascii identifiers is already rare enough.

This isn't 100% full proof, since there are surrogate pairs that are valid identifiers that can't be encoded into ES5. And, Tagged Template Literals record can't be changed, because it would change their raw values during runtime.

@jridgewell
Copy link
Member Author

As a follow up, could someone that's familiar with preset-env hook this up to http://kangax.github.io/compat-table/es6/#test-Unicode_code_point_escapes?

@jridgewell jridgewell force-pushed the unicode-escapes branch 2 times, most recently from 90f7826 to ff37f4f Compare April 4, 2020 09:38
@existentialism
Copy link
Member

@jridgewell

As a follow up, could someone that's familiar with preset-env hook this up to

will do!

@existentialism existentialism added the PR: New Feature 🚀 A type of pull request used for our changelog categories label Apr 4, 2020
@nicolo-ribaudo nicolo-ribaudo added this to the v7.10.0 milestone Apr 5, 2020
@JLHwung JLHwung self-requested a review April 6, 2020 02:26
}

throw path.buildCodeFrameError(
`Can't represent "${name}" as a bare identifier`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen? Since 𝒜\ is not a valid identifier, isn't it invalid also when represented as \u{1d49c}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://www.ecma-international.org/ecma-262/5.1/#sec-7.6, in ES5 the Identifier name aligns to Unicode version 3.0, in which only BMP and PUA Plane are defined, that means 𝒜, introduced later in Unicode version 3.1 and allocated to SMP, was never considered a valid identifier name.

Maybe we can state it clearly that any non-BMP characters are not accepted as identifier name in ES5 and so we cannot transform 𝒜 as an identifier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. 𝒜 is a valid identifier in ES6, and can be represented as both 𝒜 and \u{1d49c} in ES6.

var 𝒜 = 1;
console.log(\u{1d49c}); // => 1

In ES5, 𝒜 is interpreted as two chars, \ud835 and \udc9c (not the single char \u{1d49c}). These individual chars must validate as an identifier. Because individual surrogates don't have Unicode categories, neither is considered a part of ID Start or ID Continue. So it's not possible to take the bare identifier 𝒜 and output \ud835\udc9c, since it'll be invalid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECMAScript implementations may recognise identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.

𝒜 was introduced in Unicode 3.1. So var 𝒜 = 1 may be invalid if the implementation does not align to Unicode >3.0, i.e. Node.js v0.10.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jridgewell Thanks for the explanation, I didn't know that in identifier you must validate the single characters rather than the whole code point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolo-ribaudo lol that is why we have long ugly regex in @babel/helper-validator-identifier. And I have thought of replacing that by \p{ID_Start}\p{ID_Continue}+ but I give up since it will imply different behaviour between Node.js versions.

Copy link
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add this plugin to packages/babel-standalone/src/preset-es2015.js

@existentialism
Copy link
Member

added support in preset-env/standalone (will prolly fail test but verified it passes after rebase)

jridgewell added a commit to jridgewell/compat-table that referenced this pull request May 18, 2020
Babel is [concerned](babel/babel#11377) with a few unique cases:

1. Escapes in strings
2. Escapes in bare identifiers
3. Escapes in the property keys

These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
jridgewell added a commit to jridgewell/compat-table that referenced this pull request May 18, 2020
Babel is [concerned](babel/babel#11377) with a few unique cases:

1. Escapes in strings
2. Escapes in bare identifiers
3. Escapes in the property keys

These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
@codesandbox-ci
Copy link

codesandbox-ci bot commented May 19, 2020

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit ebafdd4:

Sandbox Source
gallant-lewin-9ybzj Configuration
condescending-microservice-2v46h Configuration

@babel-bot
Copy link
Collaborator

babel-bot commented May 19, 2020

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/22367/

chicoxyzzy pushed a commit to compat-table/compat-table that referenced this pull request May 19, 2020
Babel is [concerned](babel/babel#11377) with a few unique cases:

1. Escapes in strings
2. Escapes in bare identifiers
3. Escapes in the property keys

These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
@existentialism
Copy link
Member

@jridgewell i updated the preset-env mappings after your compat table PR landed

@jridgewell
Copy link
Member Author

Thanks!

@nicolo-ribaudo nicolo-ribaudo added PR: Ready to be Merged A pull request with already two approvals, but waiting for the next minor release and removed PR: Needs Docs labels May 19, 2020
@nicolo-ribaudo nicolo-ribaudo merged commit 97f0b7c into babel:master May 24, 2020
@jridgewell jridgewell deleted the unicode-escapes branch May 26, 2020 16:12
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Aug 26, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue PR: New Feature 🚀 A type of pull request used for our changelog categories PR: Ready to be Merged A pull request with already two approvals, but waiting for the next minor release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants