Transform ES2015 Unicode Escapes to ES5 #11377

jridgewell · 2020-04-04T06:48:51Z

Q	A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?	Yes
Documentation PR Link	babel/website#2261
Any Dependency Changes?
License	MIT

Inspired by the ASCIIfier conversation at TC39, I decided to find out how Babel handles Unicode surrogates. Apparently we never wrote a transform for them. Maybe using non-ASCII identifiers is already rare enough.

This isn't 100% foolproof, since there are surrogate pairs that are valid identifiers but can't be encoded in ES5. Also, tagged template literals can't be changed, because doing so would change their raw values at runtime.
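To illustrate the tagged template caveat, here is a minimal sketch. The `inspect` tag is a hypothetical helper made up for this example and is not part of Babel. It shows that the two spellings produce the same cooked string but different raw strings, so rewriting the escape would be observable by a tag function.

```javascript
// Hypothetical tag function (illustration only, not part of Babel):
// returns the cooked and raw forms of the first chunk of a template literal.
function inspect(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// The same character, written with an ES2015 code point escape
// and with an ES5-compatible surrogate pair.
const es2015Form = inspect`\u{1d49c}`;
const es5Form = inspect`\ud835\udc9c`;

// The cooked values are the same string...
const sameCooked = es2015Form.cooked === es5Form.cooked;
// ...but the raw values keep the source spelling, so they differ.
const sameRaw = es2015Form.raw === es5Form.raw;

console.log(sameCooked, sameRaw);
```

Untagged template literals never expose `raw`, which is why only the tagged form is a problem here.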

jridgewell · 2020-04-04T06:50:28Z

As a follow up, could someone that's familiar with preset-env hook this up to http://kangax.github.io/compat-table/es6/#test-Unicode_code_point_escapes?

existentialism · 2020-04-04T14:20:19Z

@jridgewell

As a follow up, could someone that's familiar with preset-env hook this up to

will do!

nicolo-ribaudo · 2020-04-07T15:36:50Z

packages/babel-plugin-transform-unicode-escapes/src/index.js

+        }
+
+        throw path.buildCodeFrameError(
+          `Can't represent "${name}" as a bare identifier`,


When can this happen? Since 𝒜 is not a valid identifier, isn't it also invalid when represented as \u{1d49c}?

JLHwung · 2020-04-07

Per https://www.ecma-international.org/ecma-262/5.1/#sec-7.6, ES5 identifier names align with Unicode version 3.0, in which only the BMP and the PUA planes are defined. That means 𝒜, which was introduced later in Unicode version 3.1 and allocated to the SMP, was never considered a valid identifier name.

Maybe we can state clearly that non-BMP characters are not accepted in identifier names in ES5, and so we cannot transform 𝒜 as an identifier.

jridgewell · 2020-04-07

I'm not sure I understand. 𝒜 is a valid identifier in ES6, and can be written as either 𝒜 or \u{1d49c} in ES6.

var 𝒜 = 1; console.log(\u{1d49c}); // => 1

In ES5, 𝒜 is interpreted as two chars, \ud835 and \udc9c (not the single char \u{1d49c}). Each of these chars must validate as an identifier character on its own. Because individual surrogates don't have Unicode categories, neither is considered part of ID Start or ID Continue. So it's not possible to take the bare identifier 𝒜 and output \ud835\udc9c, since the result would be invalid.
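The split into two chars can be worked through with a short sketch. `toEs5Escapes` is a hypothetical helper name used only for illustration and is not taken from the plugin. It applies the standard UTF-16 surrogate arithmetic to a code point and prints the result as ES5-style `\uXXXX` escapes.

```javascript
// Hypothetical helper (illustration only): split a code point into its
// UTF-16 code units and format each one as an ES5 \uXXXX escape.
function toEs5Escapes(codePoint) {
  const units = [];
  if (codePoint > 0xffff) {
    // Astral code point: subtract 0x10000, then split the remaining
    // 20 bits into a high and a low surrogate.
    const offset = codePoint - 0x10000;
    units.push(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
  } else {
    // BMP code point: a single code unit.
    units.push(codePoint);
  }
  return units
    .map((unit) => "\\u" + unit.toString(16).padStart(4, "0"))
    .join("");
}

const escaped = toEs5Escapes(0x1d49c);
console.log(escaped); // \ud835\udc9c
```

Inside a string literal this output is a valid replacement for `\u{1d49c}`. As a bare identifier it is not, for the reason given above.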

JLHwung · 2020-04-07

"ECMAScript implementations may recognise identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0."

𝒜 was introduced in Unicode 3.1. So var 𝒜 = 1 may be invalid if the implementation is not aligned with a Unicode version later than 3.0, e.g. Node.js v0.10.

nicolo-ribaudo · 2020-04-07

@jridgewell Thanks for the explanation, I didn't know that in identifiers you must validate the individual characters rather than the whole code point.

JLHwung · 2020-04-07

@nicolo-ribaudo lol, that is why we have the long ugly regex in @babel/helper-validator-identifier. I thought about replacing it with \p{ID_Start}\p{ID_Continue}+ but gave up, since it would behave differently across Node.js versions.
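For reference, a property-escape based check could look roughly like the sketch below. It is an illustration only and is not the code in @babel/helper-validator-identifier. The name `looksLikeIdentifierName` is hypothetical, and the sketch assumes an engine that supports Unicode property escapes (ES2018). The result depends on the Unicode version bundled with the engine, which is the portability concern raised above.

```javascript
// Assumption: the engine supports \p{...} with the "u" flag (ES2018+).
// "$" and "_" may start an identifier; ZWNJ (U+200C) and ZWJ (U+200D)
// are additionally allowed after the first character.
const identifierName = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

// Hypothetical helper: tests the identifier name shape only.
// It does not reject reserved words.
function looksLikeIdentifierName(name) {
  return identifierName.test(name);
}

console.log(looksLikeIdentifierName("\u{1d49c}")); // whole code point
console.log(looksLikeIdentifierName("\ud835")); // lone high surrogate
```

With the `u` flag the regex matches whole code points, so a lone surrogate is tested on its own and has neither property.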

JLHwung · 2020-04-22

We should also add this plugin to packages/babel-standalone/src/preset-es2015.js

existentialism · 2020-05-17T03:37:25Z

added support in preset-env/standalone (will prolly fail test but verified it passes after rebase)

Babel is [concerned](babel/babel#11377) with a few unique cases:

1. Escapes in strings
2. Escapes in bare identifiers
3. Escapes in property keys

Each requires a slightly different transform, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
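As a rough sketch of the string case only, the function below rewrites `\u{...}` escapes in the raw source text of a string literal. It is written for illustration and is not the plugin's implementation; `rewriteUnicodeEscapes` and `es5Escape` are hypothetical names. It assumes the input is the raw text of a syntactically valid string literal, so every code point is at most 0x10FFFF.

```javascript
// Hypothetical helper: format one UTF-16 code unit as an ES5 escape.
function es5Escape(codeUnit) {
  return "\\u" + codeUnit.toString(16).padStart(4, "0");
}

// Hypothetical helper: rewrite \u{...} escapes in raw string source text.
function rewriteUnicodeEscapes(rawSource) {
  return rawSource.replace(
    /(\\+)u\{([0-9a-fA-F]+)\}/g,
    (match, backslashes, hex) => {
      // An even number of backslashes means they escape each other,
      // so "u{...}" is literal text and must be left alone.
      if (backslashes.length % 2 === 0) return match;

      // Decode the code point, then emit one escape per UTF-16 code unit.
      const char = String.fromCodePoint(parseInt(hex, 16));
      let out = backslashes.slice(0, -1); // keep any leading escaped backslashes
      for (let i = 0; i < char.length; i++) {
        out += es5Escape(char.charCodeAt(i));
      }
      return out;
    }
  );
}

console.log(rewriteUnicodeEscapes('"\\u{1d49c}"')); // "\ud835\udc9c"
```

Counting the backslashes is the design point here: a naive search and replace would also rewrite the text `\\u{1d49c}`, which is an escaped backslash followed by plain characters.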

codesandbox-ci · 2020-05-19T03:42:48Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit ebafdd4:

Sandbox	Source
gallant-lewin-9ybzj	Configuration
condescending-microservice-2v46h	Configuration

babel-bot · 2020-05-19T03:42:59Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/22367/

existentialism · 2020-05-19T13:31:48Z

@jridgewell i updated the preset-env mappings after your compat table PR landed

jridgewell · 2020-05-19T18:28:06Z

Thanks!

jridgewell force-pushed the unicode-escapes branch 2 times, most recently from 90f7826 to ff37f4f April 4, 2020 09:38

existentialism added the PR: New Feature 🚀 label Apr 4, 2020

existentialism approved these changes Apr 5, 2020

nicolo-ribaudo added this to the v7.10.0 milestone Apr 5, 2020

JLHwung self-requested a review April 6, 2020 02:26

kaicataldo approved these changes Apr 6, 2020

nicolo-ribaudo reviewed Apr 7, 2020

nicolo-ribaudo reviewed Apr 7, 2020

JLHwung reviewed Apr 22, 2020

nicolo-ribaudo added the PR: Needs Docs label Apr 27, 2020

jridgewell mentioned this pull request May 18, 2020

Split Unicode Escape tests compat-table/compat-table#1627

Merged

jridgewell and others added 11 commits May 18, 2020 23:38

Transform ES2015 Unicode Escapes to ES5

30959a2

Update outputs to better match default printing

22cf031

Undo template-literals change

1bb92dc

Fix description

57242b4

Simplify replaceUnicodeEscapes

cb7dad7

Rename fixture directory

0bc99ab

Cleanup code

3352b14

padStart in node 6

6022db0

Update error messages, and make them more helpful

98cded9

Add support for unicode-escapes in preset-env

d353b5b

Transform local bindings

390e213

jridgewell force-pushed the unicode-escapes branch from 588ba30 to 390e213 May 19, 2020 03:40

update compat-data

ebafdd4

JLHwung approved these changes May 19, 2020

nicolo-ribaudo added the PR: Ready to be Merged label and removed the PR: Needs Docs label May 19, 2020

nicolo-ribaudo merged commit 97f0b7c into babel:master May 24, 2020

jridgewell deleted the unicode-escapes branch May 26, 2020 16:12

nicolo-ribaudo mentioned this pull request Jun 15, 2020

Update transform-unicode-escapes test to output minimal strings #11721

Merged

github-actions bot added the outdated label Aug 26, 2020

github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2020
