New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transform ES2015 Unicode Escapes to ES5 #11377
Conversation
As a follow up, could someone that's familiar with preset-env hook this up to http://kangax.github.io/compat-table/es6/#test-Unicode_code_point_escapes? |
90f7826
to
ff37f4f
Compare
will do! |
} | ||
|
||
throw path.buildCodeFrameError( | ||
`Can't represent "${name}" as a bare identifier`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When can this happen? Since 𝒜\
is not a valid identifier, isn't it invalid also when represented as \u{1d49c}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://www.ecma-international.org/ecma-262/5.1/#sec-7.6, in ES5 the Identifier name aligns to Unicode version 3.0, in which only BMP and PUA Plane are defined, that means 𝒜, introduced later in Unicode version 3.1 and allocated to SMP, was never considered a valid identifier name.
Maybe we can state it clearly that any non-BMP characters are not accepted as identifier name in ES5 and so we cannot transform 𝒜 as an identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. 𝒜
is a valid identifier in ES6, and can be represented as both 𝒜
and \u{1d49c}
in ES6.
var 𝒜 = 1;
console.log(\u{1d49c}); // => 1
In ES5, 𝒜
is interpreted as two chars, \ud835
and \udc9c
(not the single char \u{1d49c}
). These individual chars must validate as an identifier. Because individual surrogates don't have Unicode categories, neither is considered a part of ID Start
or ID Continue
. So it's not possible to take the bare identifier 𝒜
and output \ud835\udc9c
, since it'll be invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECMAScript implementations may recognise identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.
𝒜
was introduced in Unicode 3.1. So var 𝒜 = 1
may be invalid if the implementation does not align to Unicode >3.0, i.e. Node.js v0.10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jridgewell Thanks for the explanation, I didn't know that in identifier you must validate the single characters rather than the whole code point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nicolo-ribaudo lol that is why we have long ugly regex in @babel/helper-validator-identifier
. And I have thought of replacing that by \p{ID_Start}\p{ID_Continue}+
but I give up since it will imply different behaviour between Node.js versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add this plugin to packages/babel-standalone/src/preset-es2015.js
added support in preset-env/standalone (will prolly fail test but verified it passes after rebase) |
Babel is [concerned](babel/babel#11377) with a few unique cases: 1. Escapes in strings 2. Escapes in bare identifiers 3. Escapes in the property keys These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
Babel is [concerned](babel/babel#11377) with a few unique cases: 1. Escapes in strings 2. Escapes in bare identifiers 3. Escapes in the property keys These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
This isn't 100% full proof, since there are surrogate pairs that are valid identifiers that can't be encoded into ES5. And, Tagged Template Literals record can't be changed, because it would change their `raw` values during runtime.
588ba30
to
390e213
Compare
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. Latest deployment of this branch, based on commit ebafdd4:
|
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/22367/ |
Babel is [concerned](babel/babel#11377) with a few unique cases: 1. Escapes in strings 2. Escapes in bare identifiers 3. Escapes in the property keys These require a slightly different transform for each, and only strings and property keys are actually transformable. If we encounter an escape in a bare identifier, we currently error out.
@jridgewell i updated the preset-env mappings after your compat table PR landed |
Thanks! |
Inspired by the ASCIIfier conversation at TC39, I decided to find out how Babel handled unicode surrogates. And apparently we just didn't ever write a transform for them. Maybe using non-ascii identifiers is already rare enough.
This isn't 100% full proof, since there are surrogate pairs that are valid identifiers that can't be encoded into ES5. And, Tagged Template Literals record can't be changed, because it would change their
raw
values during runtime.