Unicode code-point escape identifiers #92

Open
getify opened this issue Jun 15, 2015 · 4 comments

@getify
Contributor

getify commented Jun 15, 2015

var \u{20BB7} = 42;

seems in most ways equivalent to:

var 𠮷 = 42;

IIUC, the tree (at least as I see it with acorn) will take the former of these two and represent it as if it'd originally been the latter, even in the raw representation. Is that correct?

Unfortunately, it is possible to have an engine that supports the latter and not the former (I have it installed right now: Chrome 43). And therein lies my problem. I am trying to parse an ES6 file to see if it uses a unicode code-point escape form (the former) for the identifier, because that requires a different test than the symbol form itself (the latter).

Am I understanding this correctly? Is there no way via the estree format to tell the difference or to determine if the former was used? Even a flag on the Identifier node to indicate it was originally in the escaped form would be helpful. Is that possible?

On a similar note, if a tool wanted to parse a program and then recreate exactly as-written without changing this identifier, how could you go back to the former from the latter represented in the tree?

@nzakas
Contributor

nzakas commented Jun 15, 2015

AFAIK, you are correct. Keep in mind that an identifier can have more than one Unicode code point escaped character, so the only possible flag would be to say, "somewhere in this identifier, there was at least one extended escape sequence," which also isn't enough information to get back to the raw representation.

> On a similar note, if a tool wanted to parse a program and then recreate exactly as-written without changing this identifier, how could you go back to the former from the latter represented in the tree?

I don't think that is a goal of ESTree. Rather, you can return a representation of the AST as code, but not necessarily the representation from which the AST was generated. Since you could use either the actual character or the escape sequence, it would be up to your serializer to evaluate the identifier and determine how it should best be represented in the output.
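As a rough illustration of what such a serializer decision could look like, here is a minimal sketch. The helper name `escapeIdentifierName` and the policy of escaping every non-ASCII code point are assumptions made for the example, not anything ESTree or a particular code generator prescribes.

```javascript
// Hypothetical serializer helper (not part of ESTree or any real generator):
// rewrites every non-ASCII code point in an identifier name as an ES6
// code-point escape of the form \u{...}. ASCII characters pass through.
function escapeIdentifierName(name) {
  let out = "";
  // for...of iterates by code point, so astral characters stay in one piece.
  for (const ch of name) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? "\\u{" + cp.toString(16).toUpperCase() + "}" : ch;
  }
  return out;
}

console.log(escapeIdentifierName("\u{20BB7}")); // \u{20BB7}
console.log(escapeIdentifierName("C_DEAD"));    // C_DEAD
```

A serializer could just as well emit the literal character; the point is only that the choice is made at output time, since the tree does not record which form the author wrote.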

@gibson042

Isn't this a specialized subset of #41, to be addressed by a CST plan? After all, var C_DEAD = 0xBEEF and var C_\u0044\u0045\u0041\u0044 = 48879 yield identical ASTs.
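To make the "identical ASTs" point concrete, here is a small sketch that decodes the escapes in a raw identifier the way a parser effectively does before storing `name`. The function `decodeIdentifierEscapes` is a hypothetical helper written for this example; it does no validation of whether the result is a legal identifier.

```javascript
// Hypothetical helper: turns \uXXXX and \u{...} escapes in raw identifier
// text into the characters they denote. No identifier validation is done.
function decodeIdentifierEscapes(raw) {
  return raw.replace(
    /\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, fixed) => String.fromCodePoint(parseInt(braced || fixed, 16))
  );
}

// The raw spellings differ, but the decoded names and values are the same.
console.log(decodeIdentifierEscapes("C_\\u0044\\u0045\\u0041\\u0044")); // C_DEAD
console.log(0xBEEF === 48879);                                          // true
```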

@getify
Contributor Author

getify commented Jun 15, 2015

@gibson042

I suppose it is. I was just trying to understand why I can get this out of acorn from '\u{20BB7}':

{
  "start": 0,
  "value": "𠮷",
  "raw": "'\\u{20BB7}'",
  "type": "Literal",
  "end": 11
}

But from \u{20BB7}, I get:

{
  "start": 0,
  "name": "𠮷",
  "type": "Identifier",
  "end": 9
}

Seems like a strange/inconsistent limitation. If CST is my only option here, just adds more weight to why I really want to figure that out.
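In the meantime, one possible workaround is to lean on the `start`/`end` offsets that already appear in the output above: slice the original source and compare it with `name`. This is only a sketch; `identifierUsesEscape` is a made-up helper, and it assumes the original source string is still available alongside the tree.

```javascript
// Hypothetical helper: reports whether an Identifier node was spelled
// differently in the source than its cooked `name`, which for identifiers
// means some escape sequence was used. Assumes `start`/`end` offsets exist.
function identifierUsesEscape(source, node) {
  if (node.type !== "Identifier") return false;
  const rawText = source.slice(node.start, node.end);
  return rawText !== node.name;
}

// Node shape copied from the acorn output above.
const source = "\\u{20BB7}"; // the nine characters \u{20BB7}
const node = { start: 0, name: "\u{20BB7}", type: "Identifier", end: 9 };

console.log(identifierUsesEscape(source, node));  // true
console.log(source.slice(node.start, node.end));  // \u{20BB7}
```

The same slice also gives back the as-written text, so a tool that keeps the source around can recover the raw form without a new flag on the node.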

@gibson042

> I can get this out of acorn from '\u{20BB7}': …
> But from \u{20BB7}, I get: …

I would characterize "raw" as a Literal-only sneak preview of the benefits from going beyond abstract syntax.

> If CST is my only option here, just adds more weight to why I really want to figure that out.

Indeed.
