Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: skip ascii symbol in process #31

Merged
merged 3 commits into from Sep 29, 2023

Conversation

JLHwung
Copy link
Collaborator

@JLHwung JLHwung commented Sep 17, 2019

In this PR we skip the processItem when it is a symbol-kind and has code point within ASCII range.

This fix will prevent - transpiled as \\x2D when it is not in character class. Besides that, it should speed up a little bit since ASCII symbols are pretty common.

@nicolo-ribaudo
Copy link
Collaborator

It would be good to have a test with - inside a character class, e.g. /[--\\]/

@JLHwung
Copy link
Collaborator Author

JLHwung commented Sep 18, 2019

@nicolo-ribaudo Good question! This PR does not touch - in character class by design.

Currently, every items in CharacterClass is added to a single regenerate set so that we can perform possible code range concatenations. However, the regenerate lacks of the token information: i.e. the kind of token where - is a symbol and \- is an identifier. Therefore, regenerate has to output escaped \\x2D even if it may be redundant within the context.

@mathiasbynens
Copy link
Owner

Did you see #16?

@JLHwung
Copy link
Collaborator Author

JLHwung commented Sep 18, 2019

Sorry I didn't see #16 before.

What's happening in #16 is that - is passed to regenerate and regenerate.toString returns \-, which is invalid when unicode flag is enabled, so regenerate now returns \x2D.

However, when - is a symbol-kind and not in a character class, it should not be transpiled at all. Note that \- is an identifier-kind, which means \- will still pass into regenerate and output \x2D.

@mathiasbynens
Copy link
Owner

I'm okay with making - not be escaped outside of character classes. It’s correct that this logic needs to live in regexpu-core and not regenerate, since regenerate’s output must work in any context (in and outside character classes).

But let’s keep the behavior for other ASCII symbols (in particular, non-printable ones) as-is.

@@ -256,6 +256,10 @@ const processTerm = (item, regenerateOptions, groups) => {
break;
case 'value':
const codePoint = item.codePoint;
if (item.kind === "symbol" && codePoint >= 0x20 && codePoint <= 0x7E) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this as codePoint == 0x2D, since many ascii symbols must be escaped: [, {, |, +, ?, *, $, ...

Copy link
Collaborator Author

@JLHwung JLHwung Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaped character, i.e. \[ is identifier kind instead of symbol kind. If regjsparser parses it safely as symbol, we don't have to escape it.

This is how /\{/ will be parsed:

{
  "type": "value",
  "kind": "identifier",
  "codePoint": 123,
  "range": [
    0,
    2
  ],
  "raw": "\\{"
}

@mathiasbynens mathiasbynens changed the base branch from master to main June 18, 2020 06:50
@JLHwung
Copy link
Collaborator Author

JLHwung commented Sep 18, 2023

The CI error is fixed in #81.

@nicolo-ribaudo
Copy link
Collaborator

@JLHwung could you rebase?

@JLHwung JLHwung merged commit 55f3e5e into mathiasbynens:main Sep 29, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants