Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce exprAllowed usage #13431

Merged
merged 18 commits into from Jun 9, 2021
Merged

Conversation

JLHwung
Copy link
Contributor

@JLHwung JLHwung commented Jun 7, 2021

Q                       A
Tests Added + Pass? Yes
License MIT

This PR reduces the tokenizer state exprAllowed usage. It is now used only in the JSX plugin and we could consider move exprAllowed updates to the plugin.

The exprAllowed was used for 1) checking regex start, 2) allowing *=> in the Flow plugin and 3) checking JSX expression start. It turns out we can come up with an alternative approach for the 1) and 2) use case, which greatly simplifies the tokenizer context update logic which involves many checks.

In this PR, the tokenizer context now only takes care of whether } matches to ${ or {. The exprAllowed are now only maintained in the JSX plugin, and used for disambiguate 1) relational expression < and JSX Tag start <jsx>; 2) object literal ({name}) and JSX text.

I will mark this PR ready for review when refactoring exprAllowed for the third usage is done.

@JLHwung JLHwung added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: parser labels Jun 7, 2021
@babel-bot
Copy link
Collaborator

babel-bot commented Jun 7, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/46718/

@codesandbox-ci
Copy link

codesandbox-ci bot commented Jun 7, 2021

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit f2434cb:

Sandbox Source
babel-repl-custom-plugin Configuration
babel-plugin-multi-config Configuration

Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how this simplifies the code.

@@ -140,6 +140,7 @@ export const types: { [name: string]: TokenType } = {

eq: new TokenType("=", { beforeExpr, isAssign }),
assign: new TokenType("_=", { beforeExpr, isAssign }),
slashAssign: new TokenType("_=", { beforeExpr, isAssign }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
slashAssign: new TokenType("_=", { beforeExpr, isAssign }),
slashAssign: new TokenType("/=", { beforeExpr, isAssign }),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token type label is visible when tokens: true is enabled so it may breaks if people are depending on token.label since /= was parsed as tt.assign whose label is /=. We can modify the label in Babel 8.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wrap it in a new class and use the 'new' keyword. Did you run a benchmark test on that?

Copy link
Contributor Author

@JLHwung JLHwung Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token is initiated once and reused elsewhere. I am aware that using a binary packed number is more optimal because it eliminates the memory loading instructions compiled from tt.slashAssign. Also note that the NewExpression here is a one time cost, we reuse the token object defined here in finishToken.

However this PR is refactoring, it only tries to not degrade the performance so reviewers can be focused on the code architecture changes. We will land performance improvement in a separate PR.

case tt.bracketR:
case tt.braceBarR:
case tt.colon:
case tt.comma:
Copy link
Contributor Author

@JLHwung JLHwung Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original approach checks tt.semi and then excludes tokens with type.startsExpr. I think the new approach easier to reason about since it is an allowlist. This approach is taken from V8:

https://source.chromium.org/chromium/chromium/src/+/main:v8/src/parsing/parser-base.h;l=2979;drc=d17745f350d4956a07cb1113ee19e9cbc4be699f;bpv=1;bpt=1

However, V8 has tt._in here, which I think it is redundant because it seems to me that in never follow an AssignmentExpression. The spec has

[+In] RelationalExpression[?In, ?Yield, ?Await] in ShiftExpression
for ( LeftHandSideExpression in Expression ) Statement
for ( var ForBinding in Expression ) Statement
for ( ForDeclaration in Expression ) Statement

None of the productions before in can parsed down to an argument-less YieldExpression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's because of this?

function* fn() {
  a ? yield : 2
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I meant for tt._in. 🤦

this.state.exprAllowed = false;
};

tt.jsxTagEnd.updateContext = function (prevType) {
Copy link
Contributor Author

@JLHwung JLHwung Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is merged with updateContext because it manipulates this.state.exprAllowed based on different contexts.

@@ -0,0 +1,2 @@
function *f1() { yield / 1 /g }
function *f2() { yield /=2 /i }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't already have it, can you add this test? (with normal functions)

function f1() { yield / 1 /g }
function f2() { yield /=2 /i }

@JLHwung JLHwung marked this pull request as ready for review June 8, 2021 14:48
// regular expression).
// The `beforeExpr` property is used to disambiguate between 1) binary
// expression (<) and JSX Tag start (<name>); 2) object literal and JSX
// texts. It is set on the `updateContext` function in the JSX plugin.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably further simplify the exprAllowed logic in updateContext of JSX plugin, based on the usage here. However I would like to land it separately before this PR is too big for review.

node.argument = this.parseMaybeAssign();
let delegating = false;
let argument = null;
if (!this.hasPrecedingLineBreak()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about why

function* fn() {
  yield
  * []
}

is disallowed, it doesn't seem to be ambiguous 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I have no idea why it is disallowed, maybe @bakkot can shred some light here?

Copy link
Contributor

@bakkot bakkot Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That decision was before my time, so I can only speculate. I agree it would still be unambiguous without the NLTH restriction. My best guess is that it's future-proofing, to reserve the possibility of introducing * as a prefix operator.

} else if (type === tt.jsxTagEnd) {
const out = context.pop();
if ((out === tc.j_oTag && prevType === tt.slash) || out === tc.j_cTag) {
context.pop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we popping out here? j_expr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If out is j_oTag (<name></name>), context.pop is j_expr

If out is j_cTag (<name />), context.pop is the context before j_cTag, which is also the one before j_expr since we reinterpret j_expr, j_oTag as j_cTag when we see slash following jsxTagStart.

The logic here is was tt.jsxTagEnd.updateContext, I move it here so we can simplify the updateContext interface.

@KFlash
Copy link

KFlash commented Jun 9, 2021

How much memory does it consume to parse this? If I try to parse this on an Intel 386 sx / dx 16 mhz cpu, I will have no problems?

Copy link
Member

@existentialism existentialism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@nicolo-ribaudo nicolo-ribaudo merged commit b9c1884 into babel:main Jun 9, 2021
@nicolo-ribaudo nicolo-ribaudo deleted the reduce-exprAllowed-usage branch June 9, 2021 14:36
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Sep 9, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants