Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve template tokenizing #13919

Merged

Conversation

JLHwung
Copy link
Contributor

@JLHwung JLHwung commented Nov 2, 2021

Q                       A
Major: Breaking Change? Yes, behind BABEL_8_BREAKING flag
Tests Added + Pass? Yes
Documentation PR Link babel/website#2765
License MIT

This PR includes commits from #13891.

Let's start by an example. Currently Babel tokenizes a string template

`before${x}middle${y}after`

into a token sequences

[ backQuote, template, dollarBraceL, name, braceR, template, dollarBraceL, name, braceR, template, backQuote ]

in order to check whether a backQuote ` starts a template and whether braceR } starts a template span, we have to introduce a context stack for backQuote and braceR. Unlike backQuote, a braceR can pair with more than one tokens: namely braceL {, dollarBraceL ${ and braceHashL #{. Since we lost track of what the braceR is paring with when tokenzing, we have to push new context for braceL and braceHashL even if they have no special semantics in string template. Given that braceL { is a common token in JavaScript, this approach introduced overhead that could have been avoided.

In this PR we merge braceR and dollarBraceL with the template token. Two new template token types are introduced in order to differentiate between template middle ...${ (ends with ${) and template tail ...` (ends with `). So now the example above is tokenized into

[ templateMiddle, name, templateMiddle, name, templateTail ]

The new token layout is terser, requires no context tracking and can still work with the parser. So we can remove updateContext in the base parser and also simplify the JSX element which still depends on updateContext.

Another bonus is that the new token layout is almost identical to the espree one. So we can also remove some works in @babel/eslint-parser: In Babel 8 we don't need to merge multiple tokens into a template token.

Benchmark Results

many-template-elements (15% faster)
baseline 128 template elements: 8_172 ops/sec ±44.79% (0.122ms)
baseline 256 template elements: 5_256 ops/sec ±0.77% (0.19ms)
baseline 512 template elements: 2_660 ops/sec ±0.73% (0.376ms)
baseline 1024 template elements: 1_278 ops/sec ±1.59% (0.783ms)
current 128 template elements: 9_080 ops/sec ±44.47% (0.11ms)
current 256 template elements: 6_025 ops/sec ±0.85% (0.166ms)
current 512 template elements: 3_053 ops/sec ±0.27% (0.328ms)
current 1024 template elements: 1_442 ops/sec ±1.5% (0.693ms)
many-nested-template-elements (25% faster)
baseline 128 nested template elements: 7_857 ops/sec ±41.2% (0.127ms)
baseline 256 nested template elements: 4_737 ops/sec ±2.01% (0.211ms)
baseline 512 nested template elements: 2_247 ops/sec ±1.84% (0.445ms)
baseline 1024 nested template elements: 1_016 ops/sec ±1.87% (0.985ms)
current 128 nested template elements: 10_305 ops/sec ±31.59% (0.097ms)
current 256 nested template elements: 5_999 ops/sec ±1.81% (0.167ms)
current 512 nested template elements: 2_919 ops/sec ±0.98% (0.343ms)
current 1024 nested template elements: 1_331 ops/sec ±2.32% (0.751ms)

@JLHwung JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories babel-8-dev → main labels Nov 2, 2021
@babel-bot
Copy link
Collaborator

babel-bot commented Nov 2, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/50065/

@JLHwung JLHwung marked this pull request as draft November 2, 2021 16:03
@JLHwung JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from defe5fa to 96712cd Compare November 2, 2021 19:35
@codesandbox-ci
Copy link

codesandbox-ci bot commented Nov 2, 2021

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit ce1440f:

Sandbox Source
babel-repl-custom-plugin Configuration
babel-plugin-multi-config Configuration

@JLHwung JLHwung marked this pull request as ready for review November 2, 2021 20:50
const elemStart = start + 1;
const elem = this.startNodeAt(
elemStart,
createPositionFromPosition(this.state.startLoc, 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would be more clear if this function was called something like createPositionAfterOffset or getIncrementedPosition

Copy link
Contributor Author

@JLHwung JLHwung Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe createPositionWithColumnOffset? I should add a note that this function does not consider whitespaces, so it does not work for any offsets for sure, and thus it should be only used when we create AST node inside the token boundaries, which should be rare scenario.

For this reason, I am also tempted to expand TemplateElement to match the actual token range in Babel 8. It can possibly further get rid of

if (node.type === "TemplateLiteral") {
for (let i = 0; i < node.quasis.length; i++) {
const q = node.quasis[i];
q.range[0] -= 1;
if (q.tail) {
q.range[1] += 1;
} else {
q.range[1] += 2;
}
q.loc.start.column -= 1;
if (q.tail) {
q.loc.end.column += 1;
} else {
q.loc.end.column += 2;
}
if (ESLINT_VERSION >= 8) {
q.start -= 1;
if (q.tail) {
q.end += 1;
} else {
q.end += 2;
}
}
}
}
},
};

@JLHwung JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from 2f02b9a to ce1440f Compare November 4, 2021 17:58
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly left some comments about naming (because I found that they made it harder to understand the code), but apart from that this PR looks good!

@@ -158,6 +158,8 @@ export const tt: { [name: string]: TokenType } = {
ellipsis: createToken("...", { beforeExpr }),
backQuote: createToken("`", { startsExpr }),
dollarBraceL: createToken("${", { beforeExpr, startsExpr }),
templateTail: createToken("...`", { startsExpr }),
templateMiddle: createToken("...${", { beforeExpr, startsExpr }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing, I found it quite confusing that templateMiddle can also be at the start of a template (I expected templateStart ŧemplateMiddle+ templateTail). I'd prefer if it was just called templateNonTail.

@@ -707,7 +707,7 @@ export default class ExpressionParser extends LValParser {
): N.Expression {
if (!noCalls && this.eat(tt.doubleColon)) {
return this.parseBind(base, startPos, startLoc, noCalls, state);
} else if (this.match(tt.backQuote)) {
} else if (this.match(tt.templateMiddle) || this.match(tt.templateTail)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since there are different places where we check this.match(tt.templateMiddle) || this.match(tt.templateTail), we could add a tokenIsTemplate function.

@@ -96,6 +96,81 @@ function babel7CompatTokens(tokens) {
continue;
}
}
if (type === tt.templateMiddle || type === tt.templateTail) {
if (!process.env.BABEL_8_BREAKING) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can swap these two ifs, since the outer one would be empty with Babel 8.

@@ -1377,34 +1371,38 @@ export default class Tokenizer extends ParserErrors {

// Reads template string tokens.

readTemplateMiddle(): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can return both a templateMiddle or templateTail? Maybe readTemplateAfterSubsitution, or readTemplateContinuation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly prefer readTemplateContinuation because it is shorter. Will also rename readTmplToken to readTemplateToken for consistency.

@JLHwung JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from ce1440f to 1859669 Compare November 19, 2021 16:53
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linting failure looks related.

@JLHwung
Copy link
Contributor Author

JLHwung commented Dec 1, 2021

Oh the lint error should be related. I almost forgot this PR, I will fix after rebase.

@JLHwung JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from 660c848 to ad60c64 Compare December 1, 2021 02:59
@JLHwung JLHwung merged commit 94af0e5 into babel:main Dec 6, 2021
@JLHwung JLHwung deleted the merge-backquote-dollarbracel-template-token branch December 6, 2021 21:43
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Mar 8, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 8, 2022
@nicolo-ribaudo nicolo-ribaudo added this to the v8.0.0 milestone Aug 8, 2023
@JLHwung JLHwung added PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release and removed babel 8 labels Aug 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants