Improve template tokenizing #13919

JLHwung · 2021-11-02T15:48:09Z

Q	A
Major: Breaking Change?	Yes, behind `BABEL_8_BREAKING` flag
Tests Added + Pass?	Yes
Documentation PR Link	babel/website#2765
License	MIT

This PR includes commits from #13891.

Let's start by an example. Currently Babel tokenizes a string template

`before${x}middle${y}after`

into a token sequences

[ backQuote, template, dollarBraceL, name, braceR, template, dollarBraceL, name, braceR, template, backQuote ]

in order to check whether a backQuote ` starts a template and whether braceR } starts a template span, we have to introduce a context stack for backQuote and braceR. Unlike backQuote, a braceR can pair with more than one tokens: namely braceL {, dollarBraceL ${ and braceHashL #{. Since we lost track of what the braceR is paring with when tokenzing, we have to push new context for braceL and braceHashL even if they have no special semantics in string template. Given that braceL { is a common token in JavaScript, this approach introduced overhead that could have been avoided.

In this PR we merge braceR and dollarBraceL with the template token. Two new template token types are introduced in order to differentiate between template middle ...${ (ends with ${) and template tail ...` (ends with `). So now the example above is tokenized into

[ templateMiddle, name, templateMiddle, name, templateTail ]

The new token layout is terser, requires no context tracking and can still work with the parser. So we can remove updateContext in the base parser and also simplify the JSX element which still depends on updateContext.

Another bonus is that the new token layout is almost identical to the espree one. So we can also remove some works in @babel/eslint-parser: In Babel 8 we don't need to merge multiple tokens into a template token.

Benchmark Results

many-template-elements (15% faster)

baseline 128 template elements: 8_172 ops/sec ±44.79% (0.122ms)
baseline 256 template elements: 5_256 ops/sec ±0.77% (0.19ms)
baseline 512 template elements: 2_660 ops/sec ±0.73% (0.376ms)
baseline 1024 template elements: 1_278 ops/sec ±1.59% (0.783ms)
current 128 template elements: 9_080 ops/sec ±44.47% (0.11ms)
current 256 template elements: 6_025 ops/sec ±0.85% (0.166ms)
current 512 template elements: 3_053 ops/sec ±0.27% (0.328ms)
current 1024 template elements: 1_442 ops/sec ±1.5% (0.693ms)

many-nested-template-elements (25% faster)

baseline 128 nested template elements: 7_857 ops/sec ±41.2% (0.127ms)
baseline 256 nested template elements: 4_737 ops/sec ±2.01% (0.211ms)
baseline 512 nested template elements: 2_247 ops/sec ±1.84% (0.445ms)
baseline 1024 nested template elements: 1_016 ops/sec ±1.87% (0.985ms)
current 128 nested template elements: 10_305 ops/sec ±31.59% (0.097ms)
current 256 nested template elements: 5_999 ops/sec ±1.81% (0.167ms)
current 512 nested template elements: 2_919 ops/sec ±0.98% (0.343ms)
current 1024 nested template elements: 1_331 ops/sec ±2.32% (0.751ms)

babel-bot · 2021-11-02T15:50:39Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/50065/

codesandbox-ci · 2021-11-02T19:38:24Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit ce1440f:

Sandbox	Source
babel-repl-custom-plugin	Configuration
babel-plugin-multi-config	Configuration

nicolo-ribaudo · 2021-11-04T00:05:17Z

packages/babel-parser/src/parser/expression.js

+    const elemStart = start + 1;
+    const elem = this.startNodeAt(
+      elemStart,
+      createPositionFromPosition(this.state.startLoc, 1),


Nit: it would be more clear if this function was called something like createPositionAfterOffset or getIncrementedPosition

Maybe createPositionWithColumnOffset? I should add a note that this function does not consider whitespaces, so it does not work for any offsets for sure, and thus it should be only used when we create AST node inside the token boundaries, which should be rare scenario.

For this reason, I am also tempted to expand TemplateElement to match the actual token range in Babel 8. It can possibly further get rid of

babel/eslint/babel-eslint-parser/src/convert/convertAST.cjs

Lines 79 to 106 in 531db5d

if (node.type === "TemplateLiteral") {

for (let i = 0; i < node.quasis.length; i++) {

const q = node.quasis[i];

q.range[0] -= 1;

if (q.tail) {

q.range[1] += 1;

} else {

q.range[1] += 2;

}

q.loc.start.column -= 1;

if (q.tail) {

q.loc.end.column += 1;

} else {

q.loc.end.column += 2;

}

if (ESLINT_VERSION >= 8) {

q.start -= 1;

if (q.tail) {

q.end += 1;

} else {

q.end += 2;

}

}

}

}

},

};

nicolo-ribaudo

I mostly left some comments about naming (because I found that they made it harder to understand the code), but apart from that this PR looks good!

nicolo-ribaudo · 2021-11-19T00:17:14Z

packages/babel-parser/src/tokenizer/types.js

@@ -158,6 +158,8 @@ export const tt: { [name: string]: TokenType } = {
  ellipsis: createToken("...", { beforeExpr }),
  backQuote: createToken("`", { startsExpr }),
  dollarBraceL: createToken("${", { beforeExpr, startsExpr }),
+  templateTail: createToken("...`", { startsExpr }),
+  templateMiddle: createToken("...${", { beforeExpr, startsExpr }),


While reviewing, I found it quite confusing that templateMiddle can also be at the start of a template (I expected templateStart ŧemplateMiddle+ templateTail). I'd prefer if it was just called templateNonTail.

nicolo-ribaudo · 2021-11-19T00:18:55Z

packages/babel-parser/src/parser/expression.js

@@ -707,7 +707,7 @@ export default class ExpressionParser extends LValParser {
  ): N.Expression {
    if (!noCalls && this.eat(tt.doubleColon)) {
      return this.parseBind(base, startPos, startLoc, noCalls, state);
-    } else if (this.match(tt.backQuote)) {
+    } else if (this.match(tt.templateMiddle) || this.match(tt.templateTail)) {


Nit: since there are different places where we check this.match(tt.templateMiddle) || this.match(tt.templateTail), we could add a tokenIsTemplate function.

nicolo-ribaudo · 2021-11-19T00:21:03Z

packages/babel-parser/src/parser/statement.js

@@ -96,6 +96,81 @@ function babel7CompatTokens(tokens) {
        continue;
      }
    }
+    if (type === tt.templateMiddle || type === tt.templateTail) {
+      if (!process.env.BABEL_8_BREAKING) {


I think we can swap these two ifs, since the outer one would be empty with Babel 8.

nicolo-ribaudo · 2021-11-19T00:22:29Z

packages/babel-parser/src/tokenizer/index.js

@@ -1377,34 +1371,38 @@ export default class Tokenizer extends ParserErrors {

  // Reads template string tokens.

+  readTemplateMiddle(): void {


This can return both a templateMiddle or templateTail? Maybe readTemplateAfterSubsitution, or readTemplateContinuation?

I slightly prefer readTemplateContinuation because it is shorter. Will also rename readTmplToken to readTemplateToken for consistency.

nicolo-ribaudo

The linting failure looks related.

JLHwung · 2021-12-01T02:46:06Z

Oh the lint error should be related. I almost forgot this PR, I will fix after rebase.

JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories babel-8-dev → main labels Nov 2, 2021

JLHwung marked this pull request as draft November 2, 2021 16:03

JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from defe5fa to 96712cd Compare November 2, 2021 19:35

JLHwung marked this pull request as ready for review November 2, 2021 20:50

nicolo-ribaudo reviewed Nov 4, 2021

View reviewed changes

JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from 2f02b9a to ce1440f Compare November 4, 2021 17:58

nicolo-ribaudo reviewed Nov 19, 2021

View reviewed changes

JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from ce1440f to 1859669 Compare November 19, 2021 16:53

nicolo-ribaudo approved these changes Nov 30, 2021

View reviewed changes

JLHwung added 14 commits November 30, 2021 21:46

add benchmarks

ffe748c

refactor: tokenize template as middle + tail

d936439

perf: avoid push tc.brace

0bd2a0a

refactor: overwrite skipSpace in jsx plugin

7eccd3d

transform tl.templateMiddle/Tail

3c84e0f

refactor: simplify JSX context tracking

1eaaea1

fix flow error

45c829b

refactor: move JSX context to context.js

b048b9b

fix: ensure comment stack is correctly handled

a585497

rename createPositionFromPosition

eb3318c

rename token type and methods

eac7b91

add tokenIsTemplate

e5a4ea9

refactor: merge babel 7 logic in babel7CompatTokens

5a67fc0

fix flow error

ad60c64

JLHwung force-pushed the merge-backquote-dollarbracel-template-token branch from 660c848 to ad60c64 Compare December 1, 2021 02:59

existentialism approved these changes Dec 6, 2021

View reviewed changes

JLHwung merged commit 94af0e5 into babel:main Dec 6, 2021

JLHwung deleted the merge-backquote-dollarbracel-template-token branch December 6, 2021 21:43

fisker mentioned this pull request Dec 14, 2021

[Bug]: Unexpected error thrown when parsing broken template literal #14054

Closed

1 task

JLHwung mentioned this pull request Dec 14, 2021

fix: handle tokens for invalid template element #14055

Merged

snyk-bot mentioned this pull request Feb 22, 2022

[Snyk] Upgrade: @babel/core, @babel/preset-env dfds/tika#39

Closed

github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Mar 8, 2022

github-actions bot locked as resolved and limited conversation to collaborators Mar 8, 2022

nicolo-ribaudo added babel 8 and removed babel-8-dev → main labels Aug 8, 2023

nicolo-ribaudo added this to the v8.0.0 milestone Aug 8, 2023

JLHwung added PR: Breaking Change 💥 A type of pull request used for our changelog categories for next major release and removed babel 8 labels Aug 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve template tokenizing #13919

Improve template tokenizing #13919

JLHwung commented Nov 2, 2021 •

edited

babel-bot commented Nov 2, 2021 •

edited

codesandbox-ci bot commented Nov 2, 2021 •

edited

nicolo-ribaudo Nov 4, 2021 •

edited

JLHwung Nov 4, 2021 •

edited

nicolo-ribaudo left a comment

nicolo-ribaudo Nov 19, 2021

nicolo-ribaudo Nov 19, 2021

nicolo-ribaudo Nov 19, 2021

nicolo-ribaudo Nov 19, 2021

JLHwung Nov 19, 2021

nicolo-ribaudo left a comment

JLHwung commented Dec 1, 2021

	if (node.type === "TemplateLiteral") {
	for (let i = 0; i < node.quasis.length; i++) {
	const q = node.quasis[i];
	q.range[0] -= 1;
	if (q.tail) {
	q.range[1] += 1;
	} else {
	q.range[1] += 2;
	}
	q.loc.start.column -= 1;
	if (q.tail) {
	q.loc.end.column += 1;
	} else {
	q.loc.end.column += 2;
	}

	if (ESLINT_VERSION >= 8) {
	q.start -= 1;
	if (q.tail) {
	q.end += 1;
	} else {
	q.end += 2;
	}
	}
	}
	}
	},
	};

		@@ -1377,34 +1371,38 @@ export default class Tokenizer extends ParserErrors {

		// Reads template string tokens.

		readTemplateMiddle(): void {

Improve template tokenizing #13919

Improve template tokenizing #13919

Conversation

JLHwung commented Nov 2, 2021 • edited

Benchmark Results

babel-bot commented Nov 2, 2021 • edited

codesandbox-ci bot commented Nov 2, 2021 • edited

nicolo-ribaudo Nov 4, 2021 • edited

Choose a reason for hiding this comment

JLHwung Nov 4, 2021 • edited

Choose a reason for hiding this comment

nicolo-ribaudo left a comment

Choose a reason for hiding this comment

nicolo-ribaudo Nov 19, 2021

Choose a reason for hiding this comment

nicolo-ribaudo Nov 19, 2021

Choose a reason for hiding this comment

nicolo-ribaudo Nov 19, 2021

Choose a reason for hiding this comment

nicolo-ribaudo Nov 19, 2021

Choose a reason for hiding this comment

JLHwung Nov 19, 2021

Choose a reason for hiding this comment

nicolo-ribaudo left a comment

Choose a reason for hiding this comment

JLHwung commented Dec 1, 2021

JLHwung commented Nov 2, 2021 •

edited

babel-bot commented Nov 2, 2021 •

edited

codesandbox-ci bot commented Nov 2, 2021 •

edited

nicolo-ribaudo Nov 4, 2021 •

edited

JLHwung Nov 4, 2021 •

edited