New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve template tokenizing #13919
Improve template tokenizing #13919
Conversation
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/50065/ |
defe5fa
to
96712cd
Compare
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. Latest deployment of this branch, based on commit ce1440f:
|
const elemStart = start + 1; | ||
const elem = this.startNodeAt( | ||
elemStart, | ||
createPositionFromPosition(this.state.startLoc, 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it would be more clear if this function was called something like createPositionAfterOffset
or getIncrementedPosition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe createPositionWithColumnOffset
? I should add a note that this function does not consider whitespaces, so it does not work for any offsets for sure, and thus it should be only used when we create AST node inside the token boundaries, which should be rare scenario.
For this reason, I am also tempted to expand TemplateElement
to match the actual token range in Babel 8. It can possibly further get rid of
babel/eslint/babel-eslint-parser/src/convert/convertAST.cjs
Lines 79 to 106 in 531db5d
if (node.type === "TemplateLiteral") { | |
for (let i = 0; i < node.quasis.length; i++) { | |
const q = node.quasis[i]; | |
q.range[0] -= 1; | |
if (q.tail) { | |
q.range[1] += 1; | |
} else { | |
q.range[1] += 2; | |
} | |
q.loc.start.column -= 1; | |
if (q.tail) { | |
q.loc.end.column += 1; | |
} else { | |
q.loc.end.column += 2; | |
} | |
if (ESLINT_VERSION >= 8) { | |
q.start -= 1; | |
if (q.tail) { | |
q.end += 1; | |
} else { | |
q.end += 2; | |
} | |
} | |
} | |
} | |
}, | |
}; |
2f02b9a
to
ce1440f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly left some comments about naming (because I found that they made it harder to understand the code), but apart from that this PR looks good!
@@ -158,6 +158,8 @@ export const tt: { [name: string]: TokenType } = { | |||
ellipsis: createToken("...", { beforeExpr }), | |||
backQuote: createToken("`", { startsExpr }), | |||
dollarBraceL: createToken("${", { beforeExpr, startsExpr }), | |||
templateTail: createToken("...`", { startsExpr }), | |||
templateMiddle: createToken("...${", { beforeExpr, startsExpr }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reviewing, I found it quite confusing that templateMiddle
can also be at the start of a template (I expected templateStart
ŧemplateMiddle
+ templateTail
). I'd prefer if it was just called templateNonTail
.
@@ -707,7 +707,7 @@ export default class ExpressionParser extends LValParser { | |||
): N.Expression { | |||
if (!noCalls && this.eat(tt.doubleColon)) { | |||
return this.parseBind(base, startPos, startLoc, noCalls, state); | |||
} else if (this.match(tt.backQuote)) { | |||
} else if (this.match(tt.templateMiddle) || this.match(tt.templateTail)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since there are different places where we check this.match(tt.templateMiddle) || this.match(tt.templateTail)
, we could add a tokenIsTemplate
function.
@@ -96,6 +96,81 @@ function babel7CompatTokens(tokens) { | |||
continue; | |||
} | |||
} | |||
if (type === tt.templateMiddle || type === tt.templateTail) { | |||
if (!process.env.BABEL_8_BREAKING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can swap these two if
s, since the outer one would be empty with Babel 8.
@@ -1377,34 +1371,38 @@ export default class Tokenizer extends ParserErrors { | |||
|
|||
// Reads template string tokens. | |||
|
|||
readTemplateMiddle(): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can return both a templateMiddle
or templateTail
? Maybe readTemplateAfterSubsitution
, or readTemplateContinuation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly prefer readTemplateContinuation
because it is shorter. Will also rename readTmplToken
to readTemplateToken
for consistency.
ce1440f
to
1859669
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linting failure looks related.
Oh the lint error should be related. I almost forgot this PR, I will fix after rebase. |
660c848
to
ad60c64
Compare
BABEL_8_BREAKING
flagThis PR includes commits from #13891.
Let's start by an example. Currently Babel tokenizes a string template
`before${x}middle${y}after`
into a token sequences
in order to check whether a backQuote
`
starts a template and whether braceR}
starts a template span, we have to introduce a context stack for backQuote and braceR. Unlike backQuote, a braceR can pair with more than one tokens: namely braceL{
, dollarBraceL${
and braceHashL#{
. Since we lost track of what the braceR is paring with when tokenzing, we have to push new context for braceL and braceHashL even if they have no special semantics in string template. Given that braceL{
is a common token in JavaScript, this approach introduced overhead that could have been avoided.In this PR we merge braceR and dollarBraceL with the template token. Two new template token types are introduced in order to differentiate between template middle
...${
(ends with${
) and template tail...`
(ends with`
). So now the example above is tokenized intoThe new token layout is terser, requires no context tracking and can still work with the parser. So we can remove
updateContext
in the base parser and also simplify the JSX element which still depends onupdateContext
.Another bonus is that the new token layout is almost identical to the
espree
one. So we can also remove some works in@babel/eslint-parser
: In Babel 8 we don't need to merge multiple tokens into a template token.Benchmark Results
many-template-elements (15% faster)
many-nested-template-elements (25% faster)