perf: replace lookahead by lookaheadCharCode #10371
Changes from all commits
f15f98d
8244a7a
4f6dcc4
19690e8
31a05b8
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -8,7 +8,7 @@ import {
   isIdentifierStart,
   keywordRelationalOperator,
 } from "../util/identifier";
-import { lineBreak, skipWhiteSpace } from "../util/whitespace";
+import { lineBreak } from "../util/whitespace";
 import * as charCodes from "charcodes";
 import {
   BIND_CLASS,
@@ -105,10 +105,7 @@ export default class StatementParser extends ExpressionParser {
     if (!this.isContextual("let")) {
       return false;
     }
-    skipWhiteSpace.lastIndex = this.state.pos;
-    const skip = skipWhiteSpace.exec(this.input);
-    // $FlowIgnore
-    const next = this.state.pos + skip[0].length;
+    const next = this.nextTokenStart();
     const nextCh = this.input.charCodeAt(next);
     // For ambiguous cases, determine if a LexicalDeclaration (or only a
     // Statement) is allowed here. If context is not empty then only a Statement
@@ -170,7 +167,7 @@ export default class StatementParser extends ExpressionParser {
       case tt._for:
         return this.parseForStatement(node);
       case tt._function:
-        if (this.lookahead().type === tt.dot) break;
+        if (this.lookaheadCharCode() === charCodes.dot) break;
         if (context) {
           if (this.state.strict) {
             this.raise(
@@ -223,8 +220,11 @@ export default class StatementParser extends ExpressionParser {
         return this.parseEmptyStatement(node);
       case tt._export:
       case tt._import: {
-        const nextToken = this.lookahead();
-        if (nextToken.type === tt.parenL || nextToken.type === tt.dot) {
+        const nextTokenCharCode = this.lookaheadCharCode();
+        if (
+          nextTokenCharCode === charCodes.leftParenthesis ||
+          nextTokenCharCode === charCodes.dot
+        ) {
           break;
         }

@@ -1738,11 +1738,11 @@ export default class StatementParser extends ExpressionParser {
   maybeParseExportDeclaration(node: N.Node): boolean {
     if (this.shouldParseExportDeclaration()) {
       if (this.isContextual("async")) {
-        const next = this.lookahead();
+        const next = this.nextTokenStart();

         // export async;
-        if (next.type !== tt._function) {
-          this.unexpected(next.start, `Unexpected token, expected "function"`);
+        if (!this.isUnparsedContextual(next, "function")) {
+          this.unexpected(next, `Unexpected token, expected "function"`);
         }
       }

@@ -1757,21 +1757,10 @@ export default class StatementParser extends ExpressionParser {

   isAsyncFunction(): boolean {
     if (!this.isContextual("async")) return false;
-
-    const { pos } = this.state;
-
-    skipWhiteSpace.lastIndex = pos;
-    const skip = skipWhiteSpace.exec(this.input);
-
-    if (!skip || !skip.length) return false;
-
-    const next = pos + skip[0].length;
-
+    const next = this.nextTokenStart();
     return (
-      !lineBreak.test(this.input.slice(pos, next)) &&
-      this.input.slice(next, next + 8) === "function" &&
-      (next + 8 === this.length ||
-        !isIdentifierChar(this.input.charCodeAt(next + 8)))
+      !lineBreak.test(this.input.slice(this.state.pos, next)) &&

@JLHwung You don't gain performance here. Try replacing it with:

    const { input, pos } = this;
    const nextChar = input.slice(pos, next);
    return ( (nextChar === 0x10 || nextChar === 0x13 || (nextChar ^ 0x2028) <= 1) &&
    input.slice(next, next + 8) === "function" && (next + 8 === this.length ||
    !isIdentifierChar(input.charCodeAt(next + 8)))

Not ideal either, but an improvement :) Eventually you can use a table lookup.

This part of the revision is not actually meant to improve performance. I refactored the similar code into a shared routine.
It is a good idea given that

Btw, I couldn't get the Babel parser to run in my benchmark. Where can I find an online benchmark that includes it? Also, try running this benchmark and see how Meriyah does vs. Acorn.

I like this benchmark website! And yes! Meriyah is almost twice as fast (warm JIT) as Acorn in our benchmark suites, while Babel is only half as fast as Acorn 😢.
I couldn't find the source of your benchmark website. If it is open source, I can see if there is anything I can do to help get the Babel parser running.
AFAIK we don't have an online benchmark.

The source for the benchmark is located here and the website is in the root folder. I estimate 14 days of hard work to replicate the Babel parser if you write it from scratch, and another 6 - 8 days to get all plug-ins working. The Babel parser should then perform the same as Meriyah :)

@JLHwung Meriyah's REPL is located here in case you are interested. It was inspired by Babel's REPL, because I found that the REPL was loading the page very slowly too :)

@JLHwung You mentioned a lookup table for identifier scanning. I would say that the V8 solution isn't as fast as it could be either, but what you can do is use a table lookup for the token kinds. Then you know that keywords can only contain lowercase letters, and that no keyword starts with the letter 'u'. With this knowledge you can optimize the identifier scanning. I just implemented this in my own lexer refactoring - seen here. I only bring it up because you mentioned it first, and it could be a good optimization trick for Babel :)

+      this.isUnparsedContextual(next, "function")
     );
   }

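For reference, a char-code-based line-break check along the lines the first reviewer suggests might look like the sketch below. It is only an illustration, not code from this PR: `isLineTerminator` and `hasLineBreakBetween` are hypothetical names, and both take the input as a plain argument instead of reading parser state. Two details differ from the snippet in the comment: line feed and carriage return are 0x0A and 0x0D (decimal 10 and 13), and the comparison has to be made on `charCodeAt` values, because `slice` returns a string.

```javascript
// Hypothetical helper: true for the four line terminator code points.
function isLineTerminator(code) {
  return (
    code === 0x0a || // line feed
    code === 0x0d || // carriage return
    (code ^ 0x2028) <= 1 // U+2028 line separator, U+2029 paragraph separator
  );
}

// Hypothetical helper: scans input[start, end) without allocating a substring
// and without running a regular expression.
function hasLineBreakBetween(input, start, end) {
  for (let i = start; i < end; i++) {
    if (isLineTerminator(input.charCodeAt(i))) return true;
  }
  return false;
}
```

In `isAsyncFunction` the result would be negated, since `async` only starts an async function when no line break separates it from `function`.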
@@ -1833,10 +1822,10 @@ export default class StatementParser extends ExpressionParser {
       return false;
     }

-    const lookahead = this.lookahead();
+    const next = this.nextTokenStart();
     return (
-      lookahead.type === tt.comma ||
-      (lookahead.type === tt.name && lookahead.value === "from")
+      this.input.charCodeAt(next) === charCodes.comma ||
+      this.isUnparsedContextual(next, "from")
     );
   }

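The definition of `isUnparsedContextual` is not part of the hunks shown above. Judging from the inline check it replaces in `isAsyncFunction`, it presumably compares a slice of the raw input against the keyword and then verifies that no identifier character follows. A standalone sketch under that assumption, with a simplified ASCII-only `isIdentifierChar` standing in for the real helper from `../util/identifier`:

```javascript
// Simplified stand-in (assumption): the real isIdentifierChar also accepts
// non-ASCII identifier characters.
function isIdentifierChar(code) {
  return (
    (code >= 48 && code <= 57) || // 0-9
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) || // a-z
    code === 36 || // $
    code === 95 // _
  );
}

// Sketch: does `name` appear at `nameStart` as a whole word in the raw input?
// Takes the input as an argument instead of reading `this.input`.
function isUnparsedContextual(input, nameStart, name) {
  const nameEnd = nameStart + name.length;
  return (
    input.slice(nameStart, nameEnd) === name &&
    (nameEnd === input.length ||
      !isIdentifierChar(input.charCodeAt(nameEnd)))
  );
}
```

The word-boundary test is what keeps `async functional` from being mistaken for `async function`, without tokenizing the word first.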
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -13,6 +13,7 @@ import {
   lineBreakG,
   isNewLine,
   isWhitespace,
+  skipWhiteSpace,
 } from "../util/whitespace";
 import State from "./state";

@@ -168,6 +169,18 @@ export default class Tokenizer extends LocationParser {
     return curr;
   }

+  nextTokenStart(): number {
+    const thisTokEnd = this.state.pos;
+    skipWhiteSpace.lastIndex = thisTokEnd;
+    const skip = skipWhiteSpace.exec(this.input);
+    // $FlowIgnore: The skipWhiteSpace ensures to match any string
+    return thisTokEnd + skip[0].length;
+  }
+

@JLHwung In most cases this will not improve performance. Using a regex for this purpose may have the opposite effect. I looked at the lexer code and ... well ... I understand why you do this, but you should only need to use the current index like this

Besides whitespaces, the The

You shouldn't blindly trust jsperf. A regex can be super fast if it is used for a single operation / task, but in the Babel parser it serves multiple purposes and slows things down. In the long run a

+  lookaheadCharCode(): number {
+    return this.input.charCodeAt(this.nextTokenStart());
+  }
+
   // Toggle strict mode. Re-reads the next number or string to please
   // pedantic tests (`"use strict"; 010;` should fail).

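To see what `nextTokenStart` and `lookaheadCharCode` return, here is a standalone sketch of the same logic outside the parser class. The regex is an assumption about what `skipWhiteSpace` looks like (whitespace, line comments, and block comments, repeated zero or more times). Because such a pattern can match the empty string, `exec` does not return `null` as long as `lastIndex` lies within the input, which is what the `$FlowIgnore` comment relies on.

```javascript
// Assumed shape of skipWhiteSpace; the real definition lives in
// ../util/whitespace and is not shown in this diff.
const skipWhiteSpace = /(?:\s|\/\/.*|\/\*[^]*?\*\/)*/g;

// Index of the first character after any whitespace and comments at `pos`.
function nextTokenStart(input, pos) {
  skipWhiteSpace.lastIndex = pos;
  const skip = skipWhiteSpace.exec(input);
  return pos + skip[0].length;
}

// Char code of the first character of the next token, without tokenizing it.
function lookaheadCharCode(input, pos) {
  return input.charCodeAt(nextTokenStart(input, pos));
}

// `pos` is the end of the current token, e.g. just after `function`.
const code = lookaheadCharCode("function /* c */ .sent", 8);
console.log(code === ".".charCodeAt(0));
```

Compared with `lookahead()`, this only inspects one character of the input, so it fits checks such as "is the next token a dot or a left parenthesis" and does not tell you the next token's type in general.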
@@ -267,13 +280,7 @@ export default class Tokenizer extends LocationParser {
     const startLoc = this.state.curPosition();
     let ch = this.input.charCodeAt((this.state.pos += startSkip));
     if (this.state.pos < this.length) {
-      while (
-        ch !== charCodes.lineFeed &&
-        ch !== charCodes.carriageReturn &&
-        ch !== charCodes.lineSeparator &&
-        ch !== charCodes.paragraphSeparator &&
-        ++this.state.pos < this.length
-      ) {
+      while (!isNewLine(ch) && ++this.state.pos < this.length) {
         ch = this.input.charCodeAt(this.state.pos);
       }
     }
@@ -439,13 +446,7 @@ export default class Tokenizer extends LocationParser {
     let ch = this.input.charCodeAt(this.state.pos);
     if (ch !== charCodes.exclamationMark) return false;

-    while (
-      ch !== charCodes.lineFeed &&
-      ch !== charCodes.carriageReturn &&
-      ch !== charCodes.lineSeparator &&
-      ch !== charCodes.paragraphSeparator &&
-      ++this.state.pos < this.length
-    ) {
+    while (!isNewLine(ch) && ++this.state.pos < this.length) {
       ch = this.input.charCodeAt(this.state.pos);
     }

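Both loops above now delegate to `isNewLine`, whose definition lives in `../util/whitespace` and is not shown in this diff. Based on the four comparisons it replaces, a plausible sketch is a switch over the four line terminator code points. `endOfLine` is a hypothetical helper that mirrors the refactored loop on a plain string instead of parser state.

```javascript
// Sketch of isNewLine, assuming it covers exactly the four code points that
// the removed loop conditions compared against.
function isNewLine(code) {
  switch (code) {
    case 10: // line feed
    case 13: // carriage return
    case 0x2028: // line separator
    case 0x2029: // paragraph separator
      return true;
    default:
      return false;
  }
}

// Hypothetical helper: advance from `pos` to the next line terminator or to
// the end of the input, whichever comes first, and return that index.
function endOfLine(input, pos) {
  let ch = input.charCodeAt(pos);
  if (pos < input.length) {
    while (!isNewLine(ch) && ++pos < input.length) {
      ch = input.charCodeAt(pos);
    }
  }
  return pos;
}
```

The loop keeps the short-circuit order of the original: the position is only incremented when the current character is not a line terminator.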
This is the critical path, as the `function` keyword frequency is high.