Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster checkReservedWord #13386

Merged
merged 5 commits into from Jun 1, 2021
Merged

Conversation

JLHwung
Copy link
Contributor

@JLHwung JLHwung commented May 28, 2021

Q                       A
License MIT

This PR aims for improving the performance of checkReservedWord in common scenarios. It is based on the assumption that even words like enum, protected could be a valid identifier name, such usage is rare. So in order to improve performance, we offer fast path for common names, and fallback to the slow path when it will be likely to throw an error.

This PR also improves the parse scope utility functions: They are computationally equivalent and reflect my early work on tackling this issue.

Benchmark results

The baseline is c6bebfe, current is this PR.

Length-11 identifiers
baseline 64 length-11 identifiers: 28849 ops/sec ±1.09% (0.035ms)
baseline 128 length-11 identifiers: 14846 ops/sec ±1.06% (0.067ms)
baseline 256 length-11 identifiers: 7567 ops/sec ±0.5% (0.132ms)
baseline 512 length-11 identifiers: 3772 ops/sec ±1.01% (0.265ms)
current 64 length-11 identifiers: 30111 ops/sec ±0.38% (0.033ms)
current 128 length-11 identifiers: 15391 ops/sec ±0.58% (0.065ms)
current 256 length-11 identifiers: 7544 ops/sec ±2.2% (0.133ms)
current 512 length-11 identifiers: 3883 ops/sec ±0.43% (0.258ms)
Length-2 identifiers
baseline 64 length-2 identifiers: 30858 ops/sec ±0.94% (0.032ms)
baseline 128 length-2 identifiers: 16131 ops/sec ±0.23% (0.062ms)
baseline 256 length-2 identifiers: 8097 ops/sec ±0.16% (0.124ms)
baseline 512 length-2 identifiers: 3673 ops/sec ±2.19% (0.272ms)
baseline 1024 length-2 identifiers: 1952 ops/sec ±1.26% (0.512ms)
current 64 length-2 identifiers: 33006 ops/sec ±0.92% (0.03ms)
current 128 length-2 identifiers: 17058 ops/sec ±1% (0.059ms)
current 256 length-2 identifiers: 8522 ops/sec ±1.42% (0.117ms)
current 512 length-2 identifiers: 4301 ops/sec ±1.04% (0.232ms)
current 1024 length-2 identifiers: 2182 ops/sec ±0.26% (0.458ms)
Length-1 identifiers
baseline 64 length-1 identifiers: 34244 ops/sec ±0.72% (0.029ms)
baseline 128 length-1 identifiers: 16767 ops/sec ±1.99% (0.06ms)
baseline 256 length-1 identifiers: 7606 ops/sec ±2.63% (0.131ms)
baseline 512 length-1 identifiers: 4467 ops/sec ±0.53% (0.224ms)
baseline 1024 length-1 identifiers: 2183 ops/sec ±1.12% (0.458ms)
current 64 length-1 identifiers: 35132 ops/sec ±1.19% (0.028ms)
current 128 length-1 identifiers: 18390 ops/sec ±0.69% (0.054ms)
current 256 length-1 identifiers: 9260 ops/sec ±0.81% (0.108ms)
current 512 length-1 identifiers: 4500 ops/sec ±1.46% (0.222ms)
current 1024 length-1 identifiers: 2304 ops/sec ±0.83% (0.434ms)
await as identifier
baseline 64 await identifier: 26803 ops/sec ±1.36% (0.037ms)
baseline 128 await identifier: 14133 ops/sec ±0.9% (0.071ms)
baseline 256 await identifier: 7061 ops/sec ±1.09% (0.142ms)
baseline 512 await identifier: 3491 ops/sec ±2.81% (0.286ms)
current 64 await identifier: 26729 ops/sec ±0.74% (0.037ms)
current 128 await identifier: 13539 ops/sec ±1.15% (0.074ms)
current 256 await identifier: 6819 ops/sec ±1.2% (0.147ms)
current 512 await identifier: 3470 ops/sec ±0.49% (0.288ms)

As is expected, current is slower than the baseline when parsing valid await as identifier, because of the extra code path.

@JLHwung JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories labels May 28, 2021
@@ -2389,12 +2391,22 @@ export default class ExpressionParser extends LValParser {
checkKeywords: boolean,
isBinding: boolean,
): void {
if (this.prodParam.hasYield && word === "yield") {
this.raise(startLoc, Errors.YieldBindingIdentifier);
// All JavaScript reserved words are not longer than 10 characters.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The longest reserved word is instanceof and implements.

@babel-bot
Copy link
Collaborator

babel-bot commented May 28, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/46602/

@codesandbox-ci
Copy link

codesandbox-ci bot commented May 28, 2021

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 84be70f:

Sandbox Source
babel-repl-custom-plugin Configuration
babel-plugin-multi-config Configuration

]);

export function canBeReservedWord(word: string): boolean {
return reservedWordLikeSet.has(word);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid maintaining another reserved words list, could this be word === "await" || word === "enum" || isStrictBindReservedWord(word)?

Copy link

@KFlash KFlash May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of that long list, why not do something simple as x & ReservedWord ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Set#has hashes the input string and checks if the hash value is in the table (source), since the the input string is scanned only during hash key is computed and the collision is rare by definition of hash, checking presence takes 1 scan of input and O(1) of the set size.

The StringEquals first checks the length, then the memory address and then fallbacks to byte-by-byte comparison. (source) If we check word === "await" and then isStrictBindReservedWord(word), the word is read for 1.1 times (average on the universe distribution of string length), 1 for the hash computation and 1/10 for comparison with "await". word === "await" also introduces extra branch conditions.

To avoid maintaining another reserved words list

I agree it is not ideal to duplicate keywords, WDYT if I add canBeReservedWord to helper-validator-identifier in Babel 7.15? Since that will be a feature, I can draft a subseqent PR that moves canBeReservedWord to the validator package based on this one so we don't have to wait until 7.15.

why not do something simple as x & ReservedWord ?

@KFlash Can you elaborate? How can & achieves searching here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for deferring it to 7.15

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JLHwung It's possible if you make things complex and add some bitwise masks behind each reserved word token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify whether using a prototye-less object is faster? My intuition is that Sets are algorithmically fast, but the actual implementation in V8 is slow.

Copy link

@KFlash KFlash May 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jridgewell It's faster because I "pack" all reserved words into a set of bitwise masks. It also consumes less memory. You are only dealing with series of number. see Kataw for an example. Same you can do with tokens, keywords, and everything you need. No need for a lookup.

Witch one is fastest? Set and lookup or this one
(token & ReserwedWord) === Reservedword // return true if reserved word

or a switch statement with 60 cases and 100lines of code to get all expression nodes or this one?

(token & IsExpressionNode) === IsExpressionNode // return true if expression node

It's very obvious what's fastest, but I think it's out of scope for Babel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jridgewell I come up with a benchmark file comparing these three approaches: Set#has, Object brand checks and (switch length + string equals).

The benchmark result on Node.js 16.1.0:

object brand checks different string: 2059273 ops/sec ±0.42% (0ms)
object brand checks constant string: 31712533 ops/sec ±0.22% (0ms)
set#has different string: 17756450 ops/sec ±1.26% (0ms)
set#has constant string: 38050433 ops/sec ±1.54% (0ms)
string#equals different string: 13789367 ops/sec ±0.44% (0ms)
string#equals constant string: 709703804 ops/sec ±0.42% (0ms)

On different string cases, where each input is in different memory address, Set#has is 700% faster than prototype-less object brand check. On constant string cases, Set#has is 20% faster than prototype-less object brand check.

On different string cases, Set#has is 20% faster than (switch length + string equals). The latter significantly outperform Set#has when the input is a constant string, which is expected because V8 optimizes out the per-character equality branch of StringEquals. However in Babel parser, each identifier name have different address.

@KFlash Yes we can do a single bitwise check if we replace the data structure of token storage and create different token for different keywords. However Babel currently exposes its token storage via options.tokens so we have to maintain compatibility here.

}
// Most identifiers are not reservedWord-like, they don't need special
// treatments afterward, which very likely ends up throwing errors
if (!canBeReservedWord(word)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can move this check after the if yield/await/arguments checks, and just use isStrictBindReservedWord? Intuitively it shouldn't be slower, but I didn't check.

@JLHwung JLHwung merged commit cbad50a into babel:main Jun 1, 2021
@JLHwung JLHwung deleted the faster-check-reserved-words branch June 1, 2021 12:42
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Sep 1, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants