
fix(babel-parser): avoid state.clone() to clone the whole token store #11029

Merged
merged 5 commits into from Jan 20, 2020

Conversation

@3cp (Contributor) commented Jan 17, 2020

Q                       A
Fixed Issues?           No
Patch: Bug Fix?         Yes
Major: Breaking Change? No
Minor: New Feature?     No
Tests Added + Pass?     Yes
Documentation PR Link
Any Dependency Changes? No
License                 MIT

This fixes a performance issue on large input when the option {tokens: true} is turned on together with the typescript plugin, which calls state.clone() quite a few times.

To reproduce the performance issue, create a Node.js project with the following two files:

package.json

{
  "dependencies": {
    "@babel/parser": "latest",
    "typescript": "latest"
  }
}

index.js

var parser = require('@babel/parser');
var fs = require('fs');

var code = fs.readFileSync(require.resolve('typescript'), 'utf8');

var start = new Date().getTime();
var result = parser.parse(code, {
  plugins: ['typescript'],
  tokens: true
});
var spend = new Date().getTime() - start;
console.log('spend ' + (spend/1000).toFixed(2) + 's');
console.log('tokens length: ' + result.tokens.length);

We parse the typescript package's main file (7.8 MB) to demonstrate the performance issue.
After npm i, run node index.js

On my machine:

spend 18.28s
tokens length: 897894

For comparison, the other two results:

  1. remove tokens: true
     spend 3.51s
  2. or remove plugins: ['typescript']
     spend 2.70s
     tokens length: 897894

Why the huge difference?

Because babel-parser's typescript plugin calls the state.clone() API quite a few times (the flow plugin also uses the clone API).
But the State class has a token store, state.tokens, to track all parsed tokens. This means every time state.clone() is called, it duplicates the huge token store state.tokens.

This is not only a performance issue; it also causes a memory spike.
On my Mac, reading roughly from Activity Monitor, the Node.js process goes up to around 800 MB (vs 160 MB when the tokens option is off).
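The cost is easy to see without Babel at all. A standalone sketch (hypothetical code, not Babel's actual source) of a State that keeps its own token store: every clone() pays for a copy of the whole array, so cloning gets slower as the input grows.

```javascript
// Hypothetical sketch: a parser State that owns the token store,
// so clone() must shallow-copy the whole array every time.
class State {
  constructor() {
    this.pos = 0;
    this.tokens = []; // grows linearly with input when { tokens: true }
  }
  clone() {
    const copy = new State();
    copy.pos = this.pos;
    copy.tokens = this.tokens.slice(); // shallow-copies the WHOLE store
    return copy;
  }
}

const state = new State();
for (let i = 0; i < 200000; i++) {
  state.tokens.push({ type: "name", value: "x" + i });
}

const start = Date.now();
for (let i = 0; i < 50; i++) state.clone(); // 50 clones copy 10M entries
console.log("50 clones took " + (Date.now() - start) + "ms");
```

With the tokens option off, this.tokens stays empty and the same 50 clones are essentially free, which matches the timings above.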

The fix

The fix is very simple: move the token store up from State to the Tokenizer itself, so the tokens aren't cloned along with copied states. A cleanup step is added to deal with the try-catch parsing branches in the typescript/flow plugins.

This fix restores the running time of the code sample above to the 2~3 second range.
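The shape of the fix can be sketched as follows (names here are hypothetical, not Babel's actual internals): the token store lives on the tokenizer, so cloning parser state is cheap, and a failed speculative parse truncates the store back to its length at the checkpoint.

```javascript
// Illustrative sketch of the fix: the token store is owned by the
// tokenizer, outside the cloneable state.
class Tokenizer {
  constructor() {
    this.tokens = [];         // moved up out of the cloneable state
    this.state = { pos: 0 };
  }
  cloneState() {
    return { ...this.state }; // no token copy here any more
  }
  tryParse(fn) {
    const oldState = this.cloneState();
    const oldTokensLength = this.tokens.length;
    try {
      return fn();
    } catch (err) {
      this.state = oldState;
      this.tokens.length = oldTokensLength; // drop tokens from the failed branch
      return null;
    }
  }
}

const tokenizer = new Tokenizer();
tokenizer.tokens.push("async");
tokenizer.tryParse(() => {
  tokenizer.tokens.push("<", "T", ">");
  throw new Error("not type arguments after all");
});
console.log(tokenizer.tokens); // [ 'async' ] — no leftovers from the failed branch
```

The truncation step is what keeps the shared store consistent when a speculative branch is abandoned.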

@JLHwung (Contributor) left a comment

I like this PR!

@JLHwung JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories labels Jan 17, 2020
@nicolo-ribaudo (Member) left a comment

Consider this example:

const code = `
async < T > (x);
`;

const out = parser.parse(code, {
  plugins: ["typescript"],
  tokens: true,
});

console.log(JSON.stringify(out.tokens, null, 2));

All the tokens are now duplicated. There is a reason why tokens is in state and why we clone arrays when cloning state: it's to avoid ending up with invalid tokens when we try to parse something, then go back and try to parse something else.

You can check the first commit which introduced the tokens array: at that time it was even explicitly copied.
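The duplication can be reproduced without Babel at all. A minimal sketch (hypothetical code, not the parser's internals) of backtracking over a shared token store with no rollback: the failed attempt's tokens stay behind, so the retry records everything twice.

```javascript
// Hypothetical sketch: a shared token store plus backtracking,
// with no rollback of tokens pushed by the failed attempt.
const tokens = [];
function tokenize(src) {
  for (const t of src.split(/\s+/).filter(Boolean)) tokens.push(t);
}

tokenize("async < T >"); // first attempt: maybe a type-parameter list...
// ...that attempt fails, so the parser backtracks and re-reads the
// same source as a comparison expression:
tokenize("async < T >");

console.log(tokens); // every token appears twice
```

Cloning the array along with the state (or truncating it on rollback, as this PR ends up doing) is what prevents this.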

No new test was added, as existing tests covered the tokens parser option.

Could you please add a test with tokens and typescript, like the example I provided? 😅

@3cp (Contributor, Author) commented Jan 17, 2020

@nicolo-ribaudo thx, I now see the duplicated tokens with my fix.

I don't quite understand. From what I saw, the token store is only populated when options.tokens is set to true, and I only see mutation happening in the Tokenizer. That means for the majority of use cases, where options.tokens is false, the token store is always empty. Parsing does not rely on it.

There must be somewhere in the typescript plugin that mutates the token store (only when it is not empty), but I cannot find the relevant code. Can you point me to where the token store is touched by the typescript plugin?

Also, I see only the typescript and flow plugins use tryParse() (which clones state too) and state.clone(). I could avoid the two plugins to bypass this issue for now.

Nevertheless, we need to find a way to avoid cloning the huge token store array (currently a shallow copy). This doesn't scale with input code size.

@3cp (Contributor, Author) commented Jan 18, 2020

@nicolo-ribaudo I see now what the typescript plugin does.
It is not mutating the tokens directly; it keeps the old state, tries to parse something, and if that fails, restores the old state and tries to parse something else.
We can do better than this, but it requires a bit of redesign to avoid the expensive clone.

@3cp 3cp changed the title fix(babel-parser): avoid state.clone() to clone the whole token store WIP fix(babel-parser): avoid state.clone() to clone the whole token store Jan 18, 2020
3cp added a commit to dumberjs/modify-code that referenced this pull request Jan 18, 2020
This is to give modify-code a chance to bypass a babel performance issue babel/babel#11029.
@3cp 3cp changed the title WIP fix(babel-parser): avoid state.clone() to clone the whole token store fix(babel-parser): avoid state.clone() to clone the whole token store Jan 18, 2020
@3cp (Contributor, Author) commented Jan 18, 2020

@nicolo-ribaudo please review the added tests (output.json generated by the latest master) and the tiny trick to maintain a clean token store.

Fixed the performance issue on large input when turned on option {tokens: true} and typescript plugin which uses quite a few state.clone().
The output.json is generated by old master to make sure no regression.
@JLHwung JLHwung self-requested a review January 19, 2020 04:02
@nicolo-ribaudo (Member) left a comment

Thanks!

@nicolo-ribaudo (Member) commented
Out of curiosity, could you run the benchmark you posted in your PR description using this PR, to see how much it improved performance?

@3cp (Contributor, Author) commented Jan 20, 2020

Good idea!

#latest 7.8.3
spend 17.06s
tokens length: 897894

#patched
spend 3.27s
tokens length: 897894

@3cp (Contributor, Author) commented Jan 20, 2020

I ran a rough benchmark, averaging 10 runs:

.length: avg 2.55s
.splice: avg 2.65s

There might be something else (my system load) making splice slightly worse. I would say we leave that optimization out, as there's no obvious benefit even for huge input (the 7.8 MB file).
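For reference, the two truncation strategies compared above produce identical array contents (a standalone sketch, not Babel's code); the difference is that splice() also allocates and returns an array for the removed tail, which may explain the slightly worse timing.

```javascript
// Truncating via .length: drops the tail in place, allocating nothing.
const a = [1, 2, 3, 4, 5];
a.length = 3;
console.log(a); // [ 1, 2, 3 ]

// Truncating via .splice(): same result, but the removed elements are
// collected into a new array and returned.
const b = [1, 2, 3, 4, 5];
const removed = b.splice(3);
console.log(b);       // [ 1, 2, 3 ]
console.log(removed); // [ 4, 5 ]
```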

@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Apr 20, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2020