New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(babel-parser): avoid state.clone() to clone the whole token store #11029
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this example:
const code = `
async < T > (x);
`;
const out = parser.parse(code, {
plugins: ["typescript"],
tokens: true,
});
console.log(JSON.stringify(out.tokens, null, 2));
All the tokens are now duplicated. There is a reason if tokens
is in state
and we are cloning arrays when cloning state
: it's to avoid having invalid tokens when we try to parse something, and then go back and try to parse something else.
You can check the first commit which introduced the tokens array: at that time it was even explicitly copied.
No new test was added, as existing tests covered the tokens parser option.
Could you please add a test with tokens and typescript, like the example I provided? 😅
@nicolo-ribaudo thx, I now see the duplicated tokens with my fix. I don't quite understand. From what I saw, the token store is only populated when There must be somewhere in typescript plugin to mutate the token store (only when it is not empty), but I cannot find relevant code. Can you point me to where the token store is touched by typescript plugin? Also I see only typescript and flow plugins are using Nevertheless, we need to find way to avoid cloning (shallowly currently) the huge token store array. This doesn't scale with input code size. |
@nicolo-ribaudo I see now what typescript plugin did. |
This is to give modify-code a chance to bypass a babel performance issue babel/babel#11029.
@nicolo-ribaudo pls review the added tests (output.json generated by latest master), and the tiny trick to maintain clean token store. |
Fixed the performance issue on large input when turned on option {tokens: true} and typescript plugin which uses quite a few state.clone().
The output.json is generated by old master to make sure no regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Out of curiosity, could you run the benchmark you posted in your PR description using this PR, to see how much it improved performance? |
Good idea!
|
I ran a rough benchmark for avg of 10 times.
There might be something else (my system load) to make the splice slight worse. I would say we leave the optimization out for no obvious benefit even for huge input (7.8m file). |
Fixed the performance issue on large input when turned on option {tokens: true} and typescript plugin which uses quite a few state.clone().
To reproduce the performance issue.
Create a nodejs project with following 2 files:
package.json
index.js
We parse the typescript package main file (7.8m) to demonstrate the performance issue.
After
npm i
, runnode index.js
On my machine:
To compare the other two results:
tokens: true
plugins: ['typescript'],
Why the huge difference?
Because babel-parser's typescript plugin uses
state.clone()
api quite few times. (flow plugin also uses the clone api)But
State
class has a token storestate.tokens
to track all parsed tokens. This means everytimestate.clone()
is called, it duplicates the huge token storestate.tokens
.This is not only a performance issue, it also causes memory spike.
On my mac, roughly read from activity monitor, the nodejs process goes up to around 800m vs 160m when tokens option is off).
The fix
This fix is very simple, move up the token store from State to Tokenizer itself, so that the tokens aren't cloned in copied states. A cleanup logic is added to deal with typescript/flow's try-catch parsing branches.
This fix restored the running time of the code sample above back to 2~3 seconds range.