Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain TDZ violations in common scenarios and fix similar var bugs #4124

Closed
wants to merge 2 commits into from
Closed

Retain TDZ violations in common scenarios and fix similar var bugs #4124

wants to merge 2 commits into from

Conversation

kzc
Copy link
Contributor

@kzc kzc commented Jun 5, 2021

Retain TDZ violations in some common scenarios and fix similar var bugs.

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:

Fixes most scenarios in #4101 except for bug1, which is a more complex multi-scope problem involving function calls. It will be addressed in a future PR.

Description

Rollup used to silently mask code related to TDZ violations for let and const variables, which would produce different runtime behavior without warning. This PR retains the TDZ violating code in many common scenarios to attempt to be bug compatible with the source input. Not coincidentally, the same TDZ logic also fixes a number of bugs related to using var variables before their declarations.

@github-actions
Copy link

github-actions bot commented Jun 5, 2021

Thank you for your contribution! ❤️

You can try out this pull request locally by installing Rollup via

npm install kzc/rollup#retain-common-tdz

or load it into the REPL:
https://rollupjs.org/repl/?pr=4124

@codecov
Copy link

codecov bot commented Jun 5, 2021

Codecov Report

Merging #4124 (f7a7f5c) into master (9f69fe3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4124   +/-   ##
=======================================
  Coverage   98.12%   98.12%           
=======================================
  Files         201      201           
  Lines        7077     7092   +15     
  Branches     2071     2081   +10     
=======================================
+ Hits         6944     6959   +15     
  Misses         64       64           
  Partials       69       69           
Impacted Files Coverage Δ
src/ast/nodes/Identifier.ts 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f69fe3...f7a7f5c. Read the comment docs.

@kzc
Copy link
Contributor Author

kzc commented Jun 5, 2021

Odd... codecov only ran on the first push.
Let's see if closing and reopening the PR does anything.

@kzc kzc closed this Jun 5, 2021
@kzc kzc reopened this Jun 5, 2021
console.log(function() {
if (x) return "HELLO"; // TDZ
const x = 1; // keep
return "WORLD"; // not reached
Copy link
Contributor Author

@kzc kzc Jun 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although dropping thereturn "WORLD"; line for this test doesn't affect the outcome of the preserved TDZ violation above, I am not sure why it was dropped due to a side effect being reported on the Identifier x in if (x) return "HELLO";. If this example were changed to use var instead of const, the rollup result in this PR would be incorrect due to the dropped return "WORLD";:

Expected:

$ echo 'console.log(function(){if(x)return"HELLO";var x=1;return"WORLD"}());' | node
WORLD

Actual using this PR:

$ echo 'console.log(function(){if(x)return"HELLO";var x=1;return"WORLD"}());' | rollup --silent
console.log(function(){if(x)return "HELLO";var x=1;}());
$ echo 'console.log(function(){if(x)return"HELLO";var x=1;return"WORLD"}());' | rollup --silent | node
undefined

Note that rollup v2.50.6 is also wrong, but in a different way:

$ echo 'console.log(function(){if(x)return"HELLO";var x=1;return"WORLD"}());' | node_modules/.bin/rollup --silent
console.log(function(){return "HELLO";}());
$ echo 'console.log(function(){if(x)return"HELLO";var x=1;return"WORLD"}());' | node_modules/.bin/rollup --silent | node
HELLO

kzc added 2 commits June 5, 2021 21:22
* Fix analogous bugs with `var` variable use before declaration.
@kzc
Copy link
Contributor Author

kzc commented Jun 6, 2021

@lukastaegert This PR is good to merge. It fixes a number of bugs with no test regressions, and JS inputs that used to fail prior to this PR and continue to fail with this PR have been documented with "TODO" comments. The vast majority of rollup optimized output will not be pessimized by this PR.

Feel free to front-load and/or cache some of the checks performed in isPossibleTDZ in other AST classes if you want to make it slightly more efficient. Or if you prefer, rewrite the TDZ detection code altogether - as long as the tests pass.

Copy link
Member

@lukastaegert lukastaegert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I am still not 100% sure about this one. Correctness is good, though correct TDZ violations are usually not as crucial as other things (though of course if people only ever execute code through Rollup, it becomes a meta language that deviates from how vanilla JavaScript runs, which is not desirable).
The good about your solution is that it is rather focused and does not deoptimize a lot. On the other hand, I find it really hard to read and understand, with extremely complicated conditionals. Also, using any casts in several places carries the additional danger that refactoring elsewhere in the code base breaks this code without warning (well, the tests could still catch it, but TypeScript definitely does not help).

I was actually considering a switch to deoptimize all possible TDZ violations (read: All const/let variables accesses unless we are 100% certain) that is off by default. Then one could start working the other way, try to identify situations where there is definitely no violation. Similar for var accesses.

In any case, we could keep this still until we can do better. But I would move the code into a separate helper.

@kzc
Copy link
Contributor Author

kzc commented Jun 11, 2021

The TDZ/var issue is essentially a variation of Turing's Halting problem. One cannot know if there is a problem unless the code is actually executed. But there are fairly simple heuristics that can be employed to predict where such errors are likely to occur.

Both TDZ masking and the incorrect use of var variables before their definitions are genuine bugs. The var examples were derived from uglify-js user bug reports. It just so happens the TDZ and var issues have the same semantics. The patch is relatively simple and relies on very specific AST patterns and the linear proximity of variable use relative to its declaration within the same scope using the character positions of AST nodes. Although crude, it is pretty accurate in finding variables that will cause rollup's tree shaking algorithm to fail without actually simulating running the code. There are 3 scenarios within the PR: variable use before definition in the same scope, use of an uninitialized variable in its own init, and specifically a let variable without an init accessed before declaration in the same scope. Even in the event of a wrong guess, the impact is negligible - it's just a minor deoptimization causing a little more code retention. But these wrong guesses are rare in practise as evidenced by having no regressions in the test suite.

In order to solve the remaining multiscope bugs involving function calls one would have to maintain a record of all captured variables within each function originating from a different scope. Then when getReturnExpressionWhenCalledAtPath or hasEffectsWhenCalledAtPath are invoked, one could compare the location of the caller to all captured variables within the same scope.

Despite the use of any in the patch, the code is resilient to change because it checks the type of every node involved before proceeding with its logic. It ought to be possible to rewrite the code to be in typescript's good graces and be slightly more efficient if each variable instance recorded (1) what type of variable it is, (2) had a reference to its declaration in addition to its init. Caching the TDZ result would also be useful, as it wouldn't change once determined.

I'm not a big fan of having another switch to deoptimize TDZ and vars, as there is already such a switch - --no-treeshake. The user would just use rollup as a bundler and optimize the code with a minifier that respects TDZ and var-use-before-declaration like uglify-js.

@kzc
Copy link
Contributor Author

kzc commented Jun 11, 2021

Closing out this PR as it won't be merged. Consider it a working proof of concept for what has to be done to fix these tree shaking bugs.

@kzc kzc closed this Jun 11, 2021
@lukastaegert
Copy link
Member

Well, technically Rollup is already doing a trial execution of the code, at least on a per function basis. That is how we handle removing code after errors or return statements. There is a context object to track state while doing that. If there were no function calls, the problem wouldn't actually be too difficult to solve. The question, though, is how to handle functions while avoiding a huge performance issue, as naively just following all function calls will make performance exponentially worse.
I consider the var issue as very serious, however I see a lot of difficult scenarios involving locally defined functions that are not easy to identify via heuristics. A good solution for that issue might in the end also provide a good solution for the TDZ issue, though.
Essentially I am currently wondering about ways to collect basic information about each function call that can tell us at the call site what variables are accessed. That way we would know if we have a TDZ access / access before definition. If we detect that, we can then deoptimize the variable (for var issues) or set it to some special value to indicated a possible error when accessed (for TDZ issues).
A switch, at least for the var issue, would still make a lot of tree-shaking possible. It would just prevent simplifying conditionals and deoptimize property accesses of those variables in the simplest version. Then when the algorithm is "good enough", it can be retired again.

@kzc
Copy link
Contributor Author

kzc commented Jun 12, 2021

Yes, the TDZ and var bugs are the same issue and would be solved by the same code. The only difference is that TDZ violation results in a runtime error. Rollup should strive to preserve variable use before definition semantics to retain fidelity with the original input code and the ES spec. Users just want their code to work as written without bothering with special flags to overcome deficiencies in the tree shaking algorithm. In my opinion the optimizer should be conservative and zero-config by default and known non-safe optimizations should be user opt-ins. Advanced users would be in a better position to know the risks of applying non-safe optimization flags.

I tend to err on the side of practicality. This PR handles the same-scope non-call cases automatically without a new treeshake option while still allowing optimizations for use-after-definition variables. The current implementation is small and self-contained. The code can easily be altered to avoid the use of any to appease typescript, while making it more efficient by front-loading the required variable decl information in other AST classes. It can be extended to handle the function call cases as I outlined above. However, function call support will necessarily involve extra computation for the captured variables. It's an unavoidable cost of correctness. Without fully executing the code you will never catch all the cases - a per function trial execution of code is still an incomplete heuristic by another name.

@lukastaegert
Copy link
Member

As I said

In any case, we could keep this still until we can do better. But I would move the code into a separate helper.

even though I do not like the solution, I am not against merging it. However as you already seem to have deleted your branch for reasons unknown to me, this will likely not happen, and there is not reason to discuss further unless you recreate the PR.

In any case, I will still implement the additional switch, as it is still much better than turning off tree-shaking.

@kzc
Copy link
Contributor Author

kzc commented Jun 13, 2021

I had interpreted that comment in its original context as keeping the PR for reference but not merging it. No matter - it was only a partial solution. The proof of concept served its purpose to convey some ideas. It was a good discussion. Thanks for your continuing great work on rollup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants