Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused pure functions and variables #639

Closed
devongovett opened this issue Jan 3, 2021 · 24 comments
Closed

Remove unused pure functions and variables #639

devongovett opened this issue Jan 3, 2021 · 24 comments

Comments

@devongovett
Copy link

Esbuild doesn't currently seem to remove unused pure functions or variables. For example this input:

(function() {
  var x = 2;

  function foo() {
    console.log('foo');
  }

  function bar() {
    console.log('bar');
  }

  foo();
})();

Currently produces this output:

(function(){var n=2;function o(){console.log("foo")}function c(){console.log("bar")}o()})();

Whereas terser produces:

console.log("foo");

That's obviously even more optimized in this case to get rid of the function declaration entirely, but I'd accept at least just removing the unused vars and functions. We're considering esbuild as an alternative minifier in Parcel, but we don't currently have our own DCE pass and rely on terser for this. This means that the output will typically be much larger than it is currently without either Parcel implementing its own DCE or esbuild doing so.

@evanw
Copy link
Owner

evanw commented Jan 3, 2021

As a shortcut, esbuild currently only removes unused functions and variables at the top level, since that's where it matters most in practice (especially with most code being written in ESM these days). This is because doing this dead code removal requires a call graph to remove unused transitive dependencies and esbuild only builds a call graph between top-level statements.

Here's what you get if you run the code inside the IIFE through esbuild with --minify --format=iife --bundle:

(()=>{function o(){console.log("foo")}o();})();

The --bundle is necessary to tell esbuild that it's seeing the whole file, and that it's safe to remove unused code. People have been using esbuild to compile partial files (e.g. #604) so it's not safe to assume that otherwise.

I assume the IIFE is being added by Parcel. Is that right? Does it work if esbuild is the one that adds the IIFE instead?

@devongovett
Copy link
Author

Ah. I had tried it without the IIFE but not with the --bundle flag.

One possible issue is that we sometimes produce bundles with imports/requires pointing to external modules or separate codesplit bundles. We wouldn't want ESBuild to end up bundling those. Perhaps we could pass these as externals to ESBuild but maybe a middle option to say "remove unused code, but don't bundle" would be nice.

@ggoodman
Copy link

ggoodman commented Jan 4, 2021

One possible issue is that we sometimes produce bundles with imports/requires pointing to external modules or separate codesplit bundles. We wouldn't want ESBuild to end up bundling those.

@devongovett I think you might find that you can solve for this problem at a pretty granular level by writing a simple onResolve plugin that marks such edges b/w modules as external: true.

@devongovett
Copy link
Author

Sounds slow...

@ggoodman
Copy link

ggoodman commented Jan 4, 2021

Sounds slow...

That's really a judgement call that you'll have to make for yourself. While it may erode some of the advantages esbuild would otherwise give speed-wise, the true impact is totally dependent on use-case.

That being said, I think I see your point. Being able to opt in on a DCE pass for .build() could be nifty. I think the struggle is finding the right set of knobs and opinions to expose so that different use-cases are supported in a forward-thinking way.

@evanw
Copy link
Owner

evanw commented Jan 4, 2021

I don't think this case actually needs --bundle since --format=iife also guarantees that esbuild isn't seeing a partial module. So it should be possible to enable DCE in this case in a backwards-compatible way without adding any additional options. This would be a nice additional feature. I will give adding this a shot at some point.

@evanw evanw closed this as completed in 4d32f7c Jan 6, 2021
@kzc
Copy link
Contributor

kzc commented Jan 7, 2021

4d32f7c doesn't resolve the issue in the top post. Is there a speed concern to perform DCE on anything but the top level?

@devongovett
Copy link
Author

Ran into this again today, even with the --format=iife option.

input:

function p() {
  var x = 2;
  var y = 3;
  return x;
}

console.log(p());

output:

(()=>{function o(){var n=2,r=3;return n}console.log(o());})();

Even if those variable declarations are hoisted out of the function:

var x, y;
function p() {
  x = 2;
  y = 3;
  return x;
}

console.log(p());

output:

(()=>{var n,o;function r(){return n=2,o=3,n}console.log(r());})();

Terser produces console.log(2) for both.

Is non-top level DCE on the roadmap? Should this issue be reopened, or should there be another issue for that? I would guess this would be a major case where terser produces smaller code. Would love to see esbuild optimize this too. We'd like to switch to esbuild as a minifier in parcel but it currently produces larger output due to this issue.

@kzc
Copy link
Contributor

kzc commented Mar 6, 2021

DCE at all levels would be a useful feature for esbuild, but the examples in #639 (comment) involve non-trivial function inlining and other optimizations before DCE takes place. The example in the top post is straight forward DCE...

$ cat toppost.js 
(function() {
  var x = 2;
  function foo() { console.log('foo'); }
  function bar() { console.log('bar'); }
  foo();
})();
$ cat toppost.js | uglify-js -bmc inline=0,reduce_vars=0,negate_iife=0
(function() {
    function o() {
        console.log("foo");
    }
    o();
})();

@evanw
Copy link
Owner

evanw commented Mar 6, 2021

Terser produces console.log(2) for both. Is non-top level DCE on the roadmap?

So AFAIK Terser is moving to Rust: https://twitter.com/fabiosantosart/status/1355304176104910849. Does anyone know how this is going? It seems like it should be very straightforward to port. Or maybe it'd be more straightforward to port it to Go if lifetimes or compile times are getting in the way, I'm not sure. Rust can be frustrating to use.

Anyway if a port of Terser to native code is even relatively imminent I don't think it makes a lot of sense to try to rebuild an equivalent minifier in native code from scratch. Doing what you are suggesting in esbuild is entirely possible but is a ton of work. I think it'd be best for esbuild to focus its "innovation energy" on the bundling aspect for now since it feels like that is the primitive that is most needed. And that should pair well with Terser once it has been ported to native code.

It could still make sense to build an advanced optimization pass into esbuild at some point in the future, but I'd like to direct development efforts on getting more of esbuild's end-to-end bundling story in place first. Code splitting, top-level await, CSS, and also maybe HTML are all more of a priority for me, especially since Terser is moving to native code.

@devongovett
Copy link
Author

Hmm ok, you may want to clarify this then since esbuild also advertises itself as a minifier. If minification isn't the priority and users should use esbuild in addition to another minifier, then this should be made clear in the documentation/readme. A bunch of tools are already trying to use esbuild as a minifier (e.g. there are webpack/rollup/parcel plugins).

So AFAIK Terser is moving to Rust

I'm also watching this PR in SWC, which appears to be a port of Terser: swc-project/swc#1302

@evanw
Copy link
Owner

evanw commented Mar 6, 2021

esbuild also advertises itself as a minifier

That's because esbuild is a minifier. And the minification is often within a 1-2% percent of Terser, so IMO it's reasonable to use esbuild as a minifier in many situations, especially given the speed improvement. These benchmarks demonstrate this:

esbuild Terser UglifyJS
d3 -31% -33% n/a
jquery -63% -64% -64%
lodash -73% -74% -75%
moment -47% -49% -50%
react -56% -57% -58%
terser -33% -34% n/a
three.js -35% -37% -37%
vue -49% -50% -50%

It's reasonable to use esbuild as a minifier instead of Terser even if the output is a little larger, just like it's reasonable to use Terser instead of UglifyJS even if the output is a little larger.

Minification in esbuild currently does these things:

  • Renaming identifiers to shorten them
  • Removing whitespace in between tokens
  • Compressing the AST using mostly peephole reductions from the leaves toward the root
  • Removing unused top-level statements

There are many code bases that minify perfectly fine with esbuild under those constraints. For example, dead-code removal is irrelevant for many library-oriented projects where all code ends up being used anyway. And even when dead-code removal is important, doing it only for top-level statements is often sufficient.

It's true that some code bases which make heavy use of certain patterns relying on Terser's code optimization features to have compact output will not minify as well with esbuild. But there are also code bases that don't minify optimally with Terser either and that minify better with other tools instead. For example:

class Foo { foo() { return 1 } }
class Bar extends Foo { bar() { return this.foo() } }
console.log(new Bar().bar())

Terser and esbuild both generate identical output but Google Closure Compiler generates just console.log(1). And there are other advanced optimization tools such as Prepack that do a different set of code optimizations. If you require a tool that minifies optimally in all scenarios, the best way to achieve that is to run all available minifiers on your output and pick the one that resulted in the smallest output.

I can add a caveat to esbuild's minification documentation about code optimizations that are not included in esbuild. I think the list of possible code optimizations that esbuild doesn't do would be something like this:

  • Dead-code elimination within function bodies
  • Function inlining
  • General cross-statement constant propagation
  • Object shape modeling
  • Allocation sinking
  • Method devirtualization
  • Symbolic execution
  • JSX expression hoisting
  • TypeScript enum detection and inlining

@devongovett
Copy link
Author

Sorry, I didn't mean to imply that esbuild shouldn't be used at all as a minifier. I just meant that the optimizations that esbuild does not perform should be documented. I think your list makes sense.

One thing to mention as well is that when used in the context of other build tools like webpack and Parcel that rely on the minifier to perform tree shaking, this limitation may be more pronounced. These tools often produce output that is pre-wrapped in functions for various reasons (e.g. to prevent global scope leakage, ensure correct module execution ordering, etc.). This means that when used with esbuild as a minifier, the output may not be tree-shaken as effectively as with terser, for example, or when also using esbuild for the full bundling process.

@kzc
Copy link
Contributor

kzc commented Mar 8, 2021

In my testing on application bundles, esbuild usually gets to within 5 - 10% of rollup+terser bundle size, which is impressive given its speed.

I noticed that Google Closure Compiler successfully minified only 2/8 of the libraries in the benchmarks above in ADVANCED_OPTIMIZATIONS mode. It is tricky to get it working correctly without carefully crafted code annotations. Its SIMPLE_OPTIMIZATIONS mode is more forgiving and has comparable results to terser, but is 3X slower.

Prepack development appears to have been suspended. Its output was optimized for runtime performance over size.

In order to get an extra 5% size reduction from optimizations it is not uncommon to take 3 to 5 times as much compile time. Function inlining is not typically a big size win in minification in real world application bundles, and it is difficult to get its semantics right with loops, different scopes, arguments and this. Whole program DCE of unused variables, classes and functions for application bundles, on the other hand, has more bang for the buck for comparatively little code.

Side note: uglify-js now supports ES2020 as of last week.

@lydell
Copy link

lydell commented Mar 23, 2021

I’ve got 3 MiB of JS generated by Elm. esbuild gets that down to 766 KiB, while uglify-js is down to 689 KiB (full discussion).

Is there a way to figure out which optimizations of uglify-js are contributing the most to that 77 KiB difference? Thought I could help the lowest-hanging fruit.

@kzc
Copy link
Contributor

kzc commented Mar 23, 2021

See: #731 (comment)

@lydell
Copy link

lydell commented Mar 23, 2021

@kzc Cool, so you think improved dead code elimination is the biggest opportunity?

Edit: Removing Elm’s IIFE lets esbuild remove another 15 KiB. Is there a good way to figure out what uglify-js does to remove another 62 KiB?

@kzc
Copy link
Contributor

kzc commented Mar 23, 2021

The issue and its solution is known to the author as per his comments above.

If you want to contribute, the best way to learn how any compiler works is to read the source code. There's no magic - just work. It's largely AST transformation.

If you look at the example in the top post you can see the basic issue. x and bar are unreferenced and can be dropped. It's hard to beat DCE for reducing code size. In elm's case a function call marked as "pure" can be a candidate for DCE if its result is unused. But often DCE opportunities only occur after other optimizations take place. Dozens of small optimizations collectively lead to smaller code. But at some point there are rapidly diminishing returns for the CPU cost incurred by each optimization.

@lydell
Copy link

lydell commented Mar 23, 2021

Thanks! I was thinking maybe you had some pro-tips on working with/analyzing which parts of uglify-js/terser was kicking in the most or something. I guess no shortcuts then! I’ll take a deeper dive into it when I get time. AST transformation is fun stuff.

@lydell
Copy link

lydell commented Mar 23, 2021

I made a script that enables just one uglify-js compress option at a time and recorded how many characters were saved from each option (whitespace removal and mangle was enabled all the time):

join_vars 35036
unused 32188
merge_vars 20472
conditionals 10307
booleans 1861
collapse_vars 1513
strings 1162
inline 733
if_return 405
evaluate 120
loops 107
unsafe 104
comparisons 53
hoist_funs 34
drop_console 19
keep_infinity 13
dead_code 11
assignments 3
negate_iife 1
yields 0
varify 0
unsafe_undefined 0
unsafe_regexp 0
unsafe_proto 0
unsafe_math 0
unsafe_comps 0
unsafe_Function 0
typeofs 0
toplevel 0
top_retain 0
templates 0
switches 0
spreads 0
side_effects 0
rests 0
reduce_vars 0
reduce_funcs 0
pure_getters 0
pure_funcs 0
properties 0
passes 0
objects 0
keep_fnames 0
keep_fargs 0
imports 0
hoist_props 0
hoist_exports 0
global_defs 0
functions 0
expression 0
drop_debugger 0
directives 0
default_values 0
awaits 0
arrows 0
arguments 0
annotations 0
ie8 -45
sequences -154
hoist_vars -2538
var fs = require("fs");
var UglifyJS = require("uglify-js");

var noCompress = {
  annotations: false,
  arguments: false,
  arrows: false,
  assignments: false,
  awaits: false,
  booleans: false,
  collapse_vars: false,
  comparisons: false,
  conditionals: false,
  dead_code: false,
  default_values: false,
  directives: false,
  drop_console: false,
  drop_debugger: false,
  evaluate: false,
  expression: false,
  functions: false,
  global_defs: false,
  hoist_exports: false,
  hoist_funs: false,
  hoist_props: false,
  hoist_vars: false,
  ie8: false,
  if_return: false,
  imports: false,
  inline: false,
  join_vars: false,
  keep_fargs: true,
  keep_fnames: true,
  keep_infinity: true,
  loops: false,
  merge_vars: false,
  negate_iife: false,
  objects: false,
  passes: 1,
  properties: false,
  pure_funcs: null,
  pure_getters: "strict",
  reduce_funcs: false,
  reduce_vars: false,
  rests: false,
  sequences: false,
  side_effects: false,
  spreads: false,
  strings: false,
  switches: false,
  templates: false,
  top_retain: null,
  toplevel: false,
  typeofs: false,
  unsafe: false,
  unsafe_comps: false,
  unsafe_Function: false,
  unsafe_math: false,
  unsafe_proto: false,
  unsafe_regexp: false,
  unsafe_undefined: false,
  unused: false,
  varify: false,
  yields: false,
};

var options = {
  toplevel: false,
  output: {
    beautify: false,
  },
  mangle: true,
  compress: noCompress,
};

var code = fs.readFileSync(process.argv[2], "utf8");

var baseline = UglifyJS.minify(code, options).code.length;
console.log("baseline", baseline);

for (const [name, value] of Object.entries(noCompress)) {
  var result = UglifyJS.minify(code, {
    ...options,
    compress: {
      ...noCompress,
      [name]:
        typeof value === "boolean"
          ? !value
          : typeof value === "string"
          ? true
          : value,
    },
  });

  if (result.error) {
    console.error(name, value, result.error);
    continue;
  }

  var diff = baseline - result.code.length;
  console.log(name, diff);
}

@kzc
Copy link
Contributor

kzc commented Mar 24, 2021

I made a script that enables just one uglify-js compress option at a time and recorded how many characters were saved from each option (whitespace removal and mangle was enabled all the time):

With perhaps the exception of unused to drop completely unreferenced code, that would give rather limited and misleading information. The majority of uglify compress transformations are only effective when used in combination.

The uglify AST and accompanying data structures for scope and variable definitions are very different from those in esbuild. If you want to implement an all-level DCE pass in esbuild just read up on general compiler algorithms and study the esbuild code base.

@lydell
Copy link

lydell commented Mar 24, 2021

Thanks! You are right, enabling all options at once saves another 17460 characters.

@privatenumber
Copy link
Contributor

I read the part that if the Terser SWC port is in progress, it might not justify the work to improve dead-code elimination (DCE) in esbuild minification. But just wanted to surface a scenario where DCE beyond top-level scope would be very impactful:

Got an issue in esbuild-loader that Webpack builds that use esbuild's minification isn't tree-shaking.

Seems like Webpack relies on Terser to handle DCE, so Webpack builds that use esbuild-loader's minification doesn't fully tree-shake because of the transform API and also because Webpack wraps the distribution in a function.

It might be possible to tap into a concatenated module hook to run the mininfier without iife. However, there is still the bundle API limitation. Hopefully another option or a virtual file-system could solve that in the future (#690).

UberOpenSourceBot pushed a commit to fusionjs/fusionjs that referenced this issue Jun 22, 2021
Introducing `experimentalEsbuildMinifier` compiler option to replace Terser minifier with esbuild. By doing so we've seen the production build times go down by ~30%. At the same time esbuild is not an optimal choice yet, as it does not support the same level of minification as Terser: evanw/esbuild#639 (comment), which results in larger bundle sizes (+ ~10%). This is something web developers will need to take into account when choosing to opt-in to use esbuild miinifer to achieve faster production builds.
@lydell
Copy link

lydell commented Aug 6, 2021

New findings:

  • Running just parts of UglifyJS and then esbuild beats just running UglifyJS, both in terms of size (but very little) and performance (~50% faster).
  • For Elm code, if I remove Elm’s IIFE and then re-add one with --format=iife I get more minification. Pretty easy to do with a couple of lines of code.
  • So when the absolute smallest size isn’t needed I use just esbuild (fast build times, yay!), and when smaller size is needed I can combine with UglifyJS and still get some speed wins from esbuild.

For those interested, I’ve written about it here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants