Reduce typescript package size #27891

Closed
4 tasks done
pauldraper opened this issue Oct 13, 2018 · 56 comments
Labels: In Discussion (Not yet reached consensus), Suggestion (An idea for TypeScript)

Comments

@pauldraper

pauldraper commented Oct 13, 2018

Search Terms

size, bloat, install

Suggestion

The typescript package is large, and it is only getting larger.

[Screenshot from 2018-10-13 11-12-06]

Version 3.1.3 is a whopping 40MB.

Use Cases

TypeScript is used in many contexts.

A TypeScript formatter (e.g. prettier) does not need an entire compiler. It only needs a parser. And a 45MB scripted parser is orders of magnitude larger than one would normally expect. (For reference, the installed npm package for Esprima -- the most compatible and compliant ES parser in the ecosystem -- is a mere 0.3MB.)

Examples

Solution 1: Split up packages

  • typescript (existing package; depends on typescript-compiler, typescript-parser, typescript-server)
  • typescript-compiler (depends on typescript-parser)
  • typescript-parser
  • typescript-server (depends on typescript-compiler)

Optionally, there could be separate packages for typescript-config and typescript-i18n.

Solution 2: Don't duplicate code

There is a lot of code duplication between

  • lib/typescript.js
  • lib/typescriptServices.js
  • lib/tsserver.js
  • lib/tsserverlibrary.js

Don't duplicate the code.

Checklist

My suggestion meets these guidelines:

  • This wouldn't be a breaking change in existing TypeScript / JavaScript code
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. new expression-level syntax)
@pauldraper changed the title from "Reduce TS size" to "Reduce typescript package size" on Oct 13, 2018
@mattmccutchen
Contributor

Some backstory in #23339.

@DanielRosenwasser added the "Suggestion" (An idea for TypeScript) and "In Discussion" (Not yet reached consensus) labels on Oct 13, 2018
@pauldraper
Author

pauldraper commented Oct 14, 2018

Interesting reading, thanks.

TypeScript [2.9.0] has doubled in size since v2.0.0 - now 35 MB

It was "fixed" by #25901, released in 3.1.1, which was 40MB. 🙁


It won't be hard at all to shrink the package size. For example, lib/tsserver.js and lib/tsserverlibrary.js are 98% identical.

$ du -b node_modules/typescript/lib/tsserver.js node_modules/typescript/lib/tsserverlibrary.js
7290127	node_modules/typescript/lib/tsserver.js
7308140	node_modules/typescript/lib/tsserverlibrary.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/tsserverlibrary.js) | wc -c
7207205

And 99% of lib/typescript.js is identical to those.

$ du -b node_modules/typescript/lib/typescript.js
6859801	node_modules/typescript/lib/typescript.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/typescript.js) | wc -c
6850490

And lib/typescriptServices.js is byte-for-byte identical to that.

$ sha1sum node_modules/typescript/lib/typescript.js node_modules/typescript/lib/typescriptServices.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescript.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescriptServices.js

And 99% of lib/typingsInstaller.js is identical to that.

$ du -b node_modules/typescript/lib/typingsInstaller.js
5285788	node_modules/typescript/lib/typingsInstaller.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/typescriptServices.js) | wc -c
5246999

And 80% of lib/tsc.js is identical to that

$ du -b node_modules/typescript/lib/tsc.js
3912404	node_modules/typescript/lib/tsc.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/tsc.js) | wc -c
3219205

That's nearly 30MB of duplication just in those few files (and this doesn't even include declaration files).
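
The `comm -12 <(sort a) <(sort b) | wc -c` measurements above count the bytes of lines that appear in both files. Below is a minimal sketch of the same idea in TypeScript. It assumes ASCII-only input, so that string length equals byte count. `sharedLineChars` and `percentShared` are hypothetical helper names for illustration, not part of any TypeScript tooling.

```typescript
// Split text into lines, dropping the empty piece after a trailing newline.
function splitLines(text: string): string[] {
  const lines = text.split("\n");
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}

// Characters (including one newline per line) of the lines common to both
// texts. Duplicated lines are matched pairwise, like `comm -12` on sorted input.
function sharedLineChars(a: string, b: string): number {
  const counts = new Map<string, number>();
  for (const line of splitLines(a)) {
    counts.set(line, (counts.get(line) ?? 0) + 1);
  }
  let total = 0;
  for (const line of splitLines(b)) {
    const remaining = counts.get(line) ?? 0;
    if (remaining > 0) {
      counts.set(line, remaining - 1);
      total += line.length + 1; // +1 for the newline
    }
  }
  return total;
}

// Whole-number percentage of `total` covered by `shared`, rounded down.
function percentShared(shared: number, total: number): number {
  return Math.floor((100 * shared) / total);
}
```

With the figures reported above for tsserver.js, `percentShared(7207205, 7290127)` evaluates to 98.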

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

@MartinJohns
Contributor

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

It's done this way so that every file can be used by itself without having to deal with the nastiness of modules in JavaScript. Every file is a functional library/program in itself. I think that is a great thing, at the cost of some disk space.

@pauldraper
Author

pauldraper commented Oct 15, 2018

Reading the linked issue #23339, it appears that the desire is in fact to (eventually) use modules.

#23339 (comment)

If we used modules, we'd be able to share each file and avoid this duplication

it is something we want to do, but no plans for the short term. that is where the majority of savings would come from.


nastiness of modules in JavaScript

ES module systems in general can be hit-and-miss, but reminder that we're talking specifically about an npm package.

npm, npm packages, node_modules, package.json, etc. are all related to Node.js (or clones), which supports CommonJS. Right?

@Kingwl
Contributor

Kingwl commented Jan 18, 2019

I have two ideas, but I am not sure which one is better.

  1. Split code in the source.
    For now, some common utilities and helpers are shared between different components. We could split them by function, e.g. utils.ts -> utils.ts (common), utils.factory.ts (depends on factory), utils.emitter.ts (depends on emitter), etc.
    If you want only a factory or an emitter, just create a tsconfig.json file that includes the files it depends on.

  2. Analyze and transform the bundled file.
    The namespaces are compiled to many IIFEs, each of which is passed the namespace instance.
    We could compile with target esnext and merge those IIFEs, then transform each ts.xxx = xxx into export xxx.
    Then we could package the result as a normal ESM project and tree shake it.
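
To make the second idea concrete, here is a toy sketch of the `ts.xxx = xxx` to `export` rewrite. It is line-based and regex-driven, and it assumes the IIFE wrappers have already been merged away so that the assignments sit at module top level. `namespaceAssignmentsToExports` is a hypothetical name, and a real codemod would more likely work on the AST than on text.

```typescript
// Rewrites lines of the exact form `ts.name = name;` into `export { name };`.
// Any other line (including `ts.a = b;` with differing names) is left untouched.
function namespaceAssignmentsToExports(source: string): string {
  return source
    .split("\n")
    .map((line) => {
      const match = /^(\s*)ts\.([A-Za-z_$][\w$]*) = \2;\s*$/.exec(line);
      return match ? `${match[1]}export { ${match[2]} };` : line;
    })
    .join("\n");
}
```

For example, a line `ts.add = add;` becomes `export { add };`, after which a bundler can see which exports are unused.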

@Kingwl
Contributor

Kingwl commented Jan 18, 2019

ping @DanielRosenwasser
What do you think about that🧐?

@DanielRosenwasser
Member

I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API - at which point, our consumers would actually be the ones winning from tree-shaking.

Splitting source on its own can help, but practically speaking the larger components like services and TSServer will need the entire core compiler.

I think that converting to modules is the most practical and obvious way to avoid duplicating most of the contents of tsc.js 3+ times.

@mihailik
Contributor

A simpler solution: inspired from Busybox.

Combine N near-duplicate files into 1 polymorphic file that can do N things based on a parameter passed in.

It would introduce a performance overhead of parsing tiny % of unnecessary JS code, but can make the tool integration story way simpler. Maybe worth it?

One trivial way to know which feature is expected would be to directly copy the Busybox approach: symlink all the duplicate files and differentiate at runtime based on the __filename. This saves disk space, package size, and bandwidth. There are more interesting options too.
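
A minimal sketch of that Busybox-style dispatch, assuming the invoked path is available as a string. The tool table and the strings it returns are placeholders for illustration. How the invoked name is obtained in practice (for example, whether symlinks are resolved before `__filename` is set) is left as an open assumption.

```typescript
// Hypothetical table of tools; real entries would start the compiler, server, etc.
const tools = new Map<string, (args: string[]) => string>([
  ["tsc", (args) => `compile ${args.join(" ")}`],
  ["tsserver", () => "start language server"],
  ["typingsInstaller", () => "install typings"],
]);

// Picks the tool from the file name the program was invoked as.
function dispatch(invokedPath: string, args: string[]): string {
  const base = invokedPath.split(/[\\/]/).pop() ?? "";
  const name = base.replace(/\.js$/, "");
  const tool = tools.get(name);
  if (tool === undefined) {
    throw new Error(`unknown tool: ${name}`);
  }
  return tool(args);
}
```

With this layout, `lib/tsc.js` and `lib/tsserver.js` could both point at one file, and `dispatch("lib/tsc.js", ["-p", "."])` would select the compiler branch.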

@DanielRosenwasser
Member

DanielRosenwasser commented Sep 10, 2019

From speaking with @RyanCavanaugh, it sounded like @orta was interested in working on this.

@dsherret
Contributor

+1 for splitting up typescript into multiple packages. One major benefit would be that these individual packages (other than the "typescript" package) could use semantic versioning on at least their APIs; then other libraries could just depend on the packages they need. Right now it's kind of a pain to maintain a library that has a peer dependency on the typescript package (without being super strict about the supported version).

@orta
Contributor

orta commented Sep 11, 2019

Yeah, I'm chatting with folks internally this week, but my goal is roughly:

  • Let the package typescript be the same as right now (as removing things would break the world) which provides all tooling

Then have subset packages which are smaller and focused on a specific task:

  • @typescript/tsc for folks who are just doing compilation (e.g. tsc compiles on the server, prettier for the AST)
  • @typescript/services for folks building dev tools like monaco-typescript, or executeprogram etc

I doubt I can offer any useful semver on them, as they link to the main TS version. That'd need the API to actually be classed as "stable" which doesn't look like that's happening soon.

Figuring out how/if we can reduce the main "typescript" is hopefully something I can get an idea about during ^

@nykula

nykula commented Sep 11, 2019

Removing tools from the package doesn't reduce overall size. Compilation, dev tools etc reuse a lot of the same code that is now copied to multiple commands without changes. The issue is how to share the very duplicated part between the tools, reduce the duplication, or pack the tools into one bundle.

@weswigham
Member

Yeah, I'm chatting with folks internally this week, but my goal is roughly

Oh, we're generally for it (and have been for years, provided we still provide a services bundle for our (browser) consumers who use it) - we just need an automated way to remap the current namespace-based code layout into modules; this way we can keep a PR doing the migration up to date and not stop development on other things. I have a branch from two years ago that migrated all of src/compiler to modules (by hand) - checker.ts had something like 100 lines of imports on it. And that took quite a while to make. That gave some of us some pause and reduced enthusiasm, but... I'm hoping the final result is still seen as worth it.

With respect to said automation, I think we could probably write a kind of codemod for it using the APIs we have today, but nobody's put in the effort yet.

@mjbvz
Contributor

mjbvz commented Oct 31, 2019

@orta VS Code is very interested in this work. Right now we consume TypeScript in two ways:

  • tsserver.js — Used by our JS/TS extension
  • typescript.js — Used by our html extension

Each of those files is around 8MB on disk. Additionally, we are interested in shipping built-in support for tsc (tsc.js), but that's another 4.5MB and that's difficult for me to justify. It seems to me like all these various TypeScript components should be able to share a lot of code.

Let me know if you would like any additional info about how VS Code consumes TS


As a side note, typingsInstaller.js is pretty huge too (6MB)!! Does it pull in a lot of stuff from TS core?

@orta
Contributor

orta commented Nov 4, 2019

I brought this up during the most recent design meeting - #34899

Where the end result was basically, we're meeting about trying to get modules happening again

As mentioned above - all of these files are basically the same but with a bit of flavor difference because they represent different sets of the compiler + services - for example I think you can probably use tsserverlibrary for both the html + JS/TS cases in vscode, but tsc.js doesn't look like it lives in there.

@jakebailey
Member

I am filing followup issues now that the modules PR has been merged.

One such issue of interest here is #51440; the TL;DR is that if we raise our minimum supported Node version to Node 12, we could safely ship our executables as ESM, which would save us roughly 7 MB more on top of the 43% reduction above.

@pauldraper
Author

pauldraper commented Nov 8, 2022

The reduction from modules is very significant. (Thanks!!!!!!)

If your math is correct, that reduces the package size from 65MB to 36MB.

Which is still larger than it was when #23339 was filed, asking for it to be smaller.

But alas, such is progress.

This was the largest possible improvement to the size. More could be done, but it's not gonna cut in half again.

@jakebailey
Member

Eventually, we may be able to ship as ESM and achieve the smallest possible package. Or, go further and publish individual packages for parts of our repo. That goal's a long way off, but there is work left to be done here.

@styfle
Contributor

styfle commented Nov 8, 2022

Confirmed, TS nightly is much smaller now, thanks!

  • Before: install size
  • After: install size

@vostrnad

vostrnad commented Nov 8, 2022

Following the migration to modules in typescript@5.0.0-dev.20221108, I ran my minification tests again. Using uglify-js on the five largest JavaScript files now reduces the package size from 35.6 MB to 18.0 MB:

File                  Size      Minified size
tsc.js                5097 kB   2281 kB
tsserver.js           7923 kB   2999 kB
tsserverlibrary.js    7886 kB   2983 kB
typescript.js         7338 kB   2705 kB
typingsInstaller.js   1756 kB    985 kB

@jakebailey
Member

jakebailey commented Nov 8, 2022

I mentioned minification in the module conversion PR; we are restricted on that front because so many people still patch our package. If we minify, patching becomes difficult to impossible.

I'd love to be able to do so, but we have to figure out what to do about that first.

(We'd also probably not go "full" minify; we need to keep names for backtraces.)

@pauldraper
Author

Minify only saves space if you don't include source maps.

And excluding source maps seems like a deal-breaker.

@jakebailey
Member

We already exclude source maps in the package, but our output is left "pretty" so that stack traces are meaningful when provided by downstream users.

If we were enabling minification, we would likely only have it remove whitespace and optimize syntax, leaving names in the output.

@RyanCavanaugh
Member

Re: ES Modules, I think we have to take performance as a serious goal. We get a big speed boost from esbuild's whole-program-aware bundling and giving that up for a better sticker number isn't a good trade-off for most users. People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

@jakebailey
Member

jakebailey commented Nov 8, 2022

Yeah, this is something I want to performance test; my impression is that ESM imports should be as fast as the whole-program bundling. I think that the differences were really down to variance + load time.

@DanielRosenwasser
Member

People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

It's worth noting that vendoring has some big tradeoffs which might leave a user worse off. If someone still installs TypeScript (due to another dependency, for custom build tasks, or for having their editor use a workspace version), that person gets even more duplication of TypeScript, possibly with mismatched versions.

@jakebailey
Member

This is closed, but since people do still follow this issue, #55273 is on the docket for an early 5.3 merge; this PR effectively replaces typescript.js with tsserverlibrary.js and removes the latter. This leaves typescript.js as the sole provider of the public API, saving roughly 8MB unpacked. Copy/pasting the package size report that is run on PRs:

                           Before       After       Diff          Diff (percent)
Packed                     6.90 MiB     5.48 MiB    -1.42 MiB     -20.61%
Unpacked                   38.74 MiB    30.41 MiB   -8.33 MiB     -21.50%

                           Before       After       Diff          Diff (percent)
lib/tsserverlibrary.d.ts   570.95 KiB   865.00 B    -570.10 KiB   -99.85%
lib/tsserverlibrary.js     8.57 MiB     1012.00 B   -8.57 MiB     -99.99%
lib/typescript.d.ts        396.27 KiB   570.95 KiB  +174.68 KiB   +44.08%
lib/typescript.js          7.95 MiB     8.57 MiB    +637.53 KiB   +7.84%

As for our executables (and potentially an ESM API); that'll be handled by #51440 when I get to dealing with the long set of changes that are required to make that happen.

@pi0

pi0 commented Oct 6, 2023

Hi! First of all, thanks @jakebailey and the rest of the typescript team for constantly working on this matter to reduce the typescript install size 💙

With the awareness of all these efforts, I made an experimental project tslite.

tslite is a redistribution of TypeScript without API changes and with optimizations like code minification that probably won't be possible for the typescript package itself, but the (significantly) smaller size benefits a segment of users that directly install/need typescript as a peer dependency in their projects.

I hope this project will be helpful rather than something conflicting with the future roadmap of install size optimizations from the core package.

@jakebailey
Member

jakebailey commented Oct 6, 2023

There is still more size work that can be done, specifically #51440.

However, I will note that the problem of package sizes is really not as bad as people think these days; every modern package manager uses hardlinks to a global cache, meaning that every install of TypeScript on a system will share the same backing files on disk. The "apparent" size may seem duplicative, but it's really all shared.

That and the install size seen on packagephobia is the unpacked size; the actual bits transferred from the registry are much, much smaller. Even gzip brings the tarball to about 6MB. tslite is smaller on that front at about 3MB, but overall most people only download each version of TypeScript once.

That combined with the hardlinking really means that we're talking about a few MB per system, paid once. One spends more network and disk space loading up Twitter or even GitHub via images and scripts that change often than the TS package.

I'm still going to try and make it smaller because I find it fun to do so, but it's a little moot IMO.

@ArnaudBarre

ArnaudBarre commented Oct 6, 2023

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball TS still adds a few seconds when I open a Stackblitz repro for Vite.

@pauldraper
Author

pauldraper commented Oct 6, 2023

every modern package manager uses hardlinks to a global cache

Neither npm nor yarn use a global cache. (Unless Yarn is PnP mode, which brings a number of issues.)

overall most people only download each version of TypeScript once

There are over 2,800 versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 TypeScript versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

@jakebailey
Member

jakebailey commented Oct 6, 2023

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball TS still adds a few seconds when I open a Stackblitz repro for Vite.

That's certainly true. It's a shame that these systems do not cache their artifacts.


Neither npm nor yarn use a global cache. (Unless Yarn is PnP mode, which brings a number of issues.)

Yarn 3 supports hard linking (https://yarnpkg.com/configuration/yarnrc#nmMode). If you're still using Yarn v1, you're not going to get any new features at all.

I was wrong about npm; it has a global cache but it copies the files.

There are 2,800+ versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

There should really only be one TS version in a project; if this is happening, then some package is over-restricting what version of TS it needs. All modern package managers allow you to override versions within a workspace, and I would think it'd be safe to do that if space is a concern and your package manager can't hardlink.

It's also misleading to say that there are 2,800 versions of TypeScript; there are only a handful of stable releases. The rest are nightly builds.

@spacecowgoesmoo

People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

@jakebailey
Member

People shouldn’t have to override Typescript versions. The project I’m working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

I'm referring specifically to doing this in npm:

"overrides": {
    "typescript@*": "$typescript"
},

Or in yarn:

"resolutions": {
    "typescript@*": "$typescript"
},

Or in pnpm:

"pnpm": {
    "overrides": {
        "typescript@*": "$typescript"
    },
}

I am not referring to any sort of post-install patching, but just asking the package manager to resolve to a single version.

@spacecowgoesmoo

spacecowgoesmoo commented Oct 6, 2023

The point is that an override only seems reasonable because other dependencies don’t require any extra setup. NPM repos are supposed to be low-effort installs and typescript should be no exception.

@pauldraper
Author

pauldraper commented Oct 6, 2023

It's a shame that these systems do not cache their artifacts.

There should really only be one TS version in a project

npm; it has a global cache but it copies the files.

Yes, as you say, IDEs, package maintainers, and package managers should be aggressively deduplicating redundancies.

....

....

....

....

And TypeScript should be doing the same. (Right now it's something crazy like ~75% duplicate code.)

@jakebailey
Member

jakebailey commented Oct 6, 2023

Yes, again, #27891 (comment) removed one more copy, and #51440 will remove even more (down to the absolute minimum of 2 copies one can have when shipping both CJS and ESM). I'm not sure what else I can say, I was just originally attempting to explain that a large bulk of situations do not benefit from the effort to lower the package size.

@RyanCavanaugh
Member

I think it'd be useful for people to be a bit more specific about what they care about so we can tailor our efforts.

For time-over-wire, deduplication isn't a great savings, since each additional copy is a tiny increment (compressed checker.ts (2 MB) is 416k, compressed 4x checker.ts is 418k)

For space-on-disk, uh, I'm going to need some more details. It's not 1998 anymore. 40 MB is 0.004% of a terabyte. If the problem is that there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do. That's a package manager problem, not a TypeScript problem; it's unrealistic to expect projects to put their effort into slimming down instead of having package managers duplicate less.

For bundling into other projects like web IDEs, treeshaking is going to be a big part of any successful strategy here. Identifying places where we can be more shakeable is a good thing.

@kurtextrem

For space-on-disk, uh, I'm going to need some more details.

I see where you're coming from, but npm is still the most commonly used package manager. The problem is npm, definitely, but who knows when this will change? Anything that reduces TS size definitely has impact on the ecosystem.

The same is true for the "time-over-wire" thing: shipping 1 MB less probably does nothing for one person, but if you multiply 1000 kB by 43 million weekly downloads, the picture looks totally different again.
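
As a rough worked version of that multiplication, here is a sketch that assumes decimal units and that every download actually transfers the full bytes with no cache hits. `weeklyTransferTB` is a hypothetical helper name.

```typescript
// Aggregate weekly transfer, in terabytes, for a given per-download size.
// Uses decimal units: 1 TB = 1,000,000 MB.
function weeklyTransferTB(megabytesPerDownload: number, downloadsPerWeek: number): number {
  return (megabytesPerDownload * downloadsPerWeek) / 1_000_000;
}

// 1 MB saved per download at 43 million downloads per week.
const savedPerWeekTB = weeklyTransferTB(1, 43_000_000);
```

Under those assumptions, `savedPerWeekTB` is 43, i.e. about 43 TB less traffic per week for each megabyte removed from the tarball.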

As far as I can tell, the details you need are written down here: https://github.com/pi0/tslite#how

@mhart

mhart commented Oct 7, 2023

There are plenty of cases where cache isn't available and the size of the package matters – for over the network size, number of files that need to be written, and amount of JS that needs to be parsed at execution time.

  1. Container builds. Installs in containers typically have no cache (would require volume mounting, etc), so installs are slower. The resulting container size also matters for a number of reasons, from execution time, to registry push time over slow networks, etc, etc. People typically try to keep their container sizes to a minimum, so small packages help here.
  2. CI builds. Similar to above. Often done in containers. Many CI systems have caching abilities, but they can be complicated to set up – and many don't. So often typescript is being installed from scratch each time, for every single build, just adding time to every single build.
  3. Serverless environments. Environments like Lambda, Google Cloud Run, Cloudflare Workers, etc execute much better with smaller zipfiles/container sizes. Reducing dependencies in these environments is a known best practice. Large packages are frowned on. Some have limits on size.
  4. Performance. The more files parsed, the more JS parsed, the slower a package is to start.

@pauldraper
Author

pauldraper commented Oct 8, 2023

If the problem is that there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do

35 to 10 is a 71% reduction.

The latest version (5.2.2) has tsc, tsserver, tsserverlibrary, and typescript which total 32MB but have only 9MB of unique content.

Removing the duplicate code drops the package from 41MB to 18MB, a 56% reduction.
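
The two percentages above can be checked with a short calculation. This is a sketch using only the figures quoted in this comment; `percentReduction` is a hypothetical helper name.

```typescript
// Percentage by which `after` is smaller than `before`.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Going from 35 copies to 10 copies.
const copiesReduction = percentReduction(35, 10);

// 32 MB of bundled files contain 9 MB of unique content, in a 41 MB package.
const packageMB = 41;
const bundledMB = 32;
const uniqueMB = 9;
const dedupedMB = packageMB - (bundledMB - uniqueMB);
const packageReduction = percentReduction(packageMB, dedupedMB);
```

Rounded, `copiesReduction` is 71 and `packageReduction` is 56, with `dedupedMB` equal to 18.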

So.....actually, there is a lot that TS can do.


it's unrealistic to expect projects to put their effort into slimming down instead of having package managers duplicate less

The problem is largely a synthetic one introduced by TS's bundling. The source code (excluding tests) is only 32MB.

@pauldraper
Author

pauldraper commented Oct 8, 2023

there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do

Tangential, but if you want to go that route @RyanCavanaugh, there's a four-year-old PR open for PnP to dedup installs; maybe it could get some eyeballs :)

#35206
