Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add emitDependency caching and fix pnpm symlinks #238

Merged
merged 4 commits into from
Oct 10, 2021
Merged

Conversation

ijjk
Copy link
Member

@ijjk ijjk commented Oct 9, 2021

This removes the resolve caching introduced in #237 and replaces it with caching the entire emitDependency call so that all nested operations shouldn't need to be cache separately.

This provides much better performance than the previous caching as testing against a couple large apps showed these results:

  • vercel.com: before(70 - 100 seconds) after(20 - 30 seconds)
  • private project: before(170 - 200 seconds) after(50 - 60 seconds)

@codecov
Copy link

codecov bot commented Oct 9, 2021

Codecov Report

Merging #238 (ec49f51) into main (d568f9a) will increase coverage by 0.03%.
The diff coverage is 86.44%.

❗ Current head ec49f51 differs from pull request most recent head b46709c. Consider uploading reports for the commit b46709c to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #238      +/-   ##
==========================================
+ Coverage   81.25%   81.28%   +0.03%     
==========================================
  Files          13       13              
  Lines        1472     1480       +8     
  Branches      545      546       +1     
==========================================
+ Hits         1196     1203       +7     
- Misses        109      110       +1     
  Partials      167      167              
Impacted Files Coverage Δ
src/resolve-dependency.ts 69.81% <80.00%> (-2.12%) ⬇️
src/node-file-trace.ts 86.99% <86.36%> (+0.23%) ⬆️
src/analyze.ts 86.49% <100.00%> (ø)
src/utils/sharedlib-emit.ts 66.66% <100.00%> (-1.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d568f9a...b46709c. Read the comment docs.

@ijjk ijjk marked this pull request as ready for review October 9, 2021 01:35
@ijjk ijjk requested a review from timneutkens October 9, 2021 01:35
const emitDependencyPromise = new Promise<FilesToEmit>(async (resolve, reject) => {
try {
const curFilesToEmit: FilesToEmit = { files: [], reasons: {} }
const job = new Proxy(this, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a proxy here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to allow us to catch all calls to emitFile even nested ones and add the files to the cache.

@styfle styfle requested a review from guybedford October 9, 2021 03:51
@ijjk ijjk requested a review from styfle October 9, 2021 14:46
Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🎉

@guybedford
Copy link
Contributor

Great improvements, and looks good to me, didn't read all the code but could just ask a few questions if it would help:

  • How the gains are achieved here? If it is primarily disk, what were the misses on the previous PR? And if it is primarily due to compute, that would be very surprising since 1000s of dependencies should be fully iterable in well under 5 seconds.
  • With the caching stuff it's also worth putting some thought to lifetimes - what's nice about the stat cache etc is they are easier to mutate / coordinate on for longer period caching, whereas approaches like these PRs move towards more opaque caching. Or possibly doing a separation of node_modules caching from user code caching. What sort of cache lifetime applies to these types of builds?

@ijjk
Copy link
Member Author

ijjk commented Oct 10, 2021

It seems eliminating iterating over the dependencies reduces the time significantly on large projects e.g. thousands of dependencies with > 100 separate runs of nodeFileTrace sharing the same cache.

I think currently we should consider the emitDependency cache specific to one build as invalidating this cache would require checking all files in the cache entry and if any of them changed the parent entry needs to be invalidated. I'm not sure if we're attempting persisting this cache for multiple builds that could have had files change between them in any current uses of nft but if we are we can disable it by clearing the emitDependency cache between runs.

@ijjk ijjk added the automerge Automatically merge PR once checks pass label Oct 10, 2021
@guybedford
Copy link
Contributor

It seems eliminating iterating over the dependencies reduces the time significantly on large projects e.g. thousands of dependencies with > 100 separate runs of nodeFileTrace sharing the same cache.

This really is very surprising to me, as it is fully possible to iterate 1000s of modules in well under a second. So there must be something else going on. Could be worth digging into some further analysis there, but this PR is good to land too of course.

@guybedford
Copy link
Contributor

Perhaps there is some quadratic behaviour in the iteration?

@kodiakhq kodiakhq bot merged commit 798a07e into main Oct 10, 2021
@kodiakhq kodiakhq bot deleted the fix/pnpm-symlinks branch October 10, 2021 15:50
kodiakhq bot pushed a commit to vercel/next.js that referenced this pull request Oct 14, 2021
This updates to the latest version of `@vercel/nft` which includes better shared caching between runs. 

x-ref: vercel/nft#239
x-ref: vercel/nft#238
x-ref: vercel/nft#237
@@ -0,0 +1,11 @@
[
"test/unit/pnpm-symlinks/input.js",
"test/unit/pnpm-symlinks/node_modules/.pnpm/parse5-htmlparser2-tree-adapter@6.0.1/node_modules/parse5",
Copy link

@pi0 pi0 Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an expected behavior that includes symlink directories in fileList?

natew pushed a commit to natew/next.js that referenced this pull request Feb 16, 2022
This updates to the latest version of `@vercel/nft` which includes better shared caching between runs. 

x-ref: vercel/nft#239
x-ref: vercel/nft#238
x-ref: vercel/nft#237
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Automatically merge PR once checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants