Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build a high-performance version of resolve #2925

Closed
cpojer opened this issue Feb 17, 2017 · 23 comments
Closed

Build a high-performance version of resolve #2925

cpojer opened this issue Feb 17, 2017 · 23 comments

Comments

@cpojer
Copy link
Member

cpojer commented Feb 17, 2017

In Jest, we use jest-resolve to resolve modules. However, it shells out to two other dependencies, resolve and browser-resolve. resolve hasn't been maintained as much as I'd like, we have a hacky workaround inside of jest-resolve and we actually have knowledge of the entire filesystem (minus directories) in "HasteFS" (currently not passed to jest-resolve but that can be changed as the places it is created already passes a "ModuleMap" which is also coming from jest-haste-map).

The idea is to take the synchronous part of resolve and put it inside of jest-resolve. Then rewrite it and fine-tune it for performance (cc @trueadm) and use "HasteFS" to optimize the best case scenario (you can use jest-file-exists which is O(1) in the best case and fs.stat in the worst case).

The end state should be for Jest not to depend on resolve any longer. We do not need the asynchronous part of resolve.

Call to resolve: https://github.com/cpojer/jest/blob/master/packages/jest-resolve/src/index.js#L84
See: https://github.com/substack/node-resolve/blob/master/lib/sync.js

Follow-up: We should also aim at replacing browser-resolve and making it configurable so that our resolution algorithm can look at any field in package.json to re-route resolution to a module. This can be done in a follow-up, because this is already a big task :)
browser-resolve: https://github.com/defunctzombie/node-browser-resolve

cc @DmitriiAbramov @wanderley

@quantizor
Copy link
Contributor

quantizor commented Feb 18, 2017

@cpojer I made a follow up comment in my PR about part 2 of this issue. Waiting for your response to finish the implementation in the current state of things.

@urish
Copy link

urish commented May 18, 2017

Update (re #3597) - I added a super simple memoization to resolve/lib/sync.js, seems like it does the trick for me: the number of statSync() calls dropped by an order of magnitude

@tvald
Copy link
Contributor

tvald commented Aug 10, 2017

Since HasteFS isn't available to the public, are there any other plans to improve the performance of resolve?

@cpojer
Copy link
Member Author

cpojer commented Aug 10, 2017

HasteFS is open source and part of jest-haste-map, it's simply a simple virtual filesystem that contains a list of all files in a project.

@sebinsua
Copy link

@cpojer Is there a guide to setting this up?

@cpojer
Copy link
Member Author

cpojer commented Aug 10, 2017

It's all automatic, jest-haste-map is used throughout Jest. jest-resolve only uses ModuleMap but not HasteFS. We can update it to use HasteFS and fall through to fs if the file isn't found. This would optimize the average case, but not worst case, of resolving modules.

@urish
Copy link

urish commented Aug 10, 2017

Here is the workaround I use in my setup to speedup jest-resolve:

https://gist.github.com/urish/1a106ad6d1dbfcf77f5f744ae854fd6a

For me, this small change reduces the test suite run time from about 21 to 14 seconds when using workers, and from about 27 seconds to 16.5 seconds when using --runInBand (I have an Angular app with 250 tests)

@tvald
Copy link
Contributor

tvald commented Aug 19, 2017

I've given this an initial look over in tvald-contrib/jest@feature/haste-resolve.

I think the biggest hurdle is that hasteFS doesn't track directories, just files. Most of our time is spent recursively searching for node_modules. Without a cached directory listing, the replacement for resolve still have to test for the existence of that directory up a potentially deep hierarchy.

I'll continue working on this, however, as there are several obvious optimizations for resolve now that I'm familiar with jest's codebase.

@cpojer
Copy link
Member Author

cpojer commented Aug 19, 2017

@tvald woah, awesome! I'm happy to review pull requests. If your first change is simply to integrate all of resolve's code into jest-resolve, then we can take it from there. Generally, the smaller and more incremental your pull requests are, the more likely it is that we'll merge them quickly :)

I share the hasteFS concern. If watchman can return directories I'm not opposed to add directories to the hasteFS data structure.

@tvald
Copy link
Contributor

tvald commented Aug 21, 2017

Working on the initial PR over at #4315. Once that's in, I'll submit a tiny follow-up PR with memoization that improves resolve performance by orders of magnitude. HasteFS might not even be necessary, although it would certainly improve performance even further.

@tvald
Copy link
Contributor

tvald commented Aug 22, 2017

Memoized PR is at #4323.

Also, noting in this thread that @cpojer requested the following:

Thanks for the initial PR, this is great! In a follow-up, do you mind changing all the single-expression if-statements to wrap them with curly-braces? That's the code style we use in this repo :)
>> original comment

Also, could we add support for the "browser" field in a follow-up and remove "browser-resolve"? I would love it if we could drop that dependency :)
>> original comment

@tvald
Copy link
Contributor

tvald commented Aug 22, 2017

Moving discussion back from #4323:

...this will not work for things like Jest's watch mode because files may change over time of Jest's usage and the resolution algorithm should work with what is currently on the system, not what used to be there in the past.

To solve this, the Resolver class is able to be instantiated. We can merge the resolution logic into the base class and use a cache within there, and each test will receive it's own copy of the resolver, which will be recreated for individual test runs.

This will not share the resolution logic between two individual test files, so won't be as fast as it could get, but within the dependency graph of a single test, the performance optimization should still apply.

We can also change this to memoize for the duration of a single test run, and that would be fine with me, but there is no concept in the worker process that manages the lifetime of a full test run, so we'd have to introduce those hooks somewhere in the workers, which is hard because they are supposed to be stateless...

>> original comment @cpojer

@tvald
Copy link
Contributor

tvald commented Aug 22, 2017

There are some smaller optimizations which don't involve memoization that I'll open a PR for.

Note that Resolver.findNodeModule() is static and is directly called from jest-config in several places:

$ grep findNodeModule -R packages --exclude-dir=build --exclude-dir=__tests__ | cut -f1 -d: | uniq -c
   5 packages/jest-config/src/normalize.js
   3 packages/jest-config/src/utils.js
   3 packages/jest-resolve/src/index.js

Most resolution does seem to occur via an instance of Resolver, as described by @cpojer. However, I'm not familiar enough with Jest's codebase to see an easy path beyond this memoization within a single test.

Unfortunately, I don't think that alone will provide enough of a performance boost for Jest to be usable with our codebase.

@tvald
Copy link
Contributor

tvald commented Aug 22, 2017

New PR at #4325, with cleaned up code and the slight optimization of testing whether a directory exists before attempting to resolve (potentially 10+) paths within that directory.

@tvald
Copy link
Contributor

tvald commented Aug 23, 2017

After some profiling, it looks like memoization-per-test doesn't offer a significant improvement. I'll submit a PR anyways, since HasteFS support should be quite similar to memoization.

Performance-wise, we saw our full test suite drop from ~230s to ~220s (-10s / -4%).

@tvald
Copy link
Contributor

tvald commented Aug 25, 2017

Just noting that in light of #4109 our team is considering divestment from dependencies licensed under BSD+Patents, so I won't be contributing any further patches until that's settled. #4325 should at least have made jest usable with larger codebases.

@niieani
Copy link
Contributor

niieani commented Jan 6, 2018

By memoizing the default resolver, we're seeing a 3x speed boost in our tests. More details here: #4323 (comment)

@SimenB
Copy link
Member

SimenB commented Sep 17, 2018

@tvald you wouldn't by any chance be open to working more on this? 😀 Jest is MIT now

@SimenB
Copy link
Member

SimenB commented Feb 5, 2020

Looking at this now in 2020, I think we should reverse course and go back to using the resolve package.

  • it is excellently maintained these days
  • it has packageFilter which we can use to lookup browser instead of main field and drop the browser-resolve dependency
  • it has a pluggable readFile, isFile and isDirectory functions, so if we want to we can use HasteFS to implement its FS operations, falling back to normal fs if needed. We can also cache/memoize if we want
  • changes are happening to resolution logic in node (ESM, added options such as paths), and keeping up is painful
  • related to the above, ESM supports async resolution, which resolve comes with (albeit as callback, but easy enough to use util.promisify or some such)

@cpojer
Copy link
Member Author

cpojer commented Feb 5, 2020

Yeah that makes sense. The main intent of this issue was to build a really fast resolver, I don't care as much about the implementation details. Resolution is one of the hottest paths during a test run, so a 20% improvement for resolution may result in a 10-20% improvement in overall test runtime.

@github-actions
Copy link

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the Stale label Feb 26, 2022
@SimenB
Copy link
Member

SimenB commented Feb 27, 2022

In general, the entire jest-resolve module is quite messy (the added complexity of async resolution and ESM resolution (which it doesn't even do correctly: #9885)) and it needs a solid refactor. As part of that we should keep performance in mind.

Related, we've added caches for most file operations, so some optimization has landed already.

#8388
#8412
#8650
#9873
#11076
#11969

@SimenB SimenB closed this as completed Feb 27, 2022
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants