
Perf improvements - avoid persisting haste map / processing files when not changed. #8153

Merged

Conversation

scotthovestadt
Contributor

Summary

At Facebook, in the common situation where you've changed a couple of files in the largest haste map, this PR cuts roughly 25% off the startup time. In less common situations where you're working on a smaller haste map, the improvement is roughly 60%.

The improvement is gained by:

  • Not re-serializing and writing the haste map to disk when it was loaded from disk and then not changed.
  • Not re-creating the map and mocks portions of the haste map from scratch on startup when we know exactly which files changed. Instead, only the changed files are re-processed (see the sketch after this list).
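
To make those two bullets concrete, here is a minimal hypothetical sketch in TypeScript (invented names and types, not the actual jest-haste-map implementation): a dirty flag guards the serialize-and-write step, and the watcher's list of changed files drives incremental re-processing.

// Hypothetical sketch of the two optimizations; not the actual jest-haste-map code.
import {writeFileSync} from 'fs';
import {serialize} from 'v8';

type InternalHasteMap = {
  files: Map<string, unknown>;
  map: Map<string, unknown>;
  mocks: Map<string, string>;
};

class HasteMapSketch {
  // True only when the in-memory map diverges from what is on disk.
  private _cacheDirty = false;

  // 1) Skip serializing and writing the cache if it was loaded from disk and never changed.
  persist(cachePath: string, hasteMap: InternalHasteMap): void {
    if (!this._cacheDirty) {
      return; // nothing changed since the cache was read: skip the expensive write
    }
    writeFileSync(cachePath, serialize(hasteMap));
    this._cacheDirty = false;
  }

  // 2) When we know exactly which files changed, re-process only those files
  //    instead of rebuilding `map` and `mocks` from scratch.
  applyChanges(hasteMap: InternalHasteMap, changedFiles: Set<string>): void {
    for (const filePath of changedFiles) {
      this._processFile(hasteMap, filePath);
    }
    if (changedFiles.size > 0) {
      this._cacheDirty = true;
    }
  }

  private _processFile(hasteMap: InternalHasteMap, filePath: string): void {
    // Extract module/mock metadata for a single file and update the maps in place.
    hasteMap.files.set(filePath, {});
  }
}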

I've benchmarked the startup time by:

  1. Setting up a single test
  2. Running once to prime the cache
  3. Changing a test file
  4. Running the test again with --skipFilter (measuring at this point via time)

I've been a bit conservative: when files are deleted, I still do a full re-process (same as the current behavior), though I may improve that later. Deleting a file is much less common than editing one, and I wanted to keep the code as simple as possible initially.

In cases where Watchman isn't being used or is freshly started, there is no difference.

I'm always a little suspicious when something relatively simple yields such a large performance improvement, so please help by casting a very critical eye on this PR and all assumptions that I made.

Test plan

  • All tests pass.
  • Tested manually in multiple situations.
  • No change in behavior without watchman.
  • Manually verified the cache file is updated appropriately in a variety of situations.

@SimenB
Member

SimenB commented Mar 19, 2019

Woah, crazy numbers. Love it!

Is it possible to add a test for this? And update the changelog 🙂

});
const __hasteMapForTest =
  (process.env.NODE_ENV === 'test' && hasteMap) || null;
return this._watch(hasteMap).then(() => ({
Contributor


Just a quick observation: since this._buildPromise is an async function, is there a need to call then on this._watch? Shouldn't this be an await followed by returning the object?

Member


This code didn't use async/await in the past, which is probably why it looks this way. It can be changed to await.

Contributor Author


I'll change to await. Thanks for the feedback.
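
For readers following along, the suggested change amounts to something like the following sketch, based on the quoted lines above. The surrounding class, the return shape, and the field names are invented here; the real return value lives in jest-haste-map and the exact final diff may differ.

// Hypothetical sketch of the await rewrite; names and return shape are invented.
type BuildResult = {
  hasteFS: unknown;
  moduleMap: unknown;
  __hasteMapForTest: unknown;
};

class HasteMapBuildSketch {
  private async _watch(hasteMap: unknown): Promise<void> {
    // start the file watcher (stubbed out here)
  }

  async build(hasteMap: unknown): Promise<BuildResult> {
    const __hasteMapForTest =
      (process.env.NODE_ENV === 'test' && hasteMap) || null;
    // Before: return this._watch(hasteMap).then(() => ({...}));
    // After: the enclosing function is already async, so await and return directly.
    await this._watch(hasteMap);
    return {hasteFS: null, moduleMap: null, __hasteMapForTest};
  }
}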

Member

@cpojer cpojer left a comment


Nice work, can't believe how big the speedup is for Haste Map reconciliation!

Contributor

@rubennorte rubennorte left a comment


This looks great! Thanks for working on this.

Did you measure the impact of this change with a clean cache? It'd be great to have the time to build the haste map specifically (which will be more precise than the full time to run jest) with a cold and a warm cache, and in different scenarios (no files changed, some files modified, some files removed, etc.).

@scotthovestadt
Contributor Author

@rubennorte This PR does not impact the cold-cache run time at all. What I've done here is basically add code paths that only fire when the cache is warm, to avoid doing unnecessary work. When the cache is cold, or files were deleted, the PR behaves essentially the same as what's on master currently.

After this PR, the main remaining costs in cache generation are reading and deserializing the cache, and serializing and writing it back to disk; these account for about 30% of the current start time and almost the entirety of cache generation time when the cache is warm.

I have another PR coming that refactors the serialization with the same theme as this PR: we should only need to write what changed. I expect it to cut cache generation time (when warm) by about 70%, based on early measurements!
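
That follow-up isn't shown here, but one possible way to realize the stated idea of "only write what changed" is sketched below. This is purely a hypothetical illustration with invented names (shards, persistChangedShards, the .cache file layout), not the actual follow-up PR.

// Hypothetical illustration of "only write what changed"; not the actual follow-up PR.
import {writeFileSync} from 'fs';
import {serialize} from 'v8';

type CacheShard = Map<string, unknown>;

// Split the cache into shards and persist only the shards whose contents
// changed, leaving the rest on disk untouched.
function persistChangedShards(
  shards: Map<string, CacheShard>,
  dirtyShardIds: Set<string>,
  cacheDir: string,
): void {
  for (const shardId of dirtyShardIds) {
    const shard = shards.get(shardId);
    if (shard !== undefined) {
      writeFileSync(`${cacheDir}/${shardId}.cache`, serialize(shard));
    }
  }
  dirtyShardIds.clear();
}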

  hasteMap.map = map;
  hasteMap.mocks = mocks;
  return hasteMap;
} catch (error) {
Member


could do

} finally {
  this._cleanup();
}

instead of the catch (and remove it from the happy path as well). Not sure if it's better or not?

Contributor Author


Since the catch actually throws the error, it wouldn't make it to the finally block in this case. But yeah, the code duplication (even just calling the method) annoys me a little too. I don't see a way around it, though.
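
For context, a minimal sketch of the shape under discussion, with hypothetical names (the real jest-haste-map code differs): cleanup runs on the happy path and again in the catch, which re-throws the error, and that duplicated call is the annoyance mentioned above.

// Hypothetical sketch of the shape under discussion; not the actual jest-haste-map source.
type HasteMapData = {map: Map<string, unknown>; mocks: Map<string, string>};

class HasteMapCleanupSketch {
  private _cleanup(): void {
    // release watchers, temporary state, etc.
  }

  private async _processFiles(): Promise<HasteMapData> {
    return {map: new Map(), mocks: new Map()};
  }

  async build(hasteMap: HasteMapData): Promise<HasteMapData> {
    try {
      const {map, mocks} = await this._processFiles();
      this._cleanup(); // happy-path cleanup
      hasteMap.map = map;
      hasteMap.mocks = mocks;
      return hasteMap;
    } catch (error) {
      this._cleanup(); // duplicated cleanup before re-throwing
      throw error;
    }
  }
}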

@github-actions

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021