Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize haste map data structure for serialization/deserialization. #8171

Merged
merged 10 commits into from Mar 20, 2019

Conversation

scotthovestadt
Copy link
Contributor

Summary

This simple PR is the result of a ridiculous number of attempts to specifically optimize the serialization and deserialization of the haste map followed by a simple realization: creating arrays is expensive. :)

Currently, we store each file's dependencies as an expensive array. However, we don't typically access many file dependencies, so it's not time well-spent.

I've modified the data structure to store the file dependencies as a much cheaper string (tab separated) that is deserialized to an array on-demand. No change to the public interface, but now we're not doing any unnecessary work.

I profiled both the full Jest run time and the read/persist time to ensure I had a clear view into the characteristics of this change. It's somewhere in the neighborhood of a 10%~ startup time improvement at FB, which is significant.

Benchmarking against a generated benchmark hash map of 300,000 files:

before--
read: 2,291ms
persist: 2,413ms
total: 4,704ms

after--
read: 1,852ms
persist: 1,539ms
total: 3,391ms

delta: 1,313ms

Test plan

  • Updated tests
  • All tests pass
  • Manually benchmarked

'fruits/__mocks__/Pear.js': ['', 32, 42, 0, ['Melon'], null],
'vegetables/Melon.js': ['Melon', 32, 42, 0, [], null],
'fruits/Banana.js': ['Banana', 32, 42, 0, 'Strawberry', null],
'fruits/Pear.js': ['Pear', 32, 42, 0, 'Banana/Strawberry', null],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing \t delimiter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

Copy link
Member

@cpojer cpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is crazy. I love it.

Copy link
Member

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo! Keep'em coming 😀

(changelog 😬 )

@jeysal
Copy link
Contributor

jeysal commented Mar 20, 2019

These numbers sound great, thanks! I think we should use the colon (:) as the delimiter though. Afaik only NUL and : are forbidden in UNIX paths, a tab could occur in a path normally.

@jeysal
Copy link
Contributor

jeysal commented Mar 20, 2019

Try e.g.

require('fs').writeFileSync('/tmp/\t', 'asdf')

Copy link
Contributor

@jeysal jeysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above we should change the delimiter to : like it is used in $PATH etc., other than LGTM!

@SimenB
Copy link
Member

SimenB commented Mar 20, 2019

We talked about it, but just so it's written down - a : might not work on windows if the file is on another drive (D:/blabla)

@rubennorte
Copy link
Contributor

These numbers sound great, thanks! I think we should use the colon (:) as the delimiter though. Afaik only NUL and : are forbidden in UNIX paths, a tab could occur in a path normally.

Yeah, I'd recommend using \0.

@scotthovestadt did you measure the impact in building the haste map or the full startup time? I imagine this can have an impact in the time to determine the tests to run (see https://github.com/facebook/jest/blob/master/packages/jest-resolve-dependencies/src/index.ts) as it resolves the dependencies of all the modules in the haste map. Maybe it'd be worth it to memoize the array (even in the same object, as it won't be persisted again).

@scotthovestadt
Copy link
Contributor Author

scotthovestadt commented Mar 20, 2019

@jeysal Will extract the delim to a var and use \0.

Considered memoizing but don't think it'll be worth it with the current access patterns.

I did test the haste map generation and while there might be a small hit it wasn't actually measurable because:

  1. It's a relatively variable time already (it has a range of a couple seconds once you hit the threshold I'm benchmarking at).
  2. The persist at the end is faster so that might be making up for the slower generation.

In any case, it can be optimized away by doing the join in the worker thread (which would introduce the complexity of detecting which type to avoid a breaking change). I didn't go there yet because I wasn't able to measure a difference by doing that... only pushing up changes where I can measure a significant difference. :)

@rubennorte
Copy link
Contributor

rubennorte commented Mar 20, 2019

@scotthovestadt my point is that if you only measure haste map build time you won't see that the cost might be transferring to a later stage in the process (in this case test determination). I'd try to measure the time it takes to get the list of tests that need to run when you use --findRelatedTests (see https://github.com/facebook/jest/blob/master/packages/jest-core/src/SearchSource.ts#L158). That should give you a good measure about the impact this could have in finding inverse dependencies and, if it's not negligible, try using memoization to reduce it.

@codecov-io
Copy link

codecov-io commented Mar 20, 2019

Codecov Report

Merging #8171 into master will decrease coverage by <.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8171      +/-   ##
==========================================
- Coverage   62.28%   62.28%   -0.01%     
==========================================
  Files         265      265              
  Lines       10440    10435       -5     
  Branches     2538     2535       -3     
==========================================
- Hits         6503     6499       -4     
- Misses       3352     3354       +2     
+ Partials      585      582       -3
Impacted Files Coverage Δ
packages/jest-haste-map/src/constants.ts 100% <ø> (ø) ⬆️
packages/jest-haste-map/src/HasteFS.ts 46.87% <0%> (-3.13%) ⬇️
packages/jest-haste-map/src/crawlers/node.ts 80.76% <100%> (ø) ⬆️
packages/jest-haste-map/src/index.ts 81.73% <100%> (+0.23%) ⬆️
packages/jest-haste-map/src/crawlers/watchman.ts 88.46% <100%> (ø) ⬆️
packages/expect/src/matchers.ts 96.34% <0%> (+0.7%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5fd7aa...c0de52e. Read the comment docs.

Copy link
Contributor

@rubennorte rubennorte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks @scotthovestadt!

Copy link
Contributor

@jeysal jeysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@jeysal jeysal merged commit f809c79 into jestjs:master Mar 20, 2019
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants