Allow more aggressive transform caching in multiproject monorepos #10835

lexanth · 2020-11-16T13:36:25Z

🐛 Bug Report

In multiproject configs, each project root gets its own jest transform cache. This leads to duplicate work transforming the same files, even if they use the same configs. If a file is used in n projects, it will be transformed n times.

Locally, this increases cache disk usage. In CI, where the cache will not be warm, it increases runtime as duplicate work is required.

Writing a custom transformer with a simpler cache key implementation does not solve this - each project gets a separate cache folder.

The relevant code that does this is in https://github.com/facebook/jest/blob/c98b22097cb6faa3ed3fabf197cbe4f466620b9f/packages/jest-transform/src/ScriptTransformer.ts#L132-L136 - forces a unique cache path per config.name
If unassigned, config.name is assigned to a hash based on the path and index.

I've tried adding a common name to all projects' jest configs. That fixes the transform problem, but breaks other things (manual mocks in an __mocks__ folder don't work consistently). On our large monorepo, this gave a ~30% improvement in total runtime, but __mocks__ becoming unpredictable

I appreciate that there are edge cases to handle here (potentially different jest configs could warrant a different cache), but I think it should be available for the jest transformer to decide whether this is important (e.g. if relevant, a transformer could include config.name in the cache key manually).
e.g. optionally allow transformers to provide their own implementations of getCacheFilePath, which overrides the use of HasteMap.getCacheFilePath( this._config.cacheDirectory, 'jest-transform-cache-' + this._config.name, VERSION, )

If this sort of change would be accepted, I can probably provide a PR.

To Reproduce

Steps to reproduce the behavior:

Set up a multiproject config
Run tests with a cleared cache
Either observe the cache on disk, or tap into the transform to count the number of times a file is transformed

Expected behavior

If the transform config is the same, each file is only transformed once

Link to repl or repo (highly encouraged)

https://github.com/lexanth/jest-projects-repro

This is a monorepo with 3 packages (A, B and C). A and B consume C. the code in C currently gets transformed once per package, even with the transformer (in the jest-preset package) giving a super aggressive cache key implementation (yarn test:ci - could be used e.g. in CI, if we know the other relevant configs are constant).

Adding name: process.env.USE_SIMPLIFIED_CACHE ? '_' : undefined to each package's jest config makes them all use the same cache, but in my actual repo breaks other things, being a bit of a hack.

Everything is running in band because the tests are so fast that multiple workers all start transforming before another can populate the cache anyway.

envinfo

  System:
    OS: macOS 10.15.7
    CPU: (16) x64 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  Binaries:
    Node: 12.18.1 - ~/.nvm/versions/node/v12.18.1/bin/node
    Yarn: 1.22.10 - /usr/local/bin/yarn
    npm: 6.14.5 - ~/.nvm/versions/node/v12.18.1/bin/npm
  npmPackages:
    jest: ^26.6.3 => 26.6.3

The text was updated successfully, but these errors were encountered:

SimenB · 2020-11-16T18:23:47Z

Allowing transformers to override caching directory seems fine to me. @cpojer @thymikee @jeysal thoughts?

https://github.com/facebook/jest/blob/a66eec74b55dc98ab072be330c13c54c35cba5f7/packages/jest-transform/src/ScriptTransformer.ts#L128-L132

thymikee · 2020-11-16T18:50:38Z

I think we can make transformers smarter about when to bust transform cache, by picking only configuration that changes the output, instead of relying on one-size-fits-all solutions like stringified config or project name. If it's done well, overriding from transformers shouldn't be necessary.

cpojer · 2020-11-16T22:15:41Z

Agreed with @thymikee, anything else is hacky.

lexanth · 2020-11-17T08:23:16Z

@thymikee At the moment in a multi-project run, transformers can't be smart about it - jest will always split the cache into separate caches per project.

We could either remove this._config.name from the calculation of the cache path (and leave it all down to the transformer), or allow the transformer to override the cache folder here.

Happy to also look at a follow up change to the babel-jest transformer to make the cache key only depend on the relevant parts of the jest config. For comparison, ts-jest looks to be a bit smarter here about what bits of config to use - https://github.com/kulshekhar/ts-jest/blob/master/src/index.ts

SimenB · 2020-11-17T09:08:46Z

I don't understand the "let's be smarter about busting in transformers" comments - even if the cache key is the same it's a cache miss since Jest will look in different directories for different projects when checking if the cached file exist. If you by "instead of relying on one-size-fits-all solutions like stringified config or project name" mean "remove project name from the algorithm" that has nothing to do with the transformers themselves. That code lives in @jest/transform. I'm down with just removing that part of it which should solve it as we'll be trusting the cache key from transformers.

Making getCacheKey of babel-jest "smarter" is orthogonal to this issue (although I agree it should be done) as this issue is about unlocking the ability of transformers to be "smarter" at all - any update we make to getCacheKey would be void since @jest/transform wouldn't get cache hits regardless

thymikee · 2020-11-17T09:10:39Z

We could either remove this._config.name from the calculation of the cache path (and leave it all down to the transformer),

That's what I'm after.

I'm down with just removing that part of it which should solve it as we'll be trusting the cache key from transformers.

Kinda related to my comment here: #10834 (comment) I'm all in for that 👍

thymikee · 2020-11-17T09:13:00Z

even if the cache key is the same it's a cache miss since Jest will look in different directories for different projects when checking if the cached file exist

Let's make it so the cache directory is the same across the root project. We should be able to do that when getCacheKey is properly calculated based on dependencies of a transformer.

SimenB · 2020-11-17T09:17:43Z

Works for me 👍 Introduces race conditions between projects (in theory alleviated through write-file-atomic) and makes getCacheKey more important, but I think that's fine. If a transformer does not define getCacheKey the default implementation takes the whole config string, which makes sense to me: https://github.com/facebook/jest/blob/a66eec74b55dc98ab072be330c13c54c35cba5f7/packages/jest-transform/src/ScriptTransformer.ts#L113-L119. We could also deprecate falling back to default and say "pls implement one", if nothing else just using the new @jest/create-cache-key-function package

thymikee · 2020-11-17T09:21:38Z

Sounds like a plan. I'd be up for adding a deprecation warning in Jest 27 and keep it for a while until community adapts, which may take a while. Then remove. Btw, why are we not using @jest/create-cache-key-function as a default now? Seems like a good case for some reuse :D

SimenB · 2020-11-17T09:23:07Z

Btw, why are we not using @jest/create-cache-key-function as a default now? Seems like a good case for some reuse :D

Dunno. PR welcome? 😀

lexanth · 2020-11-17T11:33:54Z

One further thought on this - will just removing the this._config.name will result in all jest projects (as in completely separate ones, rather than multiproject) sharing a cache directory? I presume that's where this came from originally.

Edit: They already do.

ahnpnl · 2020-12-18T23:03:21Z

Sounds like a plan. I'd be up for adding a deprecation warning in Jest 27 and keep it for a while until community adapts, which may take a while. Then remove. Btw, why are we not using @jest/create-cache-key-function as a default now? Seems like a good case for some reuse :D

One cent on @jest/create-cache-key-function, I like this package but turned out causing coverage issue that I had to revert to the cache key constructing way of babel-jest. Either I didn’t use it correctly or something wrong with the package.

github-actions · 2022-02-25T12:49:15Z

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions · 2022-03-27T12:58:44Z

This issue was closed because it has been stalled for 7 days with no activity. Please open a new issue if the issue is still relevant, linking to this one.

github-actions · 2022-05-05T00:08:55Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

lexanth added Bug Report Needs Repro Needs Triage labels Nov 16, 2020

SimenB added 🚀 Feature Request and removed Bug Report Needs Repro Needs Triage labels Nov 16, 2020

ahnpnl mentioned this issue Jan 5, 2021

Share cache across multiple projects in monorepo setup. kulshekhar/ts-jest#2250

Closed

ahnpnl mentioned this issue Jan 27, 2021

Slow tests with 9.0.0-next.6 thymikee/jest-preset-angular#757

Closed

github-actions bot added the Stale label Feb 25, 2022

github-actions bot closed this as completed Mar 27, 2022

github-actions bot locked as resolved and limited conversation to collaborators May 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow more aggressive transform caching in multiproject monorepos #10835

Allow more aggressive transform caching in multiproject monorepos #10835

lexanth commented Nov 16, 2020

SimenB commented Nov 16, 2020

thymikee commented Nov 16, 2020

cpojer commented Nov 16, 2020

lexanth commented Nov 17, 2020

SimenB commented Nov 17, 2020 •

edited

thymikee commented Nov 17, 2020

thymikee commented Nov 17, 2020

SimenB commented Nov 17, 2020

thymikee commented Nov 17, 2020 •

edited

SimenB commented Nov 17, 2020

lexanth commented Nov 17, 2020 •

edited

ahnpnl commented Dec 18, 2020

github-actions bot commented Feb 25, 2022

github-actions bot commented Mar 27, 2022

github-actions bot commented May 5, 2022

Allow more aggressive transform caching in multiproject monorepos #10835

Allow more aggressive transform caching in multiproject monorepos #10835

Comments

lexanth commented Nov 16, 2020

🐛 Bug Report

To Reproduce

Expected behavior

Link to repl or repo (highly encouraged)

envinfo

SimenB commented Nov 16, 2020

thymikee commented Nov 16, 2020

cpojer commented Nov 16, 2020

lexanth commented Nov 17, 2020

SimenB commented Nov 17, 2020 • edited

thymikee commented Nov 17, 2020

thymikee commented Nov 17, 2020

SimenB commented Nov 17, 2020

thymikee commented Nov 17, 2020 • edited

SimenB commented Nov 17, 2020

lexanth commented Nov 17, 2020 • edited

ahnpnl commented Dec 18, 2020

github-actions bot commented Feb 25, 2022

github-actions bot commented Mar 27, 2022

github-actions bot commented May 5, 2022

SimenB commented Nov 17, 2020 •

edited

thymikee commented Nov 17, 2020 •

edited

lexanth commented Nov 17, 2020 •

edited