Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Leverage --incremental #1310

Open
yacinehmito opened this issue Dec 8, 2019 · 27 comments
Open

RFC: Leverage --incremental #1310

yacinehmito opened this issue Dec 8, 2019 · 27 comments

Comments

@yacinehmito
Copy link

yacinehmito commented Dec 8, 2019

This is a feature request.

Issue

Our team is working on a large codebase and we are currently having issues with the time our CI takes. We are looking at all the ways to make the build faster.
One of the bottlenecks is the time it takes to run the tests. We investigated and most of time taken to run unit tests is not to actually run the tests but to compile the TS files on-the-fly.

Expected behavior

The thing is that TS files have already been compiled as part of the build step. We would love for ts-jest to leverage this and rely on them.

One idea is to have ts-jest write the compiled files on disk and leverage the incremental option. As ts-jest attempts to recompile the code, tsc will see that most of the work was already done.

We understand this is a large undertaking and I am ready to help! From a quick glance I already identified ts-jest performs additional transformation to hoist jest.mock() and that could be a challenge.

Alternatives

We found none of the speed-up alternatives to be satisfying. They are:

  • Compile the test files ahead of time. It works great but we lose snapshots and it's a poor developer experience. Not a good fit.
  • Compile with babel instead of ts-jest. It's still slow, it doesn't support all of TypeScript features, the TDD experience is not as good and the code that is tested in not the one that actually runs in production. Not a good fit.

We would largely prefer to stick with ts-jest but performance improvements are really necessary for us. Again, I'm ready to work on it.

What do you think about this feature, and would you support a PR introducing it?

@ahnpnl
Copy link
Collaborator

ahnpnl commented Dec 8, 2019

Hi,

We are happy to get community's help in improving ts-jest. Right now we are lack of capacity to support all needs from the community. Therefore feel free to help us by your PRs, we really appreciate.

@yacinehmito
Copy link
Author

yacinehmito commented Dec 29, 2019

I attempted to work on this but currently ts-jest assumes that all compilation happens in memory and hands off caching to Jest. Leveraging --incremental goes against that.
It's an assumption that is relied on throughout the code so I couldn't refactor easily.

I'll try to do an incremental-only Jest transformer first and see how it behaves. After that we may consider introducing the feature to ts-jest but that will be a large undertaking.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Dec 29, 2019

Have you checked ts-jest processing doc ? Does it help to bring up some ideas for your work ?

@yacinehmito
Copy link
Author

yacinehmito commented Jan 3, 2020

I did after you told me about this. Thank you for the info.

It confirms the level of coupling. I did a PoC of a runner with support for --incremental and it turned out to be a bit slower than ts-jest.
With ts-jest, what we lose in speed by not using --incremental, we gained thanks to its aggressive caching.

The issue is that it's this very caching that makes us stop using ts-jest. We have quite a large TypeScript project (almost 300k lines of code) and ts-jest completely chokes on it. Even with a v8 set at 4GB of RAM it blows up. The culprit is the in-memory cache. We are relying on our PoC instead.

The PoC is less smart than ts-jest and a bit more hacky: it's a runner combined with a transformer.
The runner first compiles the project on disk with --incremental. Because it's a runner it happens only once; there are no concurrent compilation.
Then the transformer, instead of compiling the TS files, will just read the compiled JS file from disk, so it's quite fast.

In the end we ended up with something that is just slightly slower than ts-jest but simpler to reason about and with no pressure on the RAM.


The biggest challenge for the introduction of incremental in ts-jest is the TypeScript transformer to hoist calls of jest.mock. If we implement it naïvely, we might end up with running test files that were already on disk because they didn't build as part of a jest run, hence not having the hoisted jest.mock calls.


Takeaway: I don't think I'll be working on that feature for now. Sorry.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Jan 6, 2020

Hmm interesting from your explanation. When ts-jest caching was written, there wasn’t incremental build at that point so I can understand why the choice is cached compilered stuffs on RAM.
I wonder if ts-jest somehow can reuse cache from tsBuildInfo instead of using current caching mechanism. Is it the way you tried with your POC ?

@yacinehmito
Copy link
Author

yacinehmito commented Jan 6, 2020

Absolutely. It is however architecturally very different from ts-jest. I don't know how to transpose it to what ts-jest does.


Unrelated ramble about what I did:

I've been having a mitigated success scaling the PoC though. TypeScript can't emit a file with its dependencies. It means that I need to either emit all files before starting to run the tests, or build the dependency graph myself and emit the files one by one on demand. I'd rather not do that.

It's also more flaky than ts-jest because it analyses the dependencies statically; if there's a dynamic require it will be blind to it. That could be solved by having some sort of communication from the worker to the runner but it's not part of the API of jest-worker.

I also don't like the fact that I have to use a transformer. It means that the runtime first reads the TS file only to be told to read the JS file. I'd rather solve that at the resolver level but I didn't manage to implement a custom resolver for the runtime.

Despite all that, and without any sort of caching, on a large codebase it's just a bit slower than ts-jest in terms of speed (whether it's a hot or cold run). However in our case, because we compile first in our CI, we have a cold run with ts-jest but a hot run with the PoC.

I think there is tremendous opportunity lying here, but we'll have to wait a bit because, at the moment, Jest's and TypeScript's models seem to be fundamentally incompatible. The strategy of ts-jest, which is to transpile files on demand and aggressively cache everything, will stay the best one for almost all projects.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Jan 6, 2020

Despite all that, and without any sort of caching, on a large codebase it's just a bit slower than ts-jest in terms of speed (whether it's a hot or cold run). However in our case, because we compile first in our CI, we have a cold run with ts-jest but a hot run with the PoC.

Is this you used --incremental for tsc and then use jest to run on compiled js files ? If that is the case, I can understand why jest and typescript's models are incompatible like your explanation below.

I think there is tremendous opportunity lying here, but we'll have to wait a bit because, at the moment, Jest's and TypeScript's models seem to be fundamentally incompatible. The strategy of ts-jest, which is to transpile files on demand and aggressively cache everything, will stay the best one for almost all projects.

I agreed

@yacinehmito
Copy link
Author

Is this you used --incremental for tsc and then use jest to run on compiled js files ?

Almost. When you do that you lose the watcher and snapshots are improperly located. Here I have a runner that starts a TypeScript program, emits files with your hoist-jest TS transformer, and injects a custom transformer to the following tests. This transformer, when asked to transform the TS file now returns the JS file compiled on disk.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Jan 6, 2020

FYI, perhaps later we can learn from Angular CLI regarding to this topic.

@yacinehmito I managed somehow to use create incremental program for ts-jest but I'm not sure if I did it correctly. If you have time, would you please help me to check my experiment ? You can find it on this branch. I made a folder called compiler and you can look into program.ts.

To run the experiment codes, ts-jest config needs to have

"experimental": true,
"compilerHost": true

and tsconfig needs to have incremental: true

@maxime1992
Copy link

@ahnpnl your branch link seems to be dead.

Could you also provide explanations if you restore it of how to test it on our own project?

I'd be willing to try 👍

Thanks

@ahnpnl
Copy link
Collaborator

ahnpnl commented Mar 17, 2020

hi @maxime1992 , the changes are in master now. You can test against master.

To use Program with incremental

globals: {
   'ts-jest': {
       compilerHost: true,
   }
}

To use Program without incremental

globals: {
   'ts-jest': {
       compilerHost: true,
       incremental: false,
   }
}

I would expect the performance of Program without incremental slower than with incremental. In general, the current implementation of Program is slower than LanguageService. There are still rooms to improve.

@Alexandr-Ilyin
Copy link

Please provide some information what is the current status of the feature ?
Seems like adding
"ts-jest": {
"compilerHost": true,
"incremental": true
}
with jest --watch still does not use incremental ts build

@kirillgroshkov
Copy link
Contributor

Please provide some information what is the current status of the feature ?
Seems like adding
"ts-jest": {
"compilerHost": true,
"incremental": true
}
with jest --watch still does not use incremental ts build

I double that. Tried exactly these settings and didn't notice any difference. What are the prerequisites for this to work? Should my tsconfig.json have incremental enabled too?

@ahnpnl
Copy link
Collaborator

ahnpnl commented Apr 4, 2020

it is still experimental feature. Right now we only invoke createIncrementalProgram to create a ts Program for compiling and type checking. tsconfig.json doesn't require to have incremental enabled in this case.

More help is welcome because this is relative new to implement and not many docs to follow.

@ahnpnl ahnpnl added the 🚀 Feature Request new suggested feature label Apr 4, 2020
@ahnpnl
Copy link
Collaborator

ahnpnl commented Apr 4, 2020

Expected behavior

The thing is that TS files have already been compiled as part of the build step. We would love for ts-jest to leverage this and rely on them.

One idea is to have ts-jest write the compiled files on disk and leverage the incremental option. As ts-jest attempts to recompile the code, tsc will see that most of the work was already done.

From what I understand from @yacinehmito that:

  • ts-jest uses incremental program API (this has been done in v25.3.0)
  • write compiled files to build folder together with .tsbuildinfo. This will allow tsc to run faster. (this is not implemented yet).
  • somehow make ts-jest reuses this .tsbuildinfo for the next compilation (I think)

I don't know anything else which is needed to do.

@yacinehmito
Copy link
Author

yacinehmito commented Apr 4, 2020

Absolutely. Instead of using the caching capabilities offered by Jest, ts-jest could solely rely on incremental and write everything on disk, in a folder visible by the user. However this is completely different than what is currently being done with ts-jest; I attempted an implementation but it was too challenging for me.

Also, Jest's architecture is not suited for this.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Apr 4, 2020

Absolutely. Instead of using the caching capabilities offered by Jest, ts-jest could solely rely on incremental and write everything on disk, in a folder visible by the user. However this is completely different than what is currently being done with ts-jest; I attempted an implementation but it was too challenging for me.

Also, Jest's architecture is not suited for this.

Yes i think Jest's architecture isn't ready for using custom caching capabilities (in this case we want to use cache which is written by ts compiler).

At the moment, getCacheKey is the 1st point to determine whether a file has changed or not. Jest uses this method to look into its cache, get out the content and compare with new content. If the new content is different, Jest will invoke process which then will invoke process in transformer and the file will be recompiled.

I think what we can do now is separating these 2 caches. Leave Jest do what it is doing and somehow make internal ts compiler of ts-jest to use incremental. However, I don't know what we should do for this. Any thoughts ?

@ahnpnl ahnpnl pinned this issue Apr 4, 2020
@ahnpnl ahnpnl changed the title Leverage --incremental RFC: Leverage --incremental Apr 4, 2020
@yacinehmito
Copy link
Author

I think this is a good direction to take, but I don't know how to make it work.
The transformer would need to have access to the already instantiated program, and I don't know how that could be done.

Then, even if it were the case, I don't know how this would work out when tests run in parallel. Indeed:

  • How does TypeScript behave when parallel processes perform incremental builds?
  • Assuming that parallel incremental build are not desirable, how can we have IPC between the transformer and the process that holds the TypeScript program? (and more importantly, how can the transformer wait, as it's API is synchronous)

This doesn't even touch on the problems on TypeScript's side: the emit API is all or nothing. We either emit all files or one specific file, but can't emit a file and its dependencies. So, emitting as-needed seems to be impossible.

All those problems made me conclude that efficient compilation with Jest is a pipedream, at least until both of these hold true:

  • TypeScript exposes an API suited to the use-case of emitting a file and its dependencies, with incremental support
  • Jest can offload transformation to another process and not necessarily assume that it happens in memory, in the test worker

@ahnpnl
Copy link
Collaborator

ahnpnl commented Apr 10, 2020

I think this is a good direction to take, but I don't know how to make it work.
The transformer would need to have access to the already instantiated program, and I don't know how that could be done.

I think this is doable. This point is starting point of creating Transformer instance, I think we can create Program instance before that point actually executes.

Then, even if it were the case, I don't know how this would work out when tests run in parallel. Indeed:

  • How does TypeScript behave when parallel processes perform incremental builds?
  • Assuming that parallel incremental build are not desirable, how can we have IPC between the transformer and the process that holds the TypeScript program? (and more importantly, how can the transformer wait, as it's API is synchronous)
  • I think for tests run in parallel, we need to share Program via nodejs global process object. However, the fact that we need async transform so it is related to feat: pass ESM options to transformers jestjs/jest#9597 and will take some times before it lands.

  • I have no idea how TypeScript behave with parallel processes perform incremental builds

This doesn't even touch on the problems on TypeScript's side: the emit API is all or nothing. We either emit all files or one specific file, but can't emit a file and its dependencies. So, emitting as-needed seems to be impossible.

All those problems made me conclude that efficient compilation with Jest is a pipedream, at least until both of these hold true:

  • TypeScript exposes an API suited to the use-case of emitting a file and its dependencies, with incremental support
  • Jest can offload transformation to another process and not necessarily assume that it happens in memory, in the test worker
  • To be able to have jest offload transformation to another process, we do need transformer to support async imo.

  • Ya I agree we need an API like that from typescript.

@yacinehmito
Copy link
Author

yacinehmito commented Apr 10, 2020

we need to share Program via nodejs global process object.

When tests run in parallel, transformation seems to happens in each Jest worker, which are different processes. This is why they don't share the global process object.

Ya I agree we need an API like that from typescript.

There's actually a way to do it without this, but we'd need a custom resolver that captures the required path and attempts to emit as a consequence. However I didn't manage to get it right. I made a PoC with a "best attempt" strategy (that doesn't work in all cases, but enough to test it) and performance was not great. There seems to be a significant overhead when calling emit file per file.

@ahnpnl
Copy link
Collaborator

ahnpnl commented Apr 23, 2020

I have looked into ts-loader to replicate their resolvers. However, it doesn’t say anywhere in ts-loader doc about how it works. Do you mind sharing your POC codes ?

@yacinehmito
Copy link
Author

Do you mind sharing your POC codes ?

It's in a private repo owned by Spendesk. I'll see if I can get that out next week.

@ahnpnl ahnpnl unpinned this issue Jun 4, 2020
@ahnpnl
Copy link
Collaborator

ahnpnl commented Jul 13, 2020

Hi @yacinehmito , #1784 introduces the usage of TypeScript resolveModuleNames which also enables the possibility to have custom typescript module resolution.

I think with this API, now it should be possible to achieve what we have been discussed here. However, the implementation is probably limited:

  • ts-jest might not (or better should not) take care of build process but only attempt to get precompiled files from build directory. If build directory doesn't contain the precompiled files, ts-jest will do the compilation.

  • Since jest is working on supporting ESM, this means the precompiled files in build directory should be commonjs format for now.

@yacinehmito
Copy link
Author

yacinehmito commented Jul 13, 2020

Hi @ahnpnl. I just invited you to a private repository where I isolated the code of our custom Jest runner. It is not much, but perhaps will it help.

#1784 introduces the usage of TypeScript resolveModuleNames which also enables the possibility to have custom typescript module resolution.

That's not the part I found difficult. It might be better than some handcrafted reimplementation of the resolution algorithm for forward compatibility and to sort out edge cases, but it was not a fundamental blocker.

ts-jest might not (or better should not) take care of build process but only attempt to get precompiled files from build directory. If build directory doesn't contain the precompiled files, ts-jest will do the compilation.

This is exactly what I am trying to accomplish. Or rather say that the transformers should not attempt compilation, as they are synchronous and parallel. The runner (which doesn't exist in the case of ts-jest but could be introduced), being asynchronous and unique, could compile. Or not.

This is the bit I am struggling with. It's not hard to right a runner that compiles TypeScript code, but it's very hard to write one that works well with Jest's runner.

I tried to run both the Jest watcher and the TypeScript watcher. It "works", and by this I mean that it runs. The problem is that I am not aware of an API in TypeScript to know when it's done compiling, nor do I know one for Jest to know when it's ok for it to run the tests again. So what usually happens is that Jest will test the old version of the code. Sometimes.

Then, when trying to substitute one watcher for the other, I realised that it was a bit futile: what matters is knowing when something has been compiled, then propagate that to Jest. So essentially, the pipeline would have to look like this:

  1. TS file changes
  2. tsc notices and starts compiling
  3. tsc is done compiling; the JS file is emitted
  4. We trick Jest into thinking that only now the TS file changed
  5. Jest runs the appropriate tests

And even if by some miracle we manage to have this working, we then end up with an orchestration problem: what happens if another file changes while the tests are running and this affects the outcome? We'd have to find a way to either cancel tests, or pause the emission of JS files.

All that to say that I don't think integrating Jest and TypeScript at that level is a good approach. It's ideal, but not sensible if we look at the respective implementations.

EDIT: Forgot to mention that I changed the approach for the PoC since my last comment. I went with something simpler as you suggested.

Is this you used --incremental for tsc and then use jest to run on compiled js files ? If that is the case, I can understand why jest and typescript's models are incompatible like your explanation below.

That's what the PoC does right now. (The code is in the private repo I have invited you on, and I might share that to the rest of the world once I get around to asking permission)

@ahnpnl
Copy link
Collaborator

ahnpnl commented Jul 13, 2020

From TypeScript APIs, everything about compiler is synchronous. There have been some discussions about making it async but not happen in the near future.

The pipeline at point 4 is not optimal because this is the task of the transformer. Since there is no real connection between runner and transformer, so I understand it’s hard to let transformer know when runner finishes compiling.

The possible approach for now IMO is:

Runner

  • Runner can possibly do compilation but shouldn’t interfere with jest logic related to file change. Besides, not guarantee that transformers can reuse on time the compiled outputs from runner though because there is no connection between runner and transformer.

  • With the concern above, watch mode will be an issue. However, I think that might be resolved by using watch plugin. Once the Promise from watch plugin solves, at that time ts-jest will start checking build directory. At this point, it is more assured that new precompiled outputs have been updated.

  • Or maybe runner shouldn’t do compilation in watch mode but leave it up to watch plugin ?

  • In non-watch mode, of course one needs to run tsc first before running tests.

Transformer

  • Additional feature to ts-jest to check on build directory to get precompiled output from build directory with a fallback to compile when files don’t exist in build directory or build directory doesn’t exist. Build directory can be either created by runner or a separate tsc process. This means incremental is not needed for ts-jest. The concern here is like explanation in the runner part above.

  • To be able to tell jest that precompiled output have been updated, the value returns for getCacheKey needs to be changed. Right now it is a combination of file path, file content (delivered by jest) + serialized configs. Maybe adding something related to build directory like .tsbuildinfo ?

  • Regarding to memorized stuffs in ts-jest, not sure can do much about it, probably still can improve something but too little to make a good impact on reducing memory usage.

What do you think ? Does this proposal miss anything ?

@jjangga0214
Copy link
Contributor

@yacinehmito @ahnpnl
Hi!
May I ask if there's really no update or solution until 2022?
Thank!

@yacinehmito
Copy link
Author

I did not spend more time thinking about this issue, and I am not using Jest anymore, so I will not be able to help. Sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants