Sharing `SourceFile`s between `createWatchProgram` calls #35676

bradzacher · 2019-12-14T00:03:29Z

Search Terms

SourceFile, createWatchProgram, DocumentRegistry, WatchOfConfigFile

Background

A bit of background first, over in @typescript-eslint, we use the compiler API to parse files. The user passes us a set of tsconfigs, and we create a program for each of those projects. ESLint then asks us to parse a file, and we take the the pre-parsed AST, transform it to the ESTree AST spec, and return the new AST, as well as the Program. The former ESLint traverses to power all lint rules, the latter our type-aware lint rules use to do statically analysis.

Our parser has to support two use cases:

the "one-and-done" CLI run use case, where every file is parsed and linted exactly once, and then the process exits.
the "never ending story" IDE use case, where each file can be parsed 1..n times, each time with potentially different contents.

Because of the way ESLint is designed^[1], it has to support both of these use cases at the same time, without any information about which case the current run is.

Because of this, we built our parser on top of the createWatchProgram API, because it provides a BuilderProgram instead of a Program, which is more amenable to the incremental updates we get in the IDE use case. We provide a customised WatchCompilerHost, which does not rely on filesystem watchers^[2], and instead manually call the watch callbacks whenever ESLint tells us there's new file content. There's also a bit of code around detecting if a file is new, or if a file has been deleted.

If you're interested, the code is here: https://github.com/typescript-eslint/typescript-eslint/blob/835378e505f462d965ce35cc4c81f8eee1704a30/packages/typescript-estree/src/create-program/createWatchProgram.ts#L227

[1] - ESLint only really has one API - the CLIEngine, which is used in the same way in both their CLI, and 3rd-party IDE extensions, such as vscode-eslint
[2] - We cannot use file watchers due to the CLI use case, which has to exit automatically as soon as the lint run is finished. I tried non-persistent file watchers, but they are are pretty slow, filled with edge cases and are all around a real pain. Additionally file watchers are async, which doesn't work in ESLint's synchronous world.

Problem

So with all that being said, what's the problem?

For multi-package monorepos, a user will pass us a set of tsconfigs - one per package. We then go ahead and call createWatchProgram once per tsconfig. Simple enough, right?

The problem is we're hitting performance issues and OOMs (typescript-eslint/typescript-eslint#1192) for large projects, or just smaller, highly-interdependent projects with many packages.

As far as I can tell^[3], the problem is that each program is independent, and isolated. If package A and package B both depend on package C, then we will have 3 copies of package C in 3 separate programs. For monorepos that are interdependent, this can cause the memory footprint be 2 or more times higher than it should be. For monorepos that share npm dependencies, this can cause a constant increase in the memory footprint.

Of course this double-parsing is also a performance problem - ESLint runs synchronously, so we cannot parallelise, meaning every wasted parse is wasted time.

[3] This is an educated guess based on the fact that parsing n tsconfigs with separate createWatchProgram calls causes an OOM, whereas parsing the same code under a single tsconfig with a single createWatchProgram call does not OOM.
I.e.
This OOMs: n tsconfigs in ./packages/**/tsconfig.json which look like { include: ["src"] }.
This does not OOM: 1 tsconfig in . which looks like: { include: ["packages/**/src"] }

Solutions

I'm not entirely certain the best approach here. I've done a bit of digging around the available APIs, but it doesn't look like there's another good solution for us right now. Which is one of the reasons I'm opening this issue - I was wondering if there's a better API we can use?

If I'm right and none of the other compiler APIs will work for us, then I believe we need to do one of:

Drop lower level, and re-implement createWatchProgram in a way that lets us share memory (by itself, this isn't a terrible idea - we could remove a bunch of hacks if we did this, but I hate having to duplicate all of that work, considering the createWatchProgram stack is some 1600 lines).
Update createWatchProgram, so it supports sharing memory.
Create a new API, similar to createWatchProgram, but which better fits our use case and supports sharing memory.

In terms of sharing memory, I noticed that the Language Service API supports passing a shared DocumentRegistry, so I am curious if this problem is "as simple as" going with option (2), and updating createWatchProgram to accept a DocumentRegistry?

I'm not sure - I'd appreciate any advice from you guys, as you are the experts. I'm happy to dive in an attempt a PR here if that's what's required.

The text was updated successfully, but these errors were encountered:

uniqueiniquity · 2019-12-14T00:16:27Z

cc: @sheetalkamat

RyanCavanaugh · 2019-12-19T23:52:58Z

@sheetalkamat can you offer some advice?

bradzacher · 2020-01-14T00:22:01Z

Not to be annoying, but pinging on this to make sure it didn't get lost (because I opened it over the holiday period).

uniqueiniquity · 2020-01-14T00:27:26Z

Thanks for the ping! @sheetalkamat and I are planning to take a look at this soon.

sheetalkamat · 2020-01-31T20:13:39Z

Need more time to investigate whether the cause is because program outlives the need or because there are multiple programs at a time when one could work with one program at a time. The investigation and changes to api surely don't make 3.8.1 bar and need to be done in next milestone.

sheetalkamat · 2020-03-11T00:17:32Z

Closing this in favor of typescript-eslint/typescript-eslint#1718

bradzacher mentioned this issue Dec 14, 2019

Benchmarking against TSLint typescript-eslint/typescript-eslint#55

Closed

RyanCavanaugh added the Needs Investigation This issue needs a team member to investigate its status. label Dec 19, 2019

RyanCavanaugh assigned sheetalkamat Dec 19, 2019

screendriver mentioned this issue Dec 20, 2019

OOM when many (>10) project configs passed to the parser typescript-eslint/typescript-eslint#1192

Closed

bradzacher mentioned this issue Dec 31, 2019

IDE Parsing error when using overlapping globs typescript-eslint/typescript-eslint#1394

Closed

RyanCavanaugh added this to the TypeScript 3.8.1 milestone Jan 14, 2020

sheetalkamat modified the milestones: TypeScript 3.8.1, TypeScript 3.8.0, TypeScript 3.9.0 Jan 31, 2020

sheetalkamat mentioned this issue Mar 11, 2020

Purge programs once they are done typescript-eslint/typescript-eslint#1718

Open

sheetalkamat closed this as completed Mar 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sharing `SourceFile`s between `createWatchProgram` calls #35676

Sharing `SourceFile`s between `createWatchProgram` calls #35676

bradzacher commented Dec 14, 2019 •

edited

uniqueiniquity commented Dec 14, 2019

RyanCavanaugh commented Dec 19, 2019

bradzacher commented Jan 14, 2020

uniqueiniquity commented Jan 14, 2020

sheetalkamat commented Jan 31, 2020

sheetalkamat commented Mar 11, 2020

Sharing SourceFiles between createWatchProgram calls #35676

Sharing SourceFiles between createWatchProgram calls #35676

Comments

bradzacher commented Dec 14, 2019 • edited

Search Terms

Background

Problem

Solutions

uniqueiniquity commented Dec 14, 2019

RyanCavanaugh commented Dec 19, 2019

bradzacher commented Jan 14, 2020

uniqueiniquity commented Jan 14, 2020

sheetalkamat commented Jan 31, 2020

sheetalkamat commented Mar 11, 2020

Sharing `SourceFile`s between `createWatchProgram` calls #35676

Sharing `SourceFile`s between `createWatchProgram` calls #35676

bradzacher commented Dec 14, 2019 •

edited