perf(ngcc): performance improvements #38840

Closed
wants to merge 8 commits

Conversation

@JoostK JoostK (Member) commented Sep 14, 2020

See individual commits.

@ngbot ngbot bot added this to the needsTriage milestone Sep 14, 2020
JoostK added a commit to JoostK/angular that referenced this pull request Sep 14, 2020
Recent optimizations to ngcc have significantly reduced the total time
it takes to process `node_modules`, to such an extent that sharding across
multiple processes has become less effective. Previously, running
ngcc asynchronously would allow up to 8 workers to be allocated;
however, these workers have to repeat work that could otherwise be shared.
Because ngcc is now able to reuse more shared computations, the relative
overhead of multiple workers has increased, making them less effective.
As an additional benefit, having fewer workers requires less memory and
less startup time.

To give an idea, using the following test setup:

```bash
npx @angular/cli new perf-test
cd perf-test
yarn ng add @angular/material
./node_modules/.bin/ngcc --properties es2015 module main \
  --first-only --create-ivy-entry-points
```

We observe the following figures on CI:

|                   | 10.1.1    | PR angular#38840 |
| ----------------- | --------- | --------- |
| Sync              | 85s       | 25s       |
| Async (8 workers) | 22s       | 16s       |
| Async (4 workers) | -         | 11s       |
@JoostK JoostK added action: review (The PR is still awaiting reviews from at least one requested reviewer) and removed state: WIP labels Sep 14, 2020
@JoostK JoostK marked this pull request as ready for review September 14, 2020 15:25
JoostK added a commit to JoostK/ngcc-validation that referenced this pull request Sep 14, 2020
@JoostK JoostK (Member Author) commented Sep 14, 2020

I ran this on ngcc-validation on an 8-core/16-thread CPU:

Using 4 workers:

time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
315.90s user / 29.88s system / 454% cpu / 1:16.06 total

Using 8 workers:

time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
423.24s user / 41.97s system / 830% cpu / 56.000 total

Here, the 8-worker scenario is slightly faster, which is somewhat expected for ngcc-validation: it has lots of independent packages, so it is able to feed all workers with work very effectively. The 4-worker run is still significantly faster than master currently is using 8 workers:

time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
1083.92s user / 107.51s system / 913% cpu / 2:10.39 total

Additionally, quad-core CPUs with hyper-threading report 8 available CPUs, so we'd completely drown the CPU with 7 ngcc workers. I expect just 4 workers to typically be even faster, as is the case in a smaller setup with just @angular/material (see the commit message of b5a8a50), so I think that capping the number of workers at 4 is the right balance.


EDIT:

With the addition of @angular package caching, the figures have improved a bit more:

time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
255.97s user / 27.28s system / 535% cpu / 52.852 total

@JoostK JoostK (Member Author) commented Sep 14, 2020

Here are the figures for a bare CLI app with @angular/material added, such that there is a somewhat representative number of entry-points to process:

npx @angular/cli new ngcc-perf
cd ngcc-perf
yarn ng add @angular/material

On master, sync:

time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points --no-async
101.68s user / 6.31s system / 155% cpu / 1:09.60 total

This PR, sync:

time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points --no-async
24.48s user / 1.58s system / 164% cpu / 15.845 total

On master, async (8 workers):

time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points
170.34s user / 12.49s system / 888% cpu / 20.581 total

This PR, async (4 workers):

time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points
40.32s user / 3.23s system / 463% cpu / 9.407 total

@petebacondarwin petebacondarwin (Member) left a comment

Nice work @JoostK - I don't see any issues with the logic.

Regarding structure, I would extract the module resolution cache from the TransformCache unless I am missing something.

Regarding naming, I would consider SharedFileCache and EntryPointFileCache - which only work if they are not responsible for the module resolution cache.

@petebacondarwin petebacondarwin (Member)

Oh one more thing regarding the synchronous CLI integration: what about adding the TransformCache to the API that the CLI uses to invoke ngcc? This would allow the CLI to provide a cache where it can control invalidating files, since it would be able to know about that. Then we could share these cached files between synchronous invocations of ngcc in the CLI.

@JoostK JoostK (Member Author) commented Sep 15, 2020

> Oh one more thing regarding the synchronous CLI integration: what about adding the TransformCache to the API that the CLI uses to invoke ngcc? This would allow the CLI to provide a cache where it can control invalidating files, since it would be able to know about that. Then we could share these cached files between synchronous invocations of ngcc in the CLI.

I would rather not at this point. The configuration is already complex, and allowing external invalidation of caches introduces yet more public API and the possibility of stale-cache bugs. The changes in this PR are purely internal, which gives us infinite freedom in what we deem safe to cache and for how long.

@petebacondarwin petebacondarwin (Member) left a comment

LGTM

@d3lm d3lm commented Sep 15, 2020

Very cool! Nice job @JoostK 👍 Love this ❤️

👏 👏 👏

JoostK added a commit to JoostK/angular that referenced this pull request Sep 15, 2020
In addition to changing the default number of workers, ngcc will now
use the environment variable `NGCC_MAX_WORKERS` that may be configured
to either reduce or increase the number of workers.
@gkalpak gkalpak (Member) left a comment

Fantastic work! 💯 🚀 👨‍🎤

const maxWorkers = process.env.NGCC_MAX_WORKERS;
if (maxWorkers === undefined) {
// Use up to 4 CPU cores for workers, always reserving one for master.
return Math.max(1, Math.min(4, os.cpus().length - 1));
@gkalpak gkalpak (Member)

Nit: FWIW, I would find it more intuitive to return 0 when there are not enough CPUs for workers (i.e. when os.cpus().length < 2).

@JoostK JoostK (Member Author) Sep 15, 2020

That's an interesting point, especially when considering how to configure NGCC_MAX_WORKERS. Currently I require that to be at least 1, but that doesn't mean that it'll actually spawn a single worker, as it's smart enough to run the work in the same process. So from that perspective, returning 0 here if there are too few CPUs feels inconsistent to me (without also changing how we interpret NGCC_MAX_WORKERS). I feel it would be quite awkward to allow NGCC_MAX_WORKERS=0, which would operate identically to NGCC_MAX_WORKERS=1.

@gkalpak gkalpak (Member)

I don't necessarily agree, but I don't feel strongly about this at all (so it works for me as is) 😅
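
To make the thread above concrete, here is a minimal sketch of worker-count logic along these lines: the default branch is the hunk quoted above (at most 4 workers, one CPU always reserved for the master), and the `NGCC_MAX_WORKERS` override must be at least 1, as described. The function name and the exact validation are assumptions for illustration, not necessarily ngcc's actual implementation.

```ts
import * as os from 'os';

// Sketch only: the name `getMaxNumberOfWorkers` and the validation details are
// assumed; the default branch mirrors the hunk quoted in this review thread.
function getMaxNumberOfWorkers(): number {
  const maxWorkers = process.env.NGCC_MAX_WORKERS;
  if (maxWorkers === undefined) {
    // Use up to 4 CPU cores for workers, always reserving one for the master.
    return Math.max(1, Math.min(4, os.cpus().length - 1));
  }

  // The override must be an integer of at least 1. A value of 1 does not force a
  // worker to be spawned: ngcc is smart enough to run the work in the master
  // process itself in that case.
  const numericMaxWorkers = +maxWorkers.trim();
  if (!Number.isInteger(numericMaxWorkers) || numericMaxWorkers < 1) {
    throw new Error('NGCC_MAX_WORKERS should be an integer of at least 1.');
  }
  return numericMaxWorkers;
}
```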

@gkalpak gkalpak added action: cleanup (The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews) and removed action: review (The PR is still awaiting reviews from at least one requested reviewer) labels Sep 15, 2020
ngcc typically creates two `ts.Program` instances for each entry-point:
one for processing the sources and one for processing the typings.
The creation of these programs is somewhat expensive, as it involves
module resolution and parsing of source files.

This commit implements several layers of caching to optimize the
creation of programs:

1. A shared module resolution cache across all entry-points within a
   single invocation of ngcc. Both the sources and typings program
   benefit from this cache.
2. Sharing the parsed `ts.SourceFile` for a single entry-point between
   the sources and typings program.
3. Sharing parsed `ts.SourceFile`s of TypeScript's default libraries
   across all entry-points within a single invocation. Some of these
   default library typings are large and therefore expensive to parse,
   so sharing the parsed source files across all entry-points offers
   a significant performance improvement.

Using a bare CLI app created using `ng new` + `ng add @angular/material`,
the above changes offer a 3-4x improvement in ngcc's processing time
when running synchronously and ~2x improvement for asynchronous runs.
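
As a rough illustration of caching layers 1 and 3 (a sketch under assumptions, not the actual ngcc cache classes; the wrapper name, the default-library path check, and the host wiring are all illustrative), a shared module resolution cache and a cache of parsed default-library files could be attached to a `ts.CompilerHost` along these lines:

```ts
import * as ts from 'typescript';

// Parsed default-library files (lib.*.d.ts), shared across all entry-points.
const defaultLibCache = new Map<string, ts.SourceFile>();

// One module-resolution cache for the whole ngcc invocation.
const moduleResolutionCache = ts.createModuleResolutionCache(
    process.cwd(), fileName => fileName.toLowerCase());

// Wraps a host so default libraries are parsed once and module resolution
// results are shared between all programs created from wrapped hosts.
// Assumes `host` is a plain object host (e.g. from ts.createCompilerHost).
function withSharedCaches(host: ts.CompilerHost): ts.CompilerHost {
  return {
    ...host,
    getSourceFile(fileName, languageVersion, onError, shouldCreateNewSourceFile) {
      if (/\/typescript\/lib\/lib(\..+)?\.d\.ts$/.test(fileName)) {
        let sf = defaultLibCache.get(fileName);
        if (sf === undefined) {
          sf = host.getSourceFile(fileName, languageVersion, onError, shouldCreateNewSourceFile);
          if (sf !== undefined) {
            defaultLibCache.set(fileName, sf);
          }
        }
        return sf;
      }
      return host.getSourceFile(fileName, languageVersion, onError, shouldCreateNewSourceFile);
    },
    resolveModuleNames(moduleNames, containingFile, _reusedNames, redirectedReference, options) {
      return moduleNames.map(
          name => ts.resolveModuleName(
                        name, containingFile, options, host, moduleResolutionCache,
                        redirectedReference)
                      .resolvedModule);
    },
  };
}
```

Both the sources program and the typings program of an entry-point could then be created with hosts wrapped this way, so they share the same parsed default libraries and resolution results; layer 2 (sharing an entry-point's own parsed `ts.SourceFile`s between its two programs) would be cached per entry-point rather than globally.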
In the integration test suite of ngcc, we load a set of files from
`node_modules` into memory. This includes the `typescript` package and
`@angular` scoped packages, which account for a large number of large
files that need to be loaded from disk. This commit moves this work
to the top level, such that it doesn't have to be repeated in all tests.
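
As a hedged sketch of that pattern (the helper, directory list, and spec are illustrative only, not the actual ngcc test utilities), the expensive reads happen once when the spec module is evaluated, and every test reuses the in-memory copy:

```ts
import * as fs from 'fs';
import * as path from 'path';

// Recursively read every file under the given roots into memory.
function loadFilesIntoMemory(roots: string[]): Map<string, string> {
  const files = new Map<string, string>();
  const visit = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, {withFileTypes: true})) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        visit(full);
      } else if (entry.isFile()) {
        files.set(full, fs.readFileSync(full, 'utf8'));
      }
    }
  };
  roots.forEach(visit);
  return files;
}

// Module level: evaluated once per spec file instead of once per test.
const sharedFiles = loadFilesIntoMemory([
  'node_modules/typescript/lib',
  'node_modules/@angular/core',
]);

describe('ngcc integration (sketch)', () => {
  beforeEach(() => {
    // Each test would seed its own mock file system from `sharedFiles`,
    // which is cheap compared to re-reading everything from disk.
  });

  it('has the shared files available', () => {
    expect(sharedFiles.size).toBeGreaterThan(0);
  });
});
```
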
@JoostK JoostK added action: merge (The PR is ready for merge by the caretaker) and removed action: cleanup (The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews) labels Sep 15, 2020
AndrewKushnir pushed a commit that referenced this pull request Sep 15, 2020
@angular-automatic-lock-bot

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 16, 2020
Labels
action: merge (The PR is ready for merge by the caretaker), area: performance, cla: yes, target: patch (This PR is targeted for the next patch release), type: bug/fix