perf(ngcc): performance improvements #38840
Recent optimizations to ngcc have significantly reduced the total time it takes to process `node_modules`, to such an extent that sharding across multiple processes has become less effective. Previously, running ngcc asynchronously would allow for up to 8 workers to be allocated; however, these workers have to repeat work that could otherwise be shared. Because ngcc is now able to reuse more shared computations, the relative overhead of running multiple workers has increased, which makes sharding less effective. As an additional benefit, having fewer workers requires less memory and less startup time.

To give an idea, using the following test setup:

```bash
npx @angular/cli new perf-test
cd perf-test
yarn ng add @angular/material
./node_modules/.bin/ngcc --properties es2015 module main \
  --first-only --create-ivy-entry-points
```

We observe the following figures on CI:

| | 10.1.1 | PR angular#38840 |
| ----------------- | --------- | --------- |
| Sync | 85s | 25s |
| Async (8 workers) | 22s | 16s |
| Async (4 workers) | - | 11s |
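As a quick check on the table above, the speedup factors can be worked out directly from the reported wall-clock times. This is only an arithmetic sketch; the `timings` object and the `speedup` helper are names made up for illustration and are not part of ngcc.

```typescript
// Wall-clock timings in seconds, copied from the CI table above.
const timings = {
  sync: {before: 85, after: 25},
  async8: {before: 22, after: 16},
  async4: {after: 11},
};

/** Speedup factor: how many times faster `after` is compared to `before`. */
function speedup(before: number, after: number): number {
  return before / after;
}

// Sync run, 10.1.1 vs. this PR.
const syncSpeedup = speedup(timings.sync.before, timings.sync.after);
// Async with 8 workers, 10.1.1 vs. this PR.
const async8Speedup = speedup(timings.async8.before, timings.async8.after);
// Old default (8 workers on 10.1.1) vs. new default (4 workers with this PR).
const defaultSpeedup = speedup(timings.async8.before, timings.async4.after);

console.log({syncSpeedup, async8Speedup, defaultSpeedup});
```

The sync run comes out at 3.4x, the 8-worker async run at roughly 1.4x, and switching from the old 8-worker default to the new 4-worker default at 2x.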
I ran this on ngcc-validation on an 8-core/16-thread CPU:

Using 4 workers:

    time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
    315.90s user / 29.88s system / 454% cpu / 1:16.06 total

Using 8 workers:

    time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
    423.24s user / 41.97s system / 830% cpu / 56.000 total

Here, the 8 worker scenario is slightly faster, which is somewhat expected for ngcc-validation as it has lots of independent packages, so it is able to feed all workers with work very effectively. The situation with 4 workers is still significantly faster than master currently is using 8 workers:

    time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
    1083.92s user / 107.51s system / 913% cpu / 2:10.39 total

Additionally, quad-core CPUs with hyper-threading report 8 available CPUs, so we'd completely drown the CPU in 7 ngcc workers. I expect only 4 workers to be typically faster even, as is the case in a smaller setup with just

EDIT: With the addition of

    time ./node_modules/.bin/ngcc --create-ivy-entry-points --error-on-failed-entry-point --first-only --properties es2015 browser module main --no-tsconfig
    255.97s user / 27.28s system / 535% cpu / 52.852 total
packages/compiler-cli/ngcc/src/execution/create_compile_function.ts
Here are the figures for a bare CLI app with

    npx @angular/cli new ngcc-perf
    cd ngcc-perf
    yarn ng add @angular/material

On master, sync:

    time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points --no-async
    101.68s user / 6.31s system / 155% cpu / 1:09.60 total

This PR, sync:

    time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points --no-async
    24.48s user / 1.58s system / 164% cpu / 15.845 total

On master, async (8 workers):

    time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points
    170.34s user / 12.49s system / 888% cpu / 20.581 total

This PR, async (4 workers):

    time ./node_modules/.bin/ngcc --properties es2015 module main --first-only --create-ivy-entry-points
    40.32s user / 3.23s system / 463% cpu / 9.407 total
Nice work @JoostK - I don't see any issues with the logic.
Regarding structure, I would extract the module resolution cache from the `TransformCache` unless I am missing something.
Regarding naming, I would consider `SharedFileCache` and `EntryPointFileCache` - which only work if they are not responsible for the module resolution cache.
Oh one more thing regarding the synchronous CLI integration. What about adding the |
I would rather not at this point. The configuration is already complex and allowing for external invalidation of caches introduces yet more public API and the possibility for bugs of stale caches. The changes in this PR are purely internal, which gives us infinite freedom in what we deem safe to cache and for how long.
LGTM
Very cool! Nice job @JoostK 👍 Love this ❤️ 👏 👏 👏
In addition to changing the default number of workers, ngcc will now use the environment variable `NGCC_MAX_WORKERS`, which may be configured to either reduce or increase the number of workers.
Fantastic work! 💯 🚀 👨🎤
const maxWorkers = process.env.NGCC_MAX_WORKERS;
if (maxWorkers === undefined) {
  // Use up to 4 CPU cores for workers, always reserving one for master.
  return Math.max(1, Math.min(4, os.cpus().length - 1));
Nit: FWIW, I would find it more intuitive to return 0 when there are not enough CPUs for workers (i.e. when `os.cpus().length < 2`).
That's an interesting point, especially when considering how to configure `NGCC_MAX_WORKERS`. Currently I require that to be at least 1, but that doesn't mean that it'll actually spawn a single worker, as it's smart enough to run it in the same process. So from that perspective, returning 0 here if there are too few CPUs feels inconsistent to me (without also changing how we interpret `NGCC_MAX_WORKERS`). I feel it would be quite awkward to allow `NGCC_MAX_WORKERS=0`, which would operate identically to `NGCC_MAX_WORKERS=1`.
I don't necessarily agree, but I don't feel strongly about this at all (so it works for me as is) 😅
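To make the behaviour discussed in this thread concrete, here is a minimal sketch of how the worker count could be derived. The default branch mirrors the snippet under review; the handling of an explicit `NGCC_MAX_WORKERS` value (integer, at least 1) is an assumption based on the reply above, and the function name, its parameters and the error message are hypothetical rather than ngcc's actual code.

```typescript
/**
 * Hypothetical helper. `envValue` stands in for `process.env.NGCC_MAX_WORKERS`
 * and `cpuCount` for `os.cpus().length`; both are passed in so the logic can be
 * exercised without touching the real environment.
 */
function getMaxNumberOfWorkers(envValue: string | undefined, cpuCount: number): number {
  if (envValue === undefined) {
    // Use up to 4 CPU cores for workers, always reserving one for master.
    // The result is never below 1, even on a single-core machine.
    return Math.max(1, Math.min(4, cpuCount - 1));
  }

  // Assumption: an explicit value must be an integer of at least 1.
  const parsed = Number(envValue.trim());
  if (!Number.isInteger(parsed) || parsed < 1) {
    throw new Error('NGCC_MAX_WORKERS should be an integer of at least 1.');
  }
  return parsed;
}

console.log(getMaxNumberOfWorkers(undefined, 16));  // 4
console.log(getMaxNumberOfWorkers(undefined, 1));   // 1
console.log(getMaxNumberOfWorkers('8', 16));        // 8
```

In a real call site the arguments would be `process.env.NGCC_MAX_WORKERS` and `os.cpus().length`. A result of 1 does not imply that a separate worker process is spawned; as noted above, ngcc can run that work in the same process.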
ngcc typically creates two `ts.Program` instances for each entry-point, one for processing sources and another one for processing the typings. The creation of these programs is somewhat expensive, as it involves module resolution and parsing of source files. This commit implements several layers of caching to optimize the creation of programs:

1. A shared module resolution cache across all entry-points within a single invocation of ngcc. Both the sources and typings program benefit from this cache.
2. Sharing the parsed `ts.SourceFile` for a single entry-point between the sources and typings program.
3. Sharing parsed `ts.SourceFile`s of TypeScript's default libraries across all entry-points within a single invocation. Some of these default library typings are large and therefore expensive to parse, so sharing the parsed source files across all entry-points offers a significant performance improvement.

Using a bare CLI app created using `ng new` + `ng add @angular/material`, the above changes offer a 3-4x improvement in ngcc's processing time when running synchronously and ~2x improvement for asynchronous runs.
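Layers 2 and 3 of the list above can be illustrated with a small two-level cache. This is a sketch under stated assumptions, not ngcc's implementation: `ParsedFile` stands in for `ts.SourceFile`, the `parse` and `isDefaultLib` callbacks are hypothetical, and the class names simply borrow the ones suggested in the review. The module resolution cache (layer 1) is left out.

```typescript
/** Stand-in for a parsed source file (ngcc works with `ts.SourceFile`). */
interface ParsedFile {
  fileName: string;
  text: string;
}

type ParseFn = (fileName: string) => ParsedFile;

/** Lives for a whole invocation; only holds default-library files. */
class SharedFileCache {
  private readonly files = new Map<string, ParsedFile>();
  private readonly parse: ParseFn;
  private readonly isDefaultLib: (fileName: string) => boolean;

  constructor(parse: ParseFn, isDefaultLib: (fileName: string) => boolean) {
    this.parse = parse;
    this.isDefaultLib = isDefaultLib;
  }

  /** Returns the parsed default-library file, or undefined for any other file. */
  getCached(fileName: string): ParsedFile | undefined {
    if (!this.isDefaultLib(fileName)) return undefined;
    let file = this.files.get(fileName);
    if (file === undefined) {
      file = this.parse(fileName);
      this.files.set(fileName, file);
    }
    return file;
  }
}

/** Lives for one entry-point; shared by its sources and typings programs. */
class EntryPointFileCache {
  private readonly files = new Map<string, ParsedFile>();
  private readonly parse: ParseFn;
  private readonly shared: SharedFileCache;

  constructor(parse: ParseFn, shared: SharedFileCache) {
    this.parse = parse;
    this.shared = shared;
  }

  getSourceFile(fileName: string): ParsedFile {
    // Default libraries are served from the invocation-wide cache.
    const sharedFile = this.shared.getCached(fileName);
    if (sharedFile !== undefined) return sharedFile;
    // Everything else is parsed at most once per entry-point.
    let file = this.files.get(fileName);
    if (file === undefined) {
      file = this.parse(fileName);
      this.files.set(fileName, file);
    }
    return file;
  }
}

// Usage: count how often the (hypothetical) parser actually runs.
let parseCount = 0;
const parse: ParseFn = (fileName) => {
  parseCount++;
  return {fileName, text: ''};
};
const shared = new SharedFileCache(parse, (f) => f.startsWith('lib.'));

const entryPointA = new EntryPointFileCache(parse, shared);
entryPointA.getSourceFile('lib.dom.d.ts');  // parsed once for the invocation
entryPointA.getSourceFile('a/index.js');    // parsed for the sources program
entryPointA.getSourceFile('a/index.js');    // reused, e.g. by the typings program

const entryPointB = new EntryPointFileCache(parse, shared);
entryPointB.getSourceFile('lib.dom.d.ts');  // reused from the shared cache
entryPointB.getSourceFile('b/index.js');    // parsed for this entry-point

console.log(parseCount);  // 3
```

Keeping the per-entry-point cache separate from the shared one means entry-point files can be dropped as soon as an entry-point is done, while the large default-library files stay parsed for the rest of the run.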
In the integration test suite of ngcc, we load a set of files from `node_modules` into memory. This includes the `typescript` package and `@angular` scoped packages, which account for a large number of large files that need to be loaded from disk. This commit moves this work to the top level, such that it doesn't have to be repeated in all tests.
See individual commits.