[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

puzpuzpuz · 2020-07-25T18:18:28Z

This experiment attempts to remove unnecessary native-to-JS code switches in fast-path for PromiseHooks introduced in #32891. Those switches happen even if certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario (as ALS uses only init hook).

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

src/async_wrap.cc

addaleax · 2020-07-25T19:02:17Z

Also, not sure if this was inspired by this issue, but I think you can add: Fixes: https://github.com/nodejs/node/issues/34493?

Remove unnecessary native-to-JS code switches in fast-path for PromiseHooks. Those switches happen even if a certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario, i.e. when there is only an init hook.

puzpuzpuz · 2020-07-25T20:18:13Z

Also, not sure if this was inspired by this issue, but I think you can add: Fixes: https://github.com/[/issues/34493](https://github.com/nodejs/node/issues/34493)?

@addaleax
No, I was recently thinking about potential optimizations which could reduce ALS overhead and came to this idea. As for the issue, I haven't seen it before, but thanks for the heads-up. Still, I'm not sure if that's OK to claim that this PR fixes the issue, as it assumes a performance improvement, but it's not very big. WDYT?

puzpuzpuz · 2020-07-25T20:21:45Z

Here are the results of benchmark/async_hooks/promises.js benchmark run on CI (https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/652/):

22:14:35                                                                     confidence improvement accuracy (*)   (**)  (***)
22:14:35  async_hooks/promises.js asyncHooks='disabled' n=1000000                            0.90 %       ±2.32% ±3.10% ±4.05%
22:14:35  async_hooks/promises.js asyncHooks='enabled' n=1000000                     **     -5.27 %       ±3.43% ±4.58% ±5.97%
22:14:35  async_hooks/promises.js asyncHooks='enabledWithDestroy' n=1000000           *     -2.64 %       ±2.54% ±3.38% ±4.40%
22:14:35  async_hooks/promises.js asyncHooks='enabledWithInitOnly' n=1000000        ***      5.10 %       ±2.72% ±3.63% ±4.73%
22:14:35 
22:14:35 Be aware that when doing many comparisons the risk of a false-positive
22:14:35 result increases. In this case there are 4 comparisons, you can thus
22:14:35 expect the following amount of false-positive results:
22:14:35   0.20 false positives, when considering a   5% risk acceptance (*, **, ***),
22:14:35   0.04 false positives, when considering a   1% risk acceptance (**, ***),
22:14:35   0.00 false positives, when considering a 0.1% risk acceptance (***)

As expected, there is an improvement in the init hook only case (~5%), but on the other hand it's not that large. So, I'd like to gather some opinions here.

cc @nodejs/async_hooks

puzpuzpuz · 2020-07-25T20:38:06Z

@addaleax many thanks for the quick review and helpful comments!

puzpuzpuz · 2020-07-26T09:11:28Z

BTW I've just tried to replace FastPromiseHook with the following no-op implementation which does almost nothing:

static void FastPromiseHook(PromiseHookType type, Local<Promise> promise,
                            Local<Value> parent) {
  Local<Context> context = promise->CreationContext();
  Environment* env = Environment::GetCurrent(context);
  if (env == nullptr) return;

  if (type == PromiseHookType::kInit) {
    promise->Set(context, env->async_id_symbol(),
        v8::Number::New(v8::Isolate::GetCurrent(), 42)).FromJust();
  }
}

On master I got the following:

$ ./node benchmark/async_hooks/promises.js
async_hooks/promises.js asyncHooks="enabled" n=1000000: 185,945.55495370456
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 158,913.00117897094
async_hooks/promises.js asyncHooks="enabledWithInit" n=1000000: 182,217.81928900545
async_hooks/promises.js asyncHooks="disabled" n=1000000: 408,466.1162559881

With the dummy PromiseHook:

$ ./node benchmark/async_hooks/promises.js 
async_hooks/promises.js asyncHooks="enabled" n=1000000: 231,443.61998399772
async_hooks/promises.js asyncHooks="enabledWithDestroy" n=1000000: 151,732.07311244417
async_hooks/promises.js asyncHooks="enabledWithInit" n=1000000: 243,183.30819966158
async_hooks/promises.js asyncHooks="disabled" n=1000000: 404,664.95117296715

As you may see, there is a slight improvement (which has to be properly measured in multiple runs, but let's assume it's ~20-25% for now), but the main cost seems to be associated with the PromiseHook being installed in V8.

Are there any optimizations that could be made in V8's PromiseHook implementation? I'm not familiar with V8 internals, so I may be wrong here. Maybe collaborators who are good at V8 can provide any inputs here.

addaleax · 2020-07-26T11:38:36Z

@puzpuzpuz Trying to reproduce your results, during profiling there’s two few things that are noticeable:

About 40 % of the benchmark is spent creating the Error object in the benchmark, because capturing the stack trace is expensive – and an additional 10 % spent throwing the exception
The dummy promise hook is still expensive, but not on the calling V8 side. About 85 % its cost comes from the Object::Set() call, and the rest from the other calls into V8 that we perform.

Generally, reducing the # of calls to Get(), Set() or Call() is a great way to improve performance. That’s certainly not trivial here, though.

puzpuzpuz · 2020-07-26T11:55:05Z

@addaleax
Yes, capturing stack trace is costly. Yet, the benchmark does it in all scenarios, so it's still fair to compare disabled scenario with other ones. If we change the benchmark to plain Promise.resolve(), it should make the overhead only worse.

As for the V8 calls made within the dummy hook, that's a good finding. Unfortunately, I don't see any workarounds that would allow us to avoid Get calls. But I'm glad to hear that the main cost is not on the V8 side. I'll try to do some profiling myself.

Stack trace capturing currently accounts for 40 % of the benchmark running time. Always throwing the same exception object removes that overhead and lets the benchmark be more focused on what it is supposed to measure. Refs: nodejs#34512 (comment)

puzpuzpuz · 2020-07-27T07:44:53Z

@addaleax I'd like to do a follow-up on further PromiseHook optimizations topic. I could create a separate GH issue, but for now I'm fine with an ad-hoc discussion here.

What I'm thinking of is using Object::SetInternalField()/GetInternalField() methods which seems to have much less overhead than the Set()/Get() methods. They are already used by the slow-path PromiseWrap:
https://github.com/nodejs/node/blob/master/src/async_wrap.cc#L230

With this approach we could assign async id on native side and then pass it into the fastPromiseHook JS hook, so that JS code doesn't need to read it itself.

To implement this approach we could expand count of internal fields to 2 instead of 1 as it is now. But I failed to find the exact place where it's set for native promises. Could you point me at the place?

Update. It appears that we should be fine with a single internal field, so please ignore the above question.

nodejs-github-bot · 2020-07-27T10:37:17Z

CI: https://ci.nodejs.org/job/node-test-pull-request/32514/

addaleax · 2020-07-27T14:21:42Z

@puzpuzpuz Yeah, that should work, we just need to be very careful about not accidentally reading the async id and interpreting it as a PromiseWrap* or vice versa.

The place where the internal field count is configured is here:

node/configure.py

Line 1325 in 62198d2

    
           o['variables']['v8_promise_internal_field_count'] = 1 # Add internal field to promises for async hooks.

puzpuzpuz · 2020-07-27T15:06:33Z

@addaleax Yes, fast/slow path hooks interop has to be kept. I'm going to submit a draft experiment PR to follow-up on the idea. And thanks for pointing me at the configuration file.

danielgindi · 2020-07-27T18:26:14Z

I'm trying to understand something here. V8 knows how to generate a stack trace that considers await calls, but the cost is paid when delivered right?
So when we want to query ALS store, can't use the same technique to find the path we went through, and add some additional metadata over the breadcrumbs as we walk through it?

As I noted in the issue, I achieved a much more performant ALS (actually Zone) using a JS only technique that did basically that. It generated a stack trace when querying the Zone.current, looked at ids that were added to the Error stack (by appending them to the wrapped function called with .run), and mapped those ids to the relevant Zone.
I really don't understand why a possible implementation for Zone/ALS should cripple Node by this much?

Flarna · 2020-07-27T20:44:32Z

v8 is able to show the origin of a await (not full long stack traces) but not more (see https://v8.dev/blog/fast-async#improved-developer-experience). If you use plain Promises or callback based APIs there is nothing like for await.

ALS doesn't has this limitation, it follows all async transactions signaled by async hooks - and this comes with a cost.

danielgindi · 2020-07-28T03:57:16Z

v8 is able to show the origin of a await (not full long stack traces) but not more (see https://v8.dev/blog/fast-async#improved-developer-experience). If you use plain Promises or callback based APIs there is nothing like for await.

ALS doesn't has this limitation, it follows all async transactions signaled by async hooks - and this comes with a cost.

That's right. But my Zone implementation filled the rest by wrapping callback functions and patching Promises, much like Angular's polyfill did.
The overall impact was between ~10%-~25%. And it was all userland. JS. It was not 97% impact.

Remove unnecessary native-to-JS code switches in fast-path for PromiseHooks. Those switches happen even if a certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario, i.e. when there is only an init hook. PR-URL: #34512 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: David Carlier <devnexen@gmail.com>

puzpuzpuz · 2020-07-28T07:55:36Z

Landed in 53870dd

puzpuzpuz · 2020-07-28T08:12:16Z

@danielgindi probably it makes sense to continue the conversation in #34493.

danielgindi · 2020-07-28T11:32:44Z

@puzpuzpuz Could you run the benchmark snippet in #34493 on your PR and put the results here?

puzpuzpuz · 2020-07-28T12:03:36Z

@danielgindi sure. Already did that.

Refs: nodejs#34512

Remove unnecessary native-to-JS code switches in fast-path for PromiseHooks. Those switches happen even if a certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario, i.e. when there is only an init hook. PR-URL: #34512 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: David Carlier <devnexen@gmail.com>

Stack trace capturing currently accounts for 40 % of the benchmark running time. Always throwing the same exception object removes that overhead and lets the benchmark be more focused on what it is supposed to measure. Refs: #34512 (comment) PR-URL: #34523 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>

Remove unnecessary native-to-JS code switches in fast-path for PromiseHooks. Those switches happen even if a certain type of hook (say, before) is not installed, which may lead to sub-optimal performance in the AsyncLocalStorage scenario, i.e. when there is only an init hook. PR-URL: #34512 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: David Carlier <devnexen@gmail.com>

Native side of fast-path promise hook was not calling JS fastPromiseHook function when there were no async ids previously assigned to the promise. Because of that already created promises could not get id assigned in situations when an async hook without a before listener function is enabled after their creation. As the result executionAsyncId could return wrong id when called within promise's .then(). Refs: nodejs#34512

Native side of fast-path promise hook was not calling JS fastPromiseHook function when there were no async ids previously assigned to the promise. Because of that already created promises could not get id assigned in situations when an async hook without a before listener function is enabled after their creation. As the result executionAsyncId could return wrong id when called within promise's .then(). Refs: #34512 PR-URL: #34548 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>

Stack trace capturing currently accounts for 40 % of the benchmark running time. Always throwing the same exception object removes that overhead and lets the benchmark be more focused on what it is supposed to measure. Refs: #34512 (comment) PR-URL: #34523 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>

Native side of fast-path promise hook was not calling JS fastPromiseHook function when there were no async ids previously assigned to the promise. Because of that already created promises could not get id assigned in situations when an async hook without a before listener function is enabled after their creation. As the result executionAsyncId could return wrong id when called within promise's .then(). Refs: #34512 PR-URL: #34548 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>

Stack trace capturing currently accounts for 40 % of the benchmark running time. Always throwing the same exception object removes that overhead and lets the benchmark be more focused on what it is supposed to measure. Refs: #34512 (comment) PR-URL: #34523 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Andrey Pechkurov <apechkurov@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>

nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jul 25, 2020

puzpuzpuz added async_hooks Issues and PRs related to the async hooks subsystem. performance Issues and PRs related to the performance of Node.js. labels Jul 25, 2020

puzpuzpuz force-pushed the experiment/optimize-promise-hook-for-als branch from ed4df22 to bfe3a68 Compare July 25, 2020 18:37

addaleax approved these changes Jul 25, 2020

View reviewed changes

src/async_wrap.cc Outdated Show resolved Hide resolved

src/async_wrap.cc Outdated Show resolved Hide resolved

puzpuzpuz force-pushed the experiment/optimize-promise-hook-for-als branch from bfe3a68 to 1274aad Compare July 25, 2020 20:11

puzpuzpuz marked this pull request as ready for review July 25, 2020 20:11

benjamingr approved these changes Jul 26, 2020

View reviewed changes

devnexen approved these changes Jul 26, 2020

View reviewed changes

addaleax mentioned this pull request Jul 26, 2020

benchmark: always throw the same Error instance #34523

Closed

3 tasks

puzpuzpuz added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jul 27, 2020

puzpuzpuz closed this Jul 28, 2020

puzpuzpuz deleted the experiment/optimize-promise-hook-for-als branch July 28, 2020 08:10

puzpuzpuz added a commit to puzpuzpuz/node that referenced this pull request Jul 28, 2020

async_hooks: optimize fast-path promise hook for ALS (iter 2)

e937178

Refs: nodejs#34512

ruyadorno mentioned this pull request Jul 28, 2020

v14.7.0 proposal #34542

Merged

puzpuzpuz mentioned this pull request Jul 29, 2020

async_hooks: fix id assignment in fast-path promise hook #34548

Closed

3 tasks

codebytere added the dont-land-on-v12.x label Nov 3, 2020

mhdawson mentioned this pull request Jan 18, 2022

pummel/test-worker-take-heapsnapshot - crash in debug mode on arm #41204

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

puzpuzpuz commented Jul 25, 2020 •

edited

addaleax commented Jul 25, 2020

puzpuzpuz commented Jul 25, 2020 •

edited

puzpuzpuz commented Jul 25, 2020

puzpuzpuz commented Jul 25, 2020

puzpuzpuz commented Jul 26, 2020

addaleax commented Jul 26, 2020

puzpuzpuz commented Jul 26, 2020 •

edited

puzpuzpuz commented Jul 27, 2020 •

edited

nodejs-github-bot commented Jul 27, 2020

addaleax commented Jul 27, 2020

puzpuzpuz commented Jul 27, 2020

danielgindi commented Jul 27, 2020

Flarna commented Jul 27, 2020

danielgindi commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

danielgindi commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

[Experiment] async_hooks: optimize fast-path promise hook for ALS #34512

Conversation

puzpuzpuz commented Jul 25, 2020 • edited

Checklist

addaleax commented Jul 25, 2020

puzpuzpuz commented Jul 25, 2020 • edited

puzpuzpuz commented Jul 25, 2020

puzpuzpuz commented Jul 25, 2020

puzpuzpuz commented Jul 26, 2020

addaleax commented Jul 26, 2020

puzpuzpuz commented Jul 26, 2020 • edited

puzpuzpuz commented Jul 27, 2020 • edited

nodejs-github-bot commented Jul 27, 2020

addaleax commented Jul 27, 2020

puzpuzpuz commented Jul 27, 2020

danielgindi commented Jul 27, 2020

Flarna commented Jul 27, 2020

danielgindi commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

danielgindi commented Jul 28, 2020

puzpuzpuz commented Jul 28, 2020

puzpuzpuz commented Jul 25, 2020 •

edited

puzpuzpuz commented Jul 25, 2020 •

edited

puzpuzpuz commented Jul 26, 2020 •

edited

puzpuzpuz commented Jul 27, 2020 •

edited