[async hooks] Criteria for exiting experimental #194

ofrobots · 2018-05-16T17:38:56Z

At a previous WG meeting we discussed the task (assigned to me) of figuring out the criteria for async_hooks to exit experimental. While #124 can continue to be the tracking issue keeping track of the concrete work that needs to happen, I am opening this issue as a discussion what being stable entails.

To become stable, the API must be well understood, well specified and well tested – but those are subjective.

At previous WG meetings, in Async-context formalization and diagnostics support #107 and in Discuss blocking on Zones proposal TSC#340 (comment), @mike-kaufman and @mrkmarron have expressed the need (albeit not directly) for more formally defined semantics – perhaps at the language level – rather than the semantics being defined by implementation. I share this sentiment. We have gotten this concept wrong before, an it is a foundational concept for the language that needs some more rigorous treatment rather than being implementation defined. @mike-kaufman, @mrkmarron: LMK if I am inferring incorrectly.
Similarly, the V8 team (e.g. @bmeurer) has expressed concerns about semantics being inadequately specified which is a barrier to VM being able to optimize safely. A recent example of this is the recent regression in Node 10 where async_hooks behavior changed in a way to break existing use-cases in a major way. The regression was a result of optimization work V8 did in order to improve async await performance. The optimizations V8 did were reasonable as per the spec. However, since Promise Hooks is neither well specified, nor well tested it is hard to know what behaviors are observable part of the Promise Hook contract and what aspects are ancillary. What all can the VM optimize? IMO, at a minimum the Promise Hooks API needs to be better specified (as if it was a language spec – even if it is not) so that we can have adequate levels of fuzz testing in V8 and a strong contract between Node and V8 on the semantics.
Well-understood semantics: I am still making discoveries about async-hooks behavior that are surprising to me. For example, recently I learned that for certain kinds of promises the resolve hook will be called multiple times – what kind of promises this applies to left as an exercise for the reader. As a group we need to decide whether the semantics have gotten enough vetting for us to be comfortable calling it stable.
Performance: AsyncHooks (esp. PromiseHooks), when enabled have a fair amount of performance impact and there are anecdotal reports from both extremes. We have not concluded Collect data about PromiseHooks performance #144 with data from the real world – although I know people are working on getting this data. More abstractly, what is acceptable level of performance impact? Should this even be part of the exit criteria?
API Stability: To improve performance, we may need to change API as a result of the changes suggested in Proposal for Promise hooks to improve performance. #188. There is also a branch that @mcollina is working on that adds currentResource as a parameter to each callback. It is not clear whether we are API stable at this point.

mike-kaufman · 2018-05-16T17:58:29Z

I'm in strong agreement w/ first three bullet points. My gut says if that gets defined right, it obviates AsyncHooks & PromiseHooks (happy to be wrong about this...).

bmeurer · 2018-05-16T19:01:29Z

Thanks for kicking this off @ofrobots. On the V8 side we did revert a whole bunch of recent changes to get back to working state for Node 10. @MayaLekova already started working on adding more test coverage for async_hooks and making PromiseHooks fuzzable on the V8 side.

We had been discussing a plan with @hashseed and @mcollina two weeks ago that will allow us to close the gap performance wise, and we should definitely follow up on that during the summit (@hashseed, @MayaLekova and me will be there from V8 side).

mcollina · 2018-05-16T19:48:56Z

Some notes:

I think our current model is lacking something: a "join/randevouz" operator. This is critical for the correct modelling of async/await functions and to maintain the causality relationship. I have a couple of bubbleprof visualizations to show this. Happy to demo it at the summit.
my branch add async_hooks.currentResource() and it works similarly to how zones would work. It does not change any our external APIs, it is just a new semver-minor addition. I am waiting to open a PR because I would like async_hooks to be fully working before adding more functionality/changing things there. The code in the branch is ready from my point of view.

mrkmarron · 2018-05-17T05:37:13Z

Thanks for outlining these issues so clearly @ofrobots. The first bullet point captures my thoughts on the semantics issue very well and I am in complete agreement with all of the other bullet points as well.

Flarna · 2018-11-20T07:40:10Z

I think one major point is also documentation:

I think it's needed to clearly specify which objects are passed as resource for the various async resource types - or document that relying on specifics of these objects is undefined behavior. Maybe just specify the characteristics users can rely on - or even an API to interact with the resource to e.g. store some context,... instead of forcing users to decide how to do that.
We should consider to reduce the binding of public visible resource types to internal implementation details. e.g. for Node 11 FSREQWRAP has been renamed to FSReqCallback; TIMERWRAP has been remove - both internal changes now visible in public API. Otherwise I fear this may end up in a maintenance burden.

github-actions · 2020-07-17T00:37:25Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mmarchini · 2020-07-20T23:46:23Z

I believe we need to evaluate which issues are still open for async_hooks, as we have some long and last-standing issues which are touching similar topics.

boutell · 2022-03-02T18:53:25Z

Hmm, time has passed... should any part of async_hooks be regarded as safe for use in production at this point?

I'm wondering what the official situation is because OpenTelemetry depends rather heavily on it (e.g. the mongodb wrapper, express wrapper, etc. are fundamentally based on it) and it is very much intended for production use.

Qard · 2022-03-02T19:43:37Z

AsyncResource and AsyncLocalStorage are marked stable already. But async_hooks is not likely to ever be considered stable due to too much leaking of internals and an API with too many edge cases and confusing usage. In fact, there's ongoing effort to satisfy the use cases for it in other ways so it can hopefully eventually be deprecated or marked as legacy.

mcollina · 2022-03-02T19:46:52Z

@Qard I think we could just mark async_hooks as "internals" but keep exposing them.

Qard · 2022-03-02T19:59:32Z

I think that's okay as a temporary solution, but there are still security implications of the API which make me lean toward at least long-term planning for its removal. That of course assumes though that the existing use cases have been covered by other APIs and users have already migrated. I don't expect it to go away any time soon, just hopefully eventually.

boutell · 2022-03-02T20:38:40Z

If AsyncLocalStorage is stable, then I assume async_hooks would remain available at least as a module from which to import AsyncLocalStorage, even if createHook someday goes away.

mcollina · 2022-03-02T21:45:26Z

I think that's okay as a temporary solution, but there are still security implications of the API which make me lean toward at least long-term planning for its removal. That of course assumes though that the existing use cases have been covered by other APIs and users have already migrated. I don't expect it to go away any time soon, just hopefully eventually.

Would you mind to bring this idea to core?

Qard · 2022-03-03T07:53:24Z

It's already been discussed a bunch in the diagnostics working group. There's an issue on the working group repo to gather use cases in hopes that we can produce more purpose-focused solutions which can be better optimized by reducing the scope and avoiding exposing so much surface area of internal behaviour. It's very difficult to change async_hooks without breaking changes because there's so many tiny details to things like timing, interactions with other APIs, many edge cases to handle, etc.

Not sure what more you mean by bringing it to core?

mcollina · 2022-03-03T09:19:57Z

async_hooks is right now marked as a public API. I would just either mark them as deprecated or "internal" in docs.

Flarna · 2022-03-03T09:50:56Z

Which API stability requirements do we have in node for deprecated and internal APIs?
Experimental allows breaking changes even on patch releases. I guess this is not allowed for deprecated APIs (like domain) but maybe for internal.

vmarchaud · 2022-03-03T11:14:08Z

Hmm, time has passed... should any part of async_hooks be regarded as safe for use in production at this point?

Note that the async hooks context manager in OTEL contains both async_hooks and asynclocalstorage versions, which we choose based on the availability of AsyncLocalStorage. Put simply if you use node 14.8 and above and don't manually set the context manager, you are using AsyncLocalStorage (which is stable): https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-sdk-trace-node/src/NodeTracerProvider.ts#L61

mcollina · 2022-03-03T14:02:55Z

Which API stability requirements do we have in node for deprecated and internal APIs?
Experimental allows breaking changes even on patch releases. I guess this is not allowed for deprecated APIs (like domain) but maybe for internal.

There are no guarantees for internal APIs, however some of them are essentially stable due to CITGM checks

github-actions · 2022-08-04T00:26:47Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

mirek · 2022-11-04T11:54:08Z

Does anybody know, roughly when async_hooks may exit experimental?

Qard · 2022-11-04T15:30:26Z

Probably never. It exposes too much internals and it's a confusing API. We're pushing for making APIs that serve the use cases for it in cleaner ways. What's your use case for it?

mirek · 2022-11-04T15:37:25Z

Thanks for info. I want to set value at entrypoint of async context and access it at arbitrary depth, ie. set it in rpc handler and access it from nested sql executor - implicitly, without passing this value as explicit context/function parameter everywhere.

bwhitty · 2022-11-04T17:48:07Z

We have a similar critical monitoring and logging use case: setting a “request ID” (among other values) per-HTTP request going through an Express server. We then need to be able to pull that ID from any module executing within that request’s asynchronous context without having to pass the ID/request context around explicitly.

Qard · 2022-11-04T19:08:42Z

Have you tried AsyncLocalStorage for that? Should serve that purpose much better.

GeoffreyBooth · 2022-11-04T19:12:13Z

I think it’s time to at least update the docs to add a note next to the async_hooks API saying that at some point we intend to make it an internal API, no longer available without --expose-internals; and that users should instead migrate to AsyncLocalStorage and the Diagnostics Channel. (And any other applicable replacements.)

Flarna · 2022-11-05T16:42:17Z

I agree that docs should be updated to point users to AsyncLocalStorage. I doubt diagnostics channel as a replacement for async_hooks users, maybe in some rare case.
I don't agree with the --expose-internals statement because there are still usecase where above two are not replacements (see #437). Using --expose-internals has by far more sideeffects and we should not end up in using this on default for quite some usecases.

GeoffreyBooth · 2022-11-05T18:21:48Z

Sure, we can say that it's becoming internal without mentioning --expose-internals. Or we could leave out the detail about it continuing to exist internally and just say that we intend to remove it. Whatever language people think is best.

Re Diagnostics Channel, I feel like a lot of things people are using async_hooks for, like tracking network requests, could be tracked at a higher level via Diagnostics Channel events. I thought that was why we were creating that API. If there are other replacements we should mention those as well. The replacements don't need to be as directly equivalent as AsyncLocalStorage; anything that could provide an alternate way to achieve a use case could be mentioned.

Qard · 2022-11-05T23:53:57Z

On it's own, diagnostics_channel is not sufficient to replace async_hooks. But in combination with AsyncLocalStorage it can replace most tracing-related use cases. There are other use cases though which are not covered by a more purpose-built API. I would argue those are not common enough to warrant blocking async_hooks from moving to internal though, with a proper deprecation cycle of course. There needs to be a discussion among APMs though first to validate that our needs are solved, and if not, to identify what additions are needed to satisfy those needs.

GeoffreyBooth · 2022-11-06T03:32:53Z

There needs to be a discussion among APMs though first to validate that our needs are solved, and if not, to identify what additions are needed to satisfy those needs.

cc @bengl

bwhitty · 2022-11-11T22:33:51Z

Have you tried AsyncLocalStorage for that? Should serve that purpose much better.

@Qard in fact the internal module we use async_hooks for is literally called async-local-storage. I'll note that we can/should migrate off the manual usage of async_hooks to achieve this functionality and just use the purpose-built official module. Our module's main API is a .runInContext(fn) method, which AsyncLocalStorage.run(store, fn) appears to do precisely.

Thanks!

Qard · 2022-11-11T22:36:37Z

Yes, I would definitely encourage you to give AsyncLocalStorage a try. It's designed quite specifically for that async context storage use case, so it should serve your needs well and is also better optimized than most manual async_hooks used are as it doesn't need the expensive destroy hook. 🙂

github-actions bot added the stale label Jul 17, 2020

mmarchini added never stale triaging-requested Issues that need extra attention before deciding if they should be closed or merged together and removed stale labels Jul 20, 2020

Ijmir mentioned this issue Aug 18, 2021

production maturity mondaycom/node-execution-context#4

Open

boutell mentioned this issue Mar 2, 2022

Is context-async-hooks suitable for production use? open-telemetry/opentelemetry-js#2812

Closed

2 tasks

github-actions bot added the stale label Aug 4, 2022

GeoffreyBooth removed the stale label Aug 4, 2022

GeoffreyBooth mentioned this issue Nov 6, 2022

doc: add async_hooks migration note nodejs/node#45335

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[async hooks] Criteria for exiting experimental #194

[async hooks] Criteria for exiting experimental #194

ofrobots commented May 16, 2018

mike-kaufman commented May 16, 2018

bmeurer commented May 16, 2018

mcollina commented May 16, 2018

mrkmarron commented May 17, 2018

Flarna commented Nov 20, 2018

github-actions bot commented Jul 17, 2020

mmarchini commented Jul 20, 2020

boutell commented Mar 2, 2022

Qard commented Mar 2, 2022

mcollina commented Mar 2, 2022

Qard commented Mar 2, 2022

boutell commented Mar 2, 2022

mcollina commented Mar 2, 2022

Qard commented Mar 3, 2022

mcollina commented Mar 3, 2022

Flarna commented Mar 3, 2022

vmarchaud commented Mar 3, 2022

mcollina commented Mar 3, 2022

github-actions bot commented Aug 4, 2022

mirek commented Nov 4, 2022

Qard commented Nov 4, 2022

mirek commented Nov 4, 2022

bwhitty commented Nov 4, 2022

Qard commented Nov 4, 2022

GeoffreyBooth commented Nov 4, 2022

Flarna commented Nov 5, 2022

GeoffreyBooth commented Nov 5, 2022

Qard commented Nov 5, 2022

GeoffreyBooth commented Nov 6, 2022

bwhitty commented Nov 11, 2022

Qard commented Nov 11, 2022

[async hooks] Criteria for exiting experimental #194

[async hooks] Criteria for exiting experimental #194

Comments

ofrobots commented May 16, 2018

mike-kaufman commented May 16, 2018

bmeurer commented May 16, 2018

mcollina commented May 16, 2018

mrkmarron commented May 17, 2018

Flarna commented Nov 20, 2018

github-actions bot commented Jul 17, 2020

mmarchini commented Jul 20, 2020

boutell commented Mar 2, 2022

Qard commented Mar 2, 2022

mcollina commented Mar 2, 2022

Qard commented Mar 2, 2022

boutell commented Mar 2, 2022

mcollina commented Mar 2, 2022

Qard commented Mar 3, 2022

mcollina commented Mar 3, 2022

Flarna commented Mar 3, 2022

vmarchaud commented Mar 3, 2022

mcollina commented Mar 3, 2022

github-actions bot commented Aug 4, 2022

mirek commented Nov 4, 2022

Qard commented Nov 4, 2022

mirek commented Nov 4, 2022

bwhitty commented Nov 4, 2022

Qard commented Nov 4, 2022

GeoffreyBooth commented Nov 4, 2022

Flarna commented Nov 5, 2022

GeoffreyBooth commented Nov 5, 2022

Qard commented Nov 5, 2022

GeoffreyBooth commented Nov 6, 2022

bwhitty commented Nov 11, 2022

Qard commented Nov 11, 2022