Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should EventEmitter extend AsyncResource? #34430

Closed
ronag opened this issue Jul 19, 2020 · 40 comments
Closed

Should EventEmitter extend AsyncResource? #34430

ronag opened this issue Jul 19, 2020 · 40 comments
Labels
async_hooks Issues and PRs related to the async hooks subsystem. events Issues and PRs related to the events subsystem / EventEmitter. question Issues that look for answers.

Comments

@ronag
Copy link
Member

ronag commented Jul 19, 2020

Would simplify some things and make it more ergonomic to ensure async context propagation in various API's. Though probably with some performance impact?

@ronag ronag added the question Issues that look for answers. label Jul 19, 2020
@devsnek
Copy link
Member

devsnek commented Jul 19, 2020

I'm not an expert, but events are dispatched synchronously. So if you emit an event in some async code, the context should already be correct?

@ronag
Copy link
Member Author

ronag commented Jul 19, 2020

Not at all. Depends on what the root of the event is. e.g. most stream events would need a runinasynccontext when backed by e.g. a socket.

Do fs methods properly propagate async context?

@addaleax
Copy link
Member

I think this generally makes sense, for the same reason that the domain module, in addition to tracking async context, also monkey-patched the EventEmitter class to provide for this.

Also, just for context, there is some previous discussion in #33723.

@devsnek
Copy link
Member

devsnek commented Jul 19, 2020

oh is this about propagating the context that was active when the emitter instance was created?

@addaleax
Copy link
Member

That’s a valid question. Either we propagate the context of when the emitter was created, or the context of when the listener was registered.

My understanding of how async_hooks works, and my understanding of what makes sense here for e.g. ALS, would be to use the async id of the listener registration, but the async trigger id of the emitter creation. In particular, for this:

http.createServer((req, res) => {
  asyncLocalStorage.run(store, () => {
    req.on('end', () => {
      const store = asyncLocalStorage.getStore();  
    });
  });
}).listen(8080);

I think it would be a common expectation that the inner store variable matches the outer one.

@ronag
Copy link
Member Author

ronag commented Jul 19, 2020

async id of the listener registration, but the async trigger id of the emitter creation

This has always confused me. What is the practical difference of these two? i.e. how would I consume them differently?

@addaleax
Copy link
Member

@ronag My understanding is that the async id propagates the when of an async call, and the trigger async id propagates the why. I think we’re definitely lacking a clear definition that can be applied programatically here, though.

@kjarmicki
Copy link

@addaleax Could enterWith(store) be an answer here? Modifying your example:

http.createServer((req, res) => {
  asyncLocalStorage.enterWith(store);
  req.on('end', () => {
    const store = asyncLocalStorage.getStore();  
  });
}).listen(8080);

On v12.18.2 the store is still available in the callback.
As far as I understand enterWith, the store should not leak to other requests, so would there be any reason not to use it like that?

@addaleax
Copy link
Member

@kjarmicki That might work in your case, but it probably only does so by accident – in order to actually consume the content of req, you’ll need to read from req or discard its contents using req.resume();, and at least as far as I can tell the deciding factor is whether the resume call happens inside the ALS scope or not. Event listeners don’t interact with ALS at all so far by themselves.

@kjarmicki
Copy link

@addaleax oh, ok 🙂

So the right way would be to wrap the end event with a Promise, correct?

http.createServer((req, res) => {
  asyncLocalStorage.run(store, () => {
    new Promise(resolve => req.on('end', resolve))
      .then(() => {
        const store = asyncLocalStorage.getStore();
      });
  });
}).listen(8080);

@addaleax
Copy link
Member

@kjarmicki I’m not sure what you mean by “the right way”, but that should work (and using await events.once(req, 'end') would work as well). But this discussion here is specifically about whether EventEmitter should also propagate async context on its own.

@kjarmicki
Copy link

@addaleax by "the right way" I meant that it wouldn't be an accident that it works.
Thank you for the replies, and I'm not derailing the discussion anymore 😉

@ronag
Copy link
Member Author

ronag commented Jul 21, 2020

@ronag My understanding is that the async id propagates the when of an async call, and the trigger async id propagates the why. I think we’re definitely lacking a clear definition that can be applied programatically here, though.

That answers part of my question. Though I'm still unsure when & how I would practically use e.g. trigger id. I think some motivating examples in the doc would be nice in this case.

@jasnell
Copy link
Member

jasnell commented Jul 21, 2020

Fwiw, I tried this once and performance of EventEmitter dropped tenfold. A better approach (albeit less ergonomic) would be to either emit from within the context you wish to propagate, or bind the handler function to it.

@addaleax
Copy link
Member

A better approach (albeit less ergonomic) would be to either emit from within the context you wish to propagate, or bind the handler function to it.

The problem with this is that we basically end up recommending that everybody adds async tracking manually, which will:

  • Not add the perf overhead to the EE implementation itself, so our benchmarks look nice, but in reality the overhead is still there, we just don’t measure it
  • Lead to a lot of modules in the wild which will never be updated to account for this

@jasnell
Copy link
Member

jasnell commented Jul 21, 2020

It's a fair point, but really, not everyone is going to need the async tracking and shouldn't need to incur the cost. Perhaps an additional option would be feasible here? Similar to captureRejections, something that ensures that when emit() is called the context propagates -- or a separate emitWithContext() type of option that does the same? Again, not as ergonomic but avoids the performance penalty when it's just not needed.

@ronag
Copy link
Member Author

ronag commented Jul 21, 2020

It's a fair point, but really, not everyone is going to need the async tracking and shouldn't need to incur the cost.

Maybe an opt-out option? i.e. default to the most "correct" option but allow to opt-out when performance is critical and the user explicitly accepts the "risks".

@addaleax addaleax added async_hooks Issues and PRs related to the async hooks subsystem. events Issues and PRs related to the events subsystem / EventEmitter. labels Jul 21, 2020
@jasnell
Copy link
Member

jasnell commented Jul 21, 2020

Maybe an opt-out option? i.e. default to the most "correct" option but allow to opt-out when performance is critical and the user explicitly accepts the "risks".

Let's implement and benchmark. If the performance hit is too significant, then let's make it opt-in for 14 and 15 with the plan to make it opt-out for 16. This gives folks a reasonable transition period.

@mcollina
Copy link
Member

I don't think this is feasible, but it's probably worthwhile to try anyway. Firstly, there are a lot of events that are sync in nature, don't need to forward the async context for which this would be pure overhead: an example of this are all the 'data' event during a .pipe(). Secondly, this might be breaking anybody actually relying on EE being a small, efficient utility.

I think one of the critical failures in domains was monkey-patching EE to work. I think this is likely to lead with similar problems.

@addaleax
Copy link
Member

I think one of the critical failures in domains was monkey-patching EE to work. I think this is likely to lead with similar problems.

@mcollina I think it might be helpful if you could explain why you consider this a problem? Is the problem the monkey-patching, or the integration of domains and EEs at all?

@ronag
Copy link
Member Author

ronag commented Jul 24, 2020

there are a lot of events that are sync in nature, don't need to forward the async context for which this would be pure overhead

This does bring an interesting scenario though. Should we have flag for emit or alternative method with a flag where a performance sensitive implementation can indicate whether the active async context is already the correct one?

@mcollina
Copy link
Member

@mcollina I think it might be helpful if you could explain why you consider this a problem? Is the problem the monkey-patching, or the integration of domains and EEs at all?

I think it will be extremely hard to think of all possible consequences if this new behavior would be added by default to all EE.

@Flarna
Copy link
Member

Flarna commented Jul 27, 2020

That answers part of my question. Though I'm still unsure when & how I would practically use e.g. trigger id. I think some motivating examples in the doc would be nice in this case.

I think the triggerId is of help if you are interested in the resource tree whereas executionId is of help to follow the code execution. All APMs I know (and AsyncLocalStorage) follow the executionId path. I haven't seen something like the triggerId path in other awaiting languages like .NET.
e.g. consider an incoming HTTP request.
==> An APM tool sees this as new transaction which is a root of a tree which is what you get if you follow the executionId path
==> Some resource management tool (?) would like to link it to the HTTP server which is what you get if you follow the triggerId path

@Flarna
Copy link
Member

Flarna commented Jul 27, 2020

I don't think that the caller of emit can ensure that the correct context is already active. Consider some EventEmitter used within a connection pool. There might be several listeners for e.g. an error event installed from different execution path therefore there are more contexts to set.

@puzpuzpuz
Copy link
Member

Consider some EventEmitter used within a connection pool. There might be several listeners for e.g. an error event installed from different execution path therefore there are more contexts to set.

Good point. Considering this, it's more correct to bind each listener, not the whole .emit() invocation.

@ronag
Copy link
Member Author

ronag commented Aug 6, 2020

So current consensus is that one should not expect context to propagate over events? That does make things easier but is also (at least for me) a fundamental change of practice/expectation.

@jasnell
Copy link
Member

jasnell commented Aug 6, 2020

Yes, that's the consensus thus far. The recently landed AsyncResource.bind() should help significantly here by just making this easier...

const resource = new AsyncResource('foo')
const ee = new EventEmitter()

ee.on('foo', resource.bind(() => { /** Invoked within the right context **/ }))

@Flarna
Copy link
Member

Flarna commented Aug 6, 2020

I think the main problem is that emitters are that generic that it's hard to come up with a correct soluation.

An HTTP server is an event emitter but all APM tools I know interpret the request event as new transaction and not linked to the context of HTTP server creation. I would assume that there are server for other protocols out in the wild which act similar.

@jasnell
Copy link
Member

jasnell commented Aug 6, 2020

Well, with AsyncResource.bind() now available, one approach we could take is to specify an AsyncResource as an option when the EventEmitter is created, then just automatically bind to it any time an event listener is added.

const asyncResource = new AsyncResource('foo')
const ee = new EventEmitter({ asyncResource })

ee.on('bar', () => { /** implicitly bound to asyncResource **/ });

This would incur a small performance penalty when the event handler is added in order to check for the presence of the asyncResource option, but it would have zero performance penalty on the typical default emit() when no asyncResource is provided.

@ronag
Copy link
Member Author

ronag commented Aug 6, 2020

This would essentially move async context propagation responsibility from the implementer to the user. Is this the practice we go with in general, or just with event emitters?

@jasnell
Copy link
Member

jasnell commented Aug 6, 2020

I think we have to take it case by case as I'm not sure there's a general rule that would work.

@ronag
Copy link
Member Author

ronag commented Aug 6, 2020

I think we have to take it case by case as I'm not sure there's a general rule that would work.

Can we at least try to outline some kind of rules to refer to? Right now I find it personally very confusing what is expected from me as a library developer and also what I can expect as a user.

We need some form of guideline here...

e.g. IMO it's better to say to users that we don't propagate across events, rather than say that we sometimes propagate across events.

@addaleax
Copy link
Member

addaleax commented Aug 6, 2020

To be blunt, I don’t think we can get any kind of reasonable async tracking for event emitters that isn’t implemented by the built-in EventEmitter class itself, there’s just too much code written out there that doesn’t expect this. AsyncResource.bind or a possible asyncResource option to the EventEmitter constructor (which probably wouldn’t even give us the right thing!) are nice but just not practical as actual solutions in the bigger picture here.

And I know it’s not great to go and decrease EE performance, but if we end up recommending everybody to do the changes on the consumer side, the only difference is the massive amount of work that needs to be done in the ecosystem.

@jasnell
Copy link
Member

jasnell commented Aug 6, 2020

@addaleax ... it's questionable even if it were built-in to EventEmitter due simply to backwards compatibility issues (note that the eventemitter3 module on npm has 20 million weekly downloads and would need to be kept in sync). There would, at the very least, need to be a way of gracefully detecting when the async tracking is not supported so that it could be polyfilled, and any module that supports multiple down-level versions of Node.js would still need to handle this in order to propagate context appropriately. If nothing else, I think we'd have to handle this in much the same way as captureRejections ...initially make it an opt in configuration option that we see later if we can always turn on.

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Aug 7, 2020

Considering what's written above, we could create an AsyncResource implicitly when ee.on() is called. This also assumes that the event listener function will be bound to the outer context. By making this change, we will guarantee that EE is integrated with async_hooks in an intuitive way (similar to what we have with all other async resources).

But the main concern here is a certain performance penalty, as this approach will mean some litter and AsyncResource initialization introduced for every .on() invocation. To mitigate this penalty, we could enable this behavior only when there are active hooks in place. Or maybe someone has a better idea?

@bvallee-thefork
Copy link

We needed context propagation on EventEmitters, and we couldn't easily bind all the eventEmitter.on() calls.

In our case, we ended up overriding the .emit method of those event emitters (namely, the event emitters of a request and of a websocket) to get a behavior similar to the one from Node.js domains:

  asyncLocalStorage.run({}, () => {
    req.emit = AsyncResource.bind(req.emit.bind(req));
  
    /* ... */
  });

@bnoordhuis
Copy link
Member

@ronag since you're the OP and this issue has been open for 2.5 years with no movement: do you foresee this happening?

@ronag
Copy link
Member Author

ronag commented Mar 15, 2023

I have no idea. I'm no longer on top of this. Feel free to close.

@bnoordhuis
Copy link
Member

Okay, I'll go ahead and close it. Holler if it should be reopened.

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Mar 15, 2023
@Flarna
Copy link
Member

Flarna commented Mar 15, 2023

fwiw a while ago EventEmitterAsyncResource was added via #41246

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. events Issues and PRs related to the events subsystem / EventEmitter. question Issues that look for answers.
Projects
None yet
Development

No branches or pull requests

10 participants