
Warn on EventTarget maxListeners > THRESHOLD #35990

Closed
benjamingr opened this issue Nov 6, 2020 · 18 comments
Labels
events Issues and PRs related to the events subsystem / EventEmitter.

Comments

@benjamingr
Member

tl;dr:

  • I want to add the "max listener warning" to all event targets (and not just NodeEventTarget)
  • I checked with WHATWG and we are allowed to do this.
  • This is because it's very tricky to wire cancellation correctly when attaching "abort" signal handlers.

cc @jasnell and @addaleax, who are probably the most familiar with this code/question.

Also cc @mcollina since you often have smart things to say in issues related to memory leaks :]


Hey,

@bterlson who helps maintain the Azure SDK has been using AbortController and AbortSignal extensively and has raised a few rather severe issues with AbortControllers/Signals.

First - thanks a ton for the valuable feedback and help Brian!

The biggest one IMO pertains to a memory leak he ran into repeatedly where:

  • You have one AbortSignal/AbortController bound to the process (to deal with SIGINT for example) that propagates
  • Actions downstream have their own AbortController/AbortSignals and there exists code that links them up, for example: processLevelSignal.addEventListener('abort', e => childOpController.abort())
  • Since the handlers are never cleaned up, memory leaks: it basically requires library authors to write tricky cancellation-aware code. This is so tricky that when we implemented APIs that take signals we didn't always get it right. For example our timers/promises timeout does not clean up its signal on successful completion (added by @jasnell, who added AbortController and is probably the most competent with this API in core and is a strong dev in general).

We already have NodeEventTarget that checks for this bug and maxListeners like EventEmitter.

I asked WHATWG whether we are allowed to emit a warning based on maxListeners to the console like we do for EventEmitters.

Note - this is a problem even if the handlers are attached with { once: true } - like the above case where the issue is when the timer completes successfully.

@benjamingr benjamingr added the events Issues and PRs related to the events subsystem / EventEmitter. label Nov 6, 2020
@mcollina
Member

mcollina commented Nov 6, 2020

For example our timers/promises timeout does not clean up its signal on successful completion (added by @jasnell, who added AbortController and is probably the most competent with this API in core and is a strong dev in general)

I guess this needs to be fixed.

@benjamingr
Member Author

@mcollina yeah, I will push a fix for that (it's a relatively small fix). My concern is more that if someone this familiar with the API can make this error, anyone can; userland will run into this, and I want us to help them.

@mcollina
Member

mcollina commented Nov 6, 2020

I want to add the "max listener warning" to all event targets (and not just NodeEventTarget)

I agree on the topic, we should match the behavior we have in EventEmitter.

@benjamingr
Member Author

benjamingr commented Nov 6, 2020

One possible issue is that we can't add properties to EventTarget so it would be impossible (or at least hard, we can expose a static method I guess) to configure this threshold.

One alternative we can do is to override this on AbortSignals (and not every EventTarget) so that regular EventTargets don't get the warning but AbortSignals do.

@mcollina
Member

mcollina commented Nov 6, 2020

One possible issue is that we can't add properties to EventTarget so it would be impossible (or at least hard, we can expose a static method I guess) to configure this threshold.

Why?

@jasnell
Member

jasnell commented Nov 6, 2020

I agree that we need the warning. It was there originally before I split out NodeEventTarget but was removed from EventTarget due to the insistence that its behavior match the browser implementation as closely as possible.

One possible issue is that we can't add properties to EventTarget so it would be impossible (or at least hard, we can expose a static method I guess) to configure this threshold.

Let's make it a non-enumerable, static function with a Symbol name that is not likely to collide with anything from the standard. EventTarget[events.maxListenersSymbol] = 20
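The suggested shape could look like the following sketch (the symbol name and the default of 20 are assumptions for illustration, not what Node.js ultimately shipped):

```javascript
// Hypothetical sketch: expose the threshold via a symbol-keyed,
// non-enumerable static property so it cannot collide with anything
// the WHATWG EventTarget standard adds later.
const kMaxListeners = Symbol('events.maxListeners');

Object.defineProperty(EventTarget, kMaxListeners, {
  value: 20,
  writable: true,
  enumerable: false,
  configurable: true,
});

// Consumers configure the threshold explicitly via the symbol;
// the symbol key never shows up in Object.keys(EventTarget).
EventTarget[kMaxListeners] = 30;
```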

@jasnell
Member

jasnell commented Nov 6, 2020

PR to fix the timers/promises issue coming soon.

I'll have a PR that adds the warning to EventTarget soon after.

@benjamingr
Member Author

benjamingr commented Nov 6, 2020

@jasnell already opened a PR for timers/promises at #35992

Go ahead with the warning on EventTarget :] (and feel free to ping me on eventtarget stuff I am always happy to review)

@benjamingr
Member Author

@jasnell is there somewhere where it would be easier to reach you to coordinate this sort of effort? (like, do you use IRC? Facebook? something else?)

You've been doing a large chunk of the work here and I don't want to do things that would cause you more and not less work.

@jasnell
Member

jasnell commented Nov 6, 2020

It's usually pretty easy to reach me on Twitter DM. I also use Signal for messaging. If that works for you, email me privately and I'll give you the phone number to connect there.

@benjamingr
Member Author

I'll revive my (unused) twitter account I guess 😅

@bterlson

bterlson commented Nov 6, 2020

@benjamingr thanks much for filing this issue! I just want to clarify the scenarios under which memory leaks are likely to occur:

1. Forgetting to remove listeners

In many real world examples I've seen, signals are long-lived and reused multiple times. This is often the case even without setting up a parent/child relationship. People just find it convenient to reuse signals, such as when you have a cancel button on a loop that triggers async operations. So if you're not careful to remove handlers, a single signal can accumulate thousands and thousands of listeners.

However, there's another problem that causes handlers and signals to never be collected...

2. Linking signals

As far as I can tell, it is impossible to link signals as @benjamingr sketched in the OP without leaking memory. In order to propagate the cancellation signal downward, the parents hold a strong reference to the children in their abort handler. Children need the ability to remove the parent's abort handler, but this API offers no way to do that as far as I can tell.

This means that the common cancellation pattern of having a single top-level signal handling e.g. SIGINT with child signals passed into individual operations to handle timeouts or other more granular events will create signals that can never be collected. This is by itself a memory leak, but is also made much worse by the first issue because now you're leaking every handler as well.

Linking signals without leaking memory could be made possible by offering users a hookable way to dispose an abort handler such as a subscription. Intercepting calls to removeEventListener and using the API proposed in #35991 to check whether any listeners remain, and if not, removing the parent -> child linkage, is an approach that works in theory. However, the fetch spec requires that listeners are added by reaching directly into the signal rather than using the public API, and anyway the fetch spec never cleans up its handlers, so signals passed into fetch would live forever.

It seems to me that solving this memory leak requires some kind of change to how fetch works at least, but even more ideally, an API that makes unsubscribing more easily observable so userland implementations of linked signals can be implemented reasonably.
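One userland shape for this "hookable disposal" idea is a linking helper that hands back an explicit, subscription-style unsubscribe. This is a sketch; `linkSignals` is a hypothetical helper, not an existing API:

```javascript
// Hypothetical helper: link a child controller to a parent signal and
// return a handle whose dispose() severs the parent -> child edge.
function linkSignals(parentSignal) {
  const child = new AbortController();
  const onAbort = () => child.abort();
  parentSignal.addEventListener('abort', onAbort, { once: true });
  return {
    signal: child.signal,
    abort: () => child.abort(),
    // Removing the handler lets the child controller (and everything
    // it closes over) be garbage collected once the operation is done.
    dispose: () => parentSignal.removeEventListener('abort', onAbort),
  };
}

// Usage: dispose on successful completion so the parent forgets us.
const root = new AbortController();
const op = linkSignals(root.signal);
op.dispose();
root.abort(); // no longer propagates to the disposed child
```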

Hope this context is useful, and let me know if I can be of any help in these efforts!

@mcollina
Member

mcollina commented Nov 6, 2020

Regarding point 2.

Is there anything that Node could do better to support this? It seems a problem at spec level more than Node.js.
(currently we do not ship fetch, and this does not give me hope)

@benjamingr
Member Author

Is there anything that Node could do better to support this? It seems a problem at spec level more than Node.js.
(currently we do not ship fetch, and this does not give me hope)

Regarding fetch: if Node ships fetch we could use real listeners for abort rather than the magical internal ones the browsers/the fetch spec use. This also can/should be fixed in the fetch spec.

Regarding the general problem: libraries would need to be written in a pretty tricky way in order to do cancellation correctly because of this problem.

There probably needs to be an API that will make doing this correctly easy, we can expose one as a static method somewhere.

Basically the problem is:

const rootAbortController = new AbortController();
// Pass a callback; calling abort() inline would abort immediately.
process.on('SIGINT', () => rootAbortController.abort());

app.use((req, res, next) => {
  const perRequestController = new AbortController();
  // Listeners are attached to the signal, not the controller.
  rootAbortController.signal.addEventListener("abort", () => perRequestController.abort());
  req.once('close', () => perRequestController.abort());
  req.signal = perRequestController.signal;
  next();
});
// express or whatever
app.get('/', (req, res) => {
  getResource({ signal: req.signal }).then(data => res.json(data));
});

This server keeps adding listeners to the rootAbortController's signal without ever removing them; even with { once: true } it would still leak a handler for every successful request.

The solution would be for the parent to remove the listener when it knows the child won't cancel anymore. In the above case this would mean the middleware would have to be:

app.use((req, res, next) => {
  const perRequestController = new AbortController();
  const handler = () => perRequestController.abort();
  rootAbortController.signal.addEventListener("abort", handler);
  res.on('finish', () => rootAbortController.signal.removeEventListener("abort", handler));
  req.once('close', () => perRequestController.abort());
  req.signal = perRequestController.signal;
  next();
});

I think it's possible but it's a very high burden to place on users, library authors and people using the API.

A possible helper would be:

app.use((req, res, next) => {
  // controllerFrom links a new controller to the parent signal.
  const perRequestController = controllerFrom(rootAbortController.signal);
  res.on('finish', () => perRequestController.dispose());
  req.once('close', () => perRequestController.abort());
  req.signal = perRequestController.signal;
  next();
});

Or something similar. We could also make the holding 'weak' (that is, put the keys in a weakmap so if the child signal is no longer reachable the parent does not keep a reference to the handler).
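The "weak" variant might be sketched with a WeakRef plus a FinalizationRegistry rather than a WeakMap, since the parent also needs to drop its own handler once the child is collected. This is a hypothetical helper, and collection timing is entirely up to the GC:

```javascript
// Hypothetical weak linking: the parent's handler holds only a WeakRef
// to the child controller, so it does not keep the child alive. When
// the child controller is eventually collected, the registry callback
// removes the now-dead handler from the parent signal.
const registry = new FinalizationRegistry(({ parentSignal, onAbort }) => {
  parentSignal.removeEventListener('abort', onAbort);
});

function linkWeakly(parentSignal, childController) {
  const ref = new WeakRef(childController);
  const onAbort = () => ref.deref()?.abort();
  parentSignal.addEventListener('abort', onAbort);
  registry.register(childController, { parentSignal, onAbort });
}

const root = new AbortController();
const child = new AbortController();
linkWeakly(root.signal, child);
root.abort();
console.log(child.signal.aborted); // true: propagation works while the child is alive
```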

@bterlson

bterlson commented Nov 6, 2020

The solution would be for the parent to remove the listener when it knows the child won't cancel anymore.

The problem is more subtle than this, I believe. Parents would need to remove this listener when no child or further descendants care whether cancellation has occurred. To illustrate, the code sample above has a subtle bug:

app.use((req, res, next) => {
  const perRequestController = new AbortController();
  const handler = () => perRequestController.abort();
  rootAbortController.signal.addEventListener("abort", handler);
  res.on('finish', () => rootAbortController.signal.removeEventListener("abort", handler));
  req.once('close', () => perRequestController.abort());
  req.signal = perRequestController.signal;
  next();
});

What if req.signal is passed somewhere and another linked signal is created from it? The expectation of that new signal would be "this signal should fire if either the new controller calls abort, the request signal aborts, or the root signal aborts". By removing the handler upon completion of the request, you are also preventing the root abort from propagating to the signal derived from the request signal. Now that signal will only abort when its controller aborts, and not when the root aborts.

In the .NET world this is analogous to the difference between dispose() on the controller and dispose() on the cancellation subscription. The code sample above would call dispose() on perRequestController. But this does not clean up any parent signal -> child signal linkage; it really only frees timers or other resources the controller itself is holding. But when cancellation subscriptions are disposed, if we see that the signal doesn't have any listeners anymore, then we can remove the handler in the parent that propagates the signal.

Book-keeping this relationship between parent and children is too complex to ask people to do, so it really needs to be wrapped in a library. I believe fetch does need changes to make this possible. Aside from that, Node could offer an AbortController subclass that allows signal linking and take advantage of internal APIs, but this would be a node-specific API that wouldn't work in browsers.
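Such a subclass might look roughly like the following (the class name and dispose() semantics are assumptions; a real implementation would also need the listener-count hook discussed above to call dispose() automatically):

```javascript
// Hypothetical Node-specific subclass: an AbortController that links
// itself to a parent signal and can sever that link explicitly.
class LinkedAbortController extends AbortController {
  #unlink;

  constructor(parentSignal) {
    super();
    if (parentSignal?.aborted) {
      this.abort(); // parent already aborted: propagate immediately
    } else if (parentSignal) {
      const onAbort = () => this.abort();
      parentSignal.addEventListener('abort', onAbort, { once: true });
      this.#unlink = () =>
        parentSignal.removeEventListener('abort', onAbort);
    }
  }

  // Call when no listener on this.signal (or any descendant) remains.
  // This is the "cancellation subscription" dispose described above,
  // not a dispose of the controller's own resources.
  dispose() {
    this.#unlink?.();
    this.#unlink = undefined;
  }
}
```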

@jasnell
Member

jasnell commented Nov 6, 2020

This is just a stray thought off the top of my head but a mechanism similar to AsyncLocalStorage might actually be a viable option here (at least in Node.js). Essentially, make it possible to set an AbortController attached to the async context such that any listeners attached in the current context are removed automatically when the context unwinds.

const ac = new AsyncLocalAbortContext();
//...
ac.controller.signal.addEventListener("abort", () => {});

app.use((req, res, next) => {
  // Attached listeners are scoped to the execution context ID.
  // When the execution context stack pops, associated listeners
  // are removed.
  ac.controller.signal.addEventListener("abort", () => {});  
});

@bterlson

bterlson commented Nov 6, 2020

Are there details on this context mechanism I can read up on? Its applicability depends on when exactly the context unwinds. If it unwinds when e.g. the associated async fn returns and all nested contexts unwind, then it seems potentially useful; otherwise I think it has the same issue as the example code I was talking about in my last comment?

@jasnell
Member

jasnell commented Nov 7, 2020

Start here for details of the AsyncLocalStorage mechanism. https://nodejs.org/dist/latest-v15.x/docs/api/async_hooks.html#async_hooks_class_asynclocalstorage

This would be similar in the underlying model.

codebytere pushed a commit that referenced this issue Nov 22, 2020
Signed-off-by: James M Snell <jasnell@gmail.com>

PR-URL: #36001
Fixes: #35990
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>