
fix: allow Node.js to manage microtasks queue #28957

Merged (2 commits, May 3, 2021)

Conversation

indutny
Contributor

@indutny indutny commented May 1, 2021

Description of Change

When uv_run() resulted in the invocation of JS functions, the microtask
queue checkpoint in Node's CallbackScope was a no-op: the expected
microtask queue policy was kExplicit, but Electron ran under the
kScoped policy. This change switches the policy to kExplicit right
before uv_run() and reverts it to its original value after uv_run()
completes, to provide better compatibility with Node.
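The behavior this restores is observable from plain JS: with Node's expected policy in place, the microtask queue is drained at each CallbackScope exit, so a promise continuation queued inside one `setImmediate` callback runs before the next `setImmediate` callback. A minimal sketch (plain Node.js; variable names are illustrative):

```javascript
// With the microtask checkpoint working, the microtask queued inside the
// first immediate runs before the second immediate fires.
const order = [];
setImmediate(() => {
  order.push('immediate 1');
  Promise.resolve().then(() => order.push('microtask'));
});
setImmediate(() => {
  order.push('immediate 2');
  console.log(order.join(', '));
  // prints: immediate 1, microtask, immediate 2
});
```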

cc @codebytere since you touched this code.

Checklist

Release Notes

Notes: Allow Node.js to manage microtasks queue by using explicit microtasks policy before calling uv_run()

@indutny indutny requested a review from a team as a code owner May 1, 2021 21:32
@electron-cation electron-cation bot added the new-pr 🌱 PR opened in the last 24 hours label May 1, 2021
@indutny
Contributor Author

indutny commented May 1, 2021

Here's the "test case" (benchmark really): https://github.com/indutny/electron-napi-churn-demo

Build instructions:

npm install

Test instructions (Node.js):

node app/preload.js

Test instructions (electron):

./node_modules/.bin/electron app/main.js

Numbers:

# Node.js
190.523221ms
# Electron (unpatched)
369.185357ms
# Electron (patched)
194.212882ms

@indutny
Contributor Author

indutny commented May 1, 2021

The downside of this approach is that malformed code like this:

function hang() {
  setImmediate(hang);
}
hang();

causes starvation of the main thread (while UV loop continues to run fine).
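The "UV loop continues to run fine" part can be checked in plain Node.js: a `setImmediate` chain does not block other loop phases, so timers still fire between iterations. (A minimal sketch; the starvation described above affects the browser's message loop in Electron, not the uv loop itself.)

```javascript
// A self-terminating variant of hang(): the setImmediate chain spins,
// yet the timer below still fires, because each loop iteration runs the
// timers phase before the check (setImmediate) phase.
let spinning = true;
function hang() {
  if (spinning) setImmediate(hang);
}
hang();
setTimeout(() => {
  spinning = false;
  console.log('timer fired despite the setImmediate chain');
}, 10);
```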

@codebytere codebytere added the semver/patch backwards-compatible bug fixes label May 2, 2021
@codebytere
Member

@indutny is this something you plan to upstream? Also - could you please add some release notes?

@deepak1556 deepak1556 requested a review from zcbenz May 2, 2021 15:08
@deepak1556
Member

deepak1556 commented May 2, 2021

Can we check if env->immediate_idle_handle() is not next in the queue of uv_loop_->idle_handles before running the loop, to avoid the starvation issue? I think it might also help with addressing the failing tests.

@indutny
Contributor Author

indutny commented May 2, 2021

@indutny is this something you plan to upstream?

I'm not sure if it makes much sense for Node.js, since there is no extra cost for waiting for the next uv loop iteration there. @addaleax could you take a look at the patch? I'd appreciate hearing your opinion on this issue.

Also - could you please add some release notes?

Done, sorry for not writing these immediately!

@indutny
Contributor Author

indutny commented May 2, 2021

Can we check if env->immediate_idle_handle() is not next in the queue of uv_loop_->idle_handles before running the loop, to avoid the starvation issue? I think it might also help with addressing the failing tests.

This is a great idea! Let me come up with a patch real quick!

@indutny
Contributor Author

indutny commented May 2, 2021

btw, as you can imagine, I ran into this while investigating performance issues in a real app. In that app, napi_threadsafe_function was called through neon, since there it is easier to keep the threadsafe function around than to keep the FunctionContext (essentially a Rust wrapper around napi_env).

@indutny
Contributor Author

indutny commented May 2, 2021

@deepak1556 @codebytere all resolved except for submitting it as a PR to Node.js (see above).

Contributor

@addaleax addaleax left a comment


I think if you want to upstream this, it might be nicer to move the if (size == 0) { ... } block in this function to after the if (popped_value) { ... } block (and thus avoiding the call to uv_idle_stop() if it’s not necessary), instead of restarting the idle handle even if there are not actually any new items in the queue?


// Make sure that `setImmediate` yields to browser to avoid starvation
auto* immediate_idle_handle =
    reinterpret_cast<uv_handle_t*>(env->immediate_idle_handle());
Contributor


Is there a public API that Node.js could reasonably introduce at some point here (without exposing the idle handle itself, which is really more of an implementation detail)?

Contributor Author


Sounds like a good idea to me. This PR just wants to know whether setImmediate is scheduled to run, in order to interleave uv_run() with the browser's event loop and avoid starvation.

@indutny
Contributor Author

indutny commented May 2, 2021

I think if you want to upstream this, it might be nicer to move the if (size == 0) { ... } block in this function to after the if (popped_value) { ... } block (and thus avoiding the call to uv_idle_stop() if it’s not necessary), instead of restarting the idle handle even if there are not actually any new items in the queue?

(Argh, wanted to move my comment to top-level and copy-paste failed me)

I've changed the patch to avoid calling uv_idle_stop when we popped a value (i.e. the queue is active), but for this optimization to have an effect the extra iteration is still required. In our app, Push() was called not only from the threadsafe function, but also from the microtask queue (promises...). I don't believe that this optimization is necessary for Node.js, because the cost of an extra full uv tick is substantially lower than in Electron.

With this in mind, would you say I should submit it upstream (Node.js)?

@addaleax
Contributor

addaleax commented May 2, 2021

@indutny Right, but … my point is more like, after the if (popped_value) { ... } the JS callback and its microtask queue have run already, so you would know for sure whether there are new entries in the queue?

I don't believe that this optimization is necessary for Node.js, because the cost of an extra full uv tick is substantially lower than in Electron.

Yeah, I agree. Maybe it would also make sense to do something here that matches what we do for e.g. incoming messages on MessagePorts, i.e. loop over the queue synchronously up to a specific iteration number limit, and only return control back to the event loop and defer the other iterations if that limit has been exceeded? (Because the only reason I could imagine DispatchOne() only dispatching a single item is to avoid event loop starvation.)
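The bounded-drain pattern described here can be sketched in JS (hypothetical names; the real implementation would live in Node's C++ threadsafe-function code, and the default limit of 1000 is only an assumption borrowed from the same-tick message-processing discussion below):

```javascript
// Dispatch up to `limit` queued items synchronously; if the queue is
// still non-empty, hand control back to the event loop via `defer`
// (e.g. setImmediate) so other work is not starved.
function drainQueue(queue, dispatch, defer, limit = 1000) {
  let processed = 0;
  while (queue.length > 0 && processed < limit) {
    dispatch(queue.shift());
    processed++;
  }
  if (queue.length > 0) {
    defer(() => drainQueue(queue, dispatch, defer, limit));
  }
  return processed;
}
```

For example, `drainQueue(items, fn, setImmediate, 2)` on a five-item queue dispatches two items per turn and takes three turns to empty it.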

@indutny
Contributor Author

indutny commented May 2, 2021

@indutny Right, but … my point is more like, after the if (popped_value) { ... } the JS callback and its microtask queue have run already, so you would know for sure whether there are new entries in the queue?

The trouble is that the microtask queue is emptied after uv_run() completes, so DispatchOne() has already returned and can't check the state anymore. I'm not sure if there are any implications of running the microtask queue in DispatchOne(), but this sounds a bit heavy-handed to me. What do you think?

I don't believe that this optimization is necessary for Node.js, because the cost of extra full uv tick is substantially lower than in Electron.

Yeah, I agree. Maybe it would also make sense to do something here that matches what we do for e.g. incoming messages on MessagePorts, i.e. loop over the queue synchronously up to a specific iteration number limit, and only return control back to the event loop and defer the other iterations if that limit has been exceeded? (Because the only reason I could imagine DispatchOne() only dispatching a single item is to avoid event loop starvation.)

I see. As an extra sanity check, I think this could be a great idea. Maybe we don't even need that idle handle check if there is a limit on the execution count anyway (cc @deepak1556 @codebytere)? Correct me if I'm wrong @addaleax, but Node.js uses 1000 as a limit for same-tick message processing, and maybe Electron should do the same here?

@addaleax
Contributor

addaleax commented May 2, 2021

@indutny Right, but … my point is more like, after the if (popped_value) { ... } the JS callback and its microtask queue have run already, so you would know for sure whether there are new entries in the queue?

The trouble is that the microtask queue is emptied after uv_run() completes, so DispatchOne() has already returned and can't check the state anymore. I'm not sure if there are any implications of running the microtask queue in DispatchOne(), but this sounds a bit heavy-handed to me. What do you think?

DispatchOne() contains a CallbackScope around the relevant JS function call, which results in the microtask queue being run, so that’s already happening here (unless Electron suppresses that somehow? In which case I think that’s the real bug here). But yeah, the microtask queue should always already be empty when uv_run() returns.

Correct me if I'm wrong @addaleax, but Node.js uses 1000 as a limit for same-tick message processing, and maybe Electron should do the same here?

Yes, or the patch to Node.js could also do the same thing here already if that makes sense (in which case upstreaming it also seems like a good idea).

@indutny
Contributor Author

indutny commented May 2, 2021

I actually had a patch in the works that did that and many other things 😂, but then the microtask queue killed the whole idea. Let me give it a few more tries to see why CallbackScope doesn't empty it. A totally uneducated question, but could the microtask queue in CallbackScope not be run because it is under this condition?

  if (!tick_info->has_tick_scheduled()) {
    env_->context()->GetMicrotaskQueue()->PerformCheckpoint(env_->isolate());

    perform_stopping_check();
  }

@addaleax
Contributor

addaleax commented May 2, 2021

A totally uneducated question, but could the microtask queue in CallbackScope not be run because it is under this condition?

Well… if has_tick_scheduled() is true, it should call the next tick queue, which then runs microtasks at its end, so I think this should be fine even in Electron?

If the CallbackScope doesn’t empty it, it might be because there is one on an outer scope (that contains uv_run()), which would break a bunch of assumptions that CallbackScope makes.

@indutny
Contributor Author

indutny commented May 2, 2021

@addaleax could has_tick_scheduled be false for me?

@addaleax
Contributor

addaleax commented May 2, 2021

@indutny That really depends on the JS callback here :) But, sure, if the callback doesn’t schedule any process.nextTick() calls, then it should be false.

@indutny
Contributor Author

indutny commented May 2, 2021

@addaleax the callback is basically an empty async function() {} and the Rust code invokes .then() on its return value to schedule another threadsafe function execution.
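This workload can be approximated in plain JS (a hedged reconstruction: `invokeThreadsafe` is a hypothetical stand-in for the Rust side re-queuing the napi_threadsafe_function from the `.then()` handler):

```javascript
// The JS callback is essentially empty and async; each completed call
// schedules the next one from a promise continuation, so every round
// trip depends on the microtask queue being drained promptly.
const callback = async function () {};

function invokeThreadsafe(remaining, done) {
  callback().then(() => {
    if (remaining > 1) invokeThreadsafe(remaining - 1, done);
    else done();
  });
}

invokeThreadsafe(3, () => console.log('all calls completed'));
```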

@indutny
Contributor Author

indutny commented May 2, 2021

Here's the PR with the speed improvements for Node.js: nodejs/node#38506

I've just tested the Electron build without that extra uv_idle_t iteration, and without it the performance improvement for our app is gone. Sounds like, at least for our use case, it is absolutely necessary.

@indutny
Contributor Author

indutny commented May 2, 2021

If the CallbackScope doesn’t empty it, it might be because there is one on an outer scope (that contains uv_run()), which would break a bunch of assumptions that CallbackScope makes.

FWIW, Electron only has gin_helper::MicrotasksScope, v8::HandleScope, and v8::Context::Scope around the uv_run() call.

@addaleax
Contributor

addaleax commented May 2, 2021

Maybe the MicrotasksScope is messing with the explicit PerformCheckpoint() call here?

@indutny
Contributor Author

indutny commented May 2, 2021

@addaleax I don't know; it sounds like the call happens only in the destructor, which in turn runs only after uv_run() returns. So if the threadsafe function was queued from there, it wasn't called from uv_run().

I can try adding some debug printing to the build to see why that is. Let me see.

@indutny
Contributor Author

indutny commented May 2, 2021

Oh gosh, I see now that the condition is not has_tick_scheduled. My bad! Yeah, it is funky.

@indutny
Contributor Author

indutny commented May 2, 2021

Alright, now I have a better understanding of what's going on in Electron, thanks to the invaluable help of @addaleax. Electron uses the scoped microtask policy (with some floating v8 patches so that Node's explicit policy use doesn't crash it), and so the microtask queue is not emptied until we leave uv_run(). This explains why I was seeing napi_threadsafe_function calls happening after uv_run()!

Seems like there are two ways to go forward with this:

  • Patch Node to use the MicrotasksScope in the CallbackScope instead of manual checkpoint
  • Do an extra idle tick in the threadsafe function

My impression is that the preference (correct me if I'm wrong @addaleax) should be the first one. In that case, it looks like no changes to Electron's code base would be needed other than pulling in the patches from Node.js upstream!

@addaleax
Contributor

addaleax commented May 2, 2021

  • Patch Node to use the MicrotasksScope in the CallbackScope instead of manual checkpoint

Yeah, I’m sorry, I thought that that would help, but ultimately, neither leaving a MicrotasksScope nor calling MicrotaskQueue::PerformCheckpoint() has any effect when inside another MicrotasksScope.

I’m wondering why Electron has the outer MicrotasksScope in the first place – can we avoid that somehow?

@indutny
Contributor Author

indutny commented May 2, 2021

I'm going to try out changing the microtasks policy right before entering uv_run() and then changing it back to what it was before that. Wish me luck!

@deepak1556
Member

/trop run backport

@trop
Contributor

trop bot commented May 3, 2021

The backport process for this PR has been manually initiated - here we go! :D

@indutny-signal
Contributor

Thanks!

@trop
Contributor

trop bot commented May 3, 2021

I have automatically backported this PR to "12-x-y", please check out #28972

@trop
Contributor

trop bot commented May 3, 2021

I have automatically backported this PR to "13-x-y", please check out #28973

@trop
Contributor

trop bot commented May 3, 2021

I have automatically backported this PR to "11-x-y", please check out #28974
