Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

worker: add eventLoopUtilization() #35664

Closed
wants to merge 2 commits into from

Conversation

trevnorris
Copy link
Contributor

NOTE: While this is working as it should, I feel the implementation may be a bit clunky. Specifically the one TODO about thread-safe access to a performance mark. Looking for feedback on how this could be improved.


worker: add eventLoopUtilization()

Allow calling eventLoopUtilization() directly on a worker thread:

const worker = new Worker('./foo.js');
const elu = worker.performance.eventLoopUtilization();
setTimeout(() => {
  worker.performance.eventLoopUtilization(elu);
}, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

Also include a commit that improves tests for perf_hooks.performance.eventLoopUtilization().


Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Oct 15, 2020
Copy link
Member

@juanarbol juanarbol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return args.GetReturnValue().Set(-1);

// TODO(trevnorris): This isn't technically thread safe.
double loop_start_time = w->env_->performance_state()->milestones[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key question here then would be if the loop start milestone is guaranteed to be set before user code is permitted to call LoopStartTime. I suspect that it's not. Curious what @addaleax thinks here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked and it's not guaranteed. The reason this (probably) works is because it's only doing a read, and the value is only written to once. So if the state hasn't synchronized it'll return {active: 0, idle: 0, utilization: 0}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this (probably) works is because it's only doing a read, and the value is only written to once. So if the state hasn't synchronized it'll return {active: 0, idle: 0, utilization: 0}.

It’s not necessarily that easy – we should not assume that 64-bit reads & writes are atomic on every platform, I think.

The easiest way to work around this would probably be to only allow calling this function if the 'online' event has been received on the Worker instance, and if not, skipping this call into C++. We could then assert here that loop_start_time > 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax 64-bit reads/writes are always atomic on 64-bit systems if the memory is aligned. So, since we support 32-bit Windows builds, I guess this isn't solved so easily. :-(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevnorris That’s true for x64, but we’re also supporting arm and ppc :)

@jasnell
Copy link
Member

jasnell commented Oct 16, 2020

Looks good but let's work through the possible threadsafety issue on the performance mark before sign off

@juanarbol
Copy link
Member

I'm agree with @jasnell, sorry about the rush, haha

@trevnorris trevnorris force-pushed the perf-on-worker branch 5 times, most recently from a61c8d4 to 10b0775 Compare October 16, 2020 05:27
src/node_worker.cc Outdated Show resolved Hide resolved
lib/internal/worker.js Show resolved Hide resolved
return args.GetReturnValue().Set(-1);

// TODO(trevnorris): This isn't technically thread safe.
double loop_start_time = w->env_->performance_state()->milestones[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this (probably) works is because it's only doing a read, and the value is only written to once. So if the state hasn't synchronized it'll return {active: 0, idle: 0, utilization: 0}.

It’s not necessarily that easy – we should not assume that 64-bit reads & writes are atomic on every platform, I think.

The easiest way to work around this would probably be to only allow calling this function if the 'online' event has been received on the Worker instance, and if not, skipping this call into C++. We could then assert here that loop_start_time > 0.

@trevnorris
Copy link
Contributor Author

trevnorris commented Oct 16, 2020

@addaleax Thanks for the review. I've made fixes for everything, including only allowing to call LoopStartTime after the 'online' event.

edit: adding the is online check doesn't allow the benchmark to run. working on this.

@trevnorris
Copy link
Contributor Author

@addaleax Fix the benchmark issue. Which made me realize that because reading the ELU of a worker can't be done until after the 'online' event, it now required the event loop to iterate at least once before it can be read.

This isn't optimal, but I can't think of a better solution for now. In the future I wanted to make reading/writing to all the milestones atomic so we could add worker.performance.nodeTiming as well. If that ever happens, we can remove the current limitation.

@addaleax
Copy link
Member

Which made me realize that because reading the ELU of a worker can't be done until after the 'online' event, it now required the event loop to iterate at least once before it can be read.

I think the current approach of returning an object with 0 values is just fine.

In the future I wanted to make reading/writing to all the milestones atomic so we could add worker.performance.nodeTiming as well.

Keep in mind that even if reading a single value might be atomic, reading the entire group of timing values would not be.

@addaleax addaleax added semver-minor PRs that contain new features and should be released in the next minor version. worker Issues and PRs related to Worker support. request-ci Add this label to start a Jenkins CI on a PR. labels Oct 17, 2020
@addaleax
Copy link
Member

addaleax commented Oct 17, 2020

@nodejs/workers

@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 17, 2020
@nodejs-github-bot
Copy link
Collaborator

values for the worker instance.

Currently the event loop utilization of a worker instance can only be retrieved
after the [`'online'` event][] event has fired.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add note about what happens if you call it before online has fired?

Copy link
Member

@Trott Trott Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
after the [`'online'` event][] event has fired.
after the [`'online'` event][] is emitted.

@@ -223,6 +227,12 @@ class Worker extends EventEmitter {
null,
hasStdin: !!options.stdin
}, transferList);
// Use this to cache the Worker's loopStart value once available.
this[kLoopStartTime] = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the -1 in a const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingr Sorry, I don't follow what you mean.

@@ -415,6 +426,50 @@ function makeResourceLimits(float64arr) {
};
}

function eventLoopUtilization(util1, util2) {
// TODO(trevnorris): Works to solve the thread-safe read/write issue of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for the API instead to be emitted on online so:

  • The worker emits online with an "stats" argument
  • The stats argument has this eventLoopUtilization method the user can call

That would make it impossible to use this API incorrectly and remove a bit of the weirdness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it makes accessing the API more difficult, with little benefit…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax
Well, the benefit is that it would be impossible to misuse the API. Currently this API has a (relatively small) footgun if you call eventLoopUtilization before online is fired. That eliminates the footgun.

Note that the API isn't really that bad:

const worker = new Worker(...);
const controller = await once(worker, 'online');
console.log(controller.eventLoopUtilization()); // same thing as currently on worker now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingr I think my disagreement is more about whether this is a footgun or not :) If you call it before the 'online' event, you don’t get inaccurate data, you get no ELU data – which is accurate, because (up to race conditions) there actually is no ELU data available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think the distinction between "no data" and "0 data" here isn't important - I will concede.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjamingr I plan on figuring out a way to remove the limitation of not being able to query the ELU until the 'online' event has been emitted.

// has had a chance to turn. So it will be impossible to read the ELU of
// a worker thread immediately after it's been created.
if (!this[kIsOnline]) {
return { idle: 0, active: 0, utilization: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this API, I really prefer it if we didn't return 0s to indicate "not yet online" here and instead throw an error. I can see users running into footguns here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently agrees with the behavior on the main thread. I think throwing an error would be the (way) more unexpected result.

}

// Cache loopStart, since it's only written to once.
if (this[kLoopStartTime] <= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== -1 (rather than <=0) ? (also, const?)

@benjamingr
Copy link
Member

Also, I did not actually check out the branch and play with it so I didn't LGTM as this already has enough approvals - but if that would help/is needed here I am happy to do that.

To be clear (eventhough thankfully our process doesn't require this anymore): none of my comments are blocking and I am +1 on these changes in general.

juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#35664
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37163
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 10, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 10, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
richardlau pushed a commit that referenced this pull request Mar 2, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: #35664
Backport-PR-URL: #37163
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
richardlau pushed a commit that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: #37163
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 8, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 8, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
richardlau pushed a commit to juanarbol/node that referenced this pull request Mar 16, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: nodejs#35664
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>

Backport-PR-URL: nodejs#37165
richardlau pushed a commit to juanarbol/node that referenced this pull request Mar 16, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
@richardlau richardlau mentioned this pull request Mar 18, 2021
MylesBorins pushed a commit that referenced this pull request Apr 6, 2021
Allow calling eventLoopUtilization() directly on a worker thread:

    const worker = new Worker('./foo.js');
    const elu = worker.performance.eventLoopUtilization();
    setTimeout(() => {
      worker.performance.eventLoopUtilization(elu);
    }, 10);

Add a new performance object on the Worker instance that will hopefully
one day hold all the other performance metrics, such as nodeTiming.

Include benchmarks and tests.

PR-URL: #35664
Backport-PR-URL: #37163
Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this pull request Apr 6, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: #37163
@danielleadams danielleadams mentioned this pull request May 3, 2021
@targos targos removed the commit-queue-failed An error occurred while landing this pull request using GitHub Actions. label Sep 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. semver-minor PRs that contain new features and should be released in the next minor version. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants