Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: correct test-worker-eventlooputil #35891

Conversation

Flarna
Copy link
Member

@Flarna Flarna commented Oct 31, 2020

The active worker check compared the time from sending message till response arrived from worker with the complete time the worker was running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the time read from main thread.

Fixes: #35844
Refs: #35886
Refs: #35664

The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.
@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Oct 31, 2020
@Flarna
Copy link
Member Author

Flarna commented Oct 31, 2020

@trevnorris Would be nice if you could take a look. Maybe I misunderstood your test.

@Trott
Copy link
Member

Trott commented Oct 31, 2020

Stress test on master, no concurrency: https://ci.nodejs.org/job/node-stress-single-test/191/

Stress test with this PR, no concurrency: https://ci.nodejs.org/job/node-stress-single-test/192/

Stress test on master run with -J: https://ci.nodejs.org/job/node-stress-single-test/193/ ❌ (11 failures in 1000 runs)

Stress test with this PR and -J: https://ci.nodejs.org/job/node-stress-single-test/194/ ❌ (1 failure in 1000 runs)

@watilde watilde added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 31, 2020
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 31, 2020
@nodejs-github-bot

This comment has been minimized.

@Trott
Copy link
Member

Trott commented Oct 31, 2020

Stress test on master, no concurrency: https://ci.nodejs.org/job/node-stress-single-test/191/

Stress test with this PR, no concurrency: https://ci.nodejs.org/job/node-stress-single-test/192/

Stress test on master run with -J: https://ci.nodejs.org/job/node-stress-single-test/193/ ❌ (11 failures in 1000 runs)

Stress test with this PR and -J: https://ci.nodejs.org/job/node-stress-single-test/194/ ❌ (1 failure in 1000 runs)

Based on these results, this change greatly improves things but won't completely solve the problem. (I'll take "greatly improves" for sure!) However moving the test to sequential (along with these changes or without) will resolve the issue. (The question is whether that should resolve the issue or if the fact that it does is also indicative of a bug.)

@Flarna
Copy link
Member Author

Flarna commented Nov 1, 2020

@Trott Thanks for running the perf tests.

The fail remaining is at a different location. I will look into this further once I'm back on my work place.

@gireeshpunathil
Copy link
Member

I am also thinking whether the spin time 50 ms is too small in comparison with the message overhead (looking at the failure data in freebsd, these come to 200 - 300 ms range), and as the later could potentially depend on the system characteristics (such as cpu count, load on the system etc.) , I doubt if the value idleActive(w2) is reflective of anything at all!

@Trott

This comment has been minimized.

@Flarna Flarna added the wip Issues and PRs that are still a work in progress. label Nov 1, 2020
@Flarna
Copy link
Member Author

Flarna commented Nov 2, 2020

Fixed the issue seen in last stress test by exchanging the two calls to get elu to be compared. Seems if OS decides to schedule an other process between these two lines the test failed.

new (2000) run: https://ci.nodejs.org/job/node-stress-single-test/195/

Considering that this test heavily depends on timing it's maybe better to move it into sequential to avoid/reduce side effects of high os load/low number of CPUs.

@trevnorris
Copy link
Contributor

Hi all. Finally got done with my NodeConf Remote talk. Will prioritize looking at this and helping get it fixed (nothing worse than flakey tests failing CI). Give me a bit to review everything.

Copy link
Contributor

@trevnorris trevnorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this additional change:

--- a/test/parallel/test-worker-eventlooputil.js
+++ b/test/parallel/test-worker-eventlooputil.js
@@ -50,12 +50,13 @@ let workerELU;
   if (eventLoopUtilization().idle <= 0)
     return setTimeout(mustCall(r), 5);
 
+  mainElu = eventLoopUtilization();
+
   worker = new Worker(__filename, { argv: [ 'iamalive' ] });
   metricsCh = new MessageChannel();
   worker.postMessage({ metricsCh: metricsCh.port1 }, [ metricsCh.port1 ]);
 
   workerELU = worker.performance.eventLoopUtilization;
-  mainElu = eventLoopUtilization();
   metricsCh.port2.once('message', mustCall(checkWorkerIdle));
   metricsCh.port2.postMessage({ cmd: 'elu' });
   // Make sure it's still safe to call eventLoopUtilization() after the worker

The ELU of the main thread should be stored the moment it's ready for testing.

test/parallel/test-worker-eventlooputil.js Outdated Show resolved Hide resolved
const perfWorkerElu = workerELU();
const tmpMainElu = eventLoopUtilization(mainElu);
const eluDiff = eventLoopUtilization(perfWorkerElu, mainElu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely remove the eluDiff test. I don't remember what I was thinking when I added this, but the syntax is invalid. The diff between different ELU's is only valid for the same thread. This does a diff between worker and parent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

We could add worker.threadId into the elu result and throw on mismatch as it is a clear wrong use.

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with change suggested by @trevnorris

@Flarna Flarna added perf_hooks Issues and PRs related to the implementation of the Performance Timing API. worker Issues and PRs related to Worker support. request-ci Add this label to start a Jenkins CI on a PR. and removed wip Issues and PRs that are still a work in progress. labels Nov 3, 2020
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 3, 2020
@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Nov 3, 2020

CI: https://ci.nodejs.org/job/node-test-pull-request/34032/
stress test: https://ci.nodejs.org/job/node-stress-single-test/196/

@Flarna Flarna added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. commit-queue Add this label to land a pull request using GitHub Actions. labels Nov 3, 2020
@github-actions github-actions bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Nov 4, 2020
@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2020

Landed in 58280ff...9dbde1d

@github-actions github-actions bot closed this Nov 4, 2020
nodejs-github-bot pushed a commit that referenced this pull request Nov 4, 2020
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@Flarna Flarna deleted the fix-test-worker-eventlooputil branch November 4, 2020 07:20
targos pushed a commit that referenced this pull request Nov 4, 2020
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@targos targos mentioned this pull request Nov 4, 2020
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 1, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37163
juanarbol pushed a commit to juanarbol/node that referenced this pull request Feb 10, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
richardlau pushed a commit that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: #37163
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 2, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 8, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
juanarbol pushed a commit to juanarbol/node that referenced this pull request Mar 8, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
richardlau pushed a commit to juanarbol/node that referenced this pull request Mar 16, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: nodejs#35891
Fixes: nodejs#35844
Refs: nodejs#35886
Refs: nodejs#35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: nodejs#37165
@richardlau richardlau mentioned this pull request Mar 18, 2021
MylesBorins pushed a commit that referenced this pull request Apr 6, 2021
The active worker check compared the time from sending message till
response arrived from worker with the complete time the worker was
running till it responses to the spin request.

If sending back the message is slow for some reason the test fails.

Adapt the test to compare the time seen inside the worker with the
time read from main thread.

PR-URL: #35891
Fixes: #35844
Refs: #35886
Refs: #35664
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>

Backport-PR-URL: #37163
@danielleadams danielleadams mentioned this pull request May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. test Issues and PRs related to the tests. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flaky test parallel/test-worker-eventlooputil
8 participants