Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve test runner parallelism #172

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lightmare
Copy link
Contributor

This is an attempt to more evenly distribute testing work among child processes.

The first thing I did was get rid of the longRunningWorker. These tests take ages to complete, so I don't understand what's the point of running them all in one process.

Then I limited the number of tasks assigned to each worker at a time. Most of the tests were loaded and then buffered in the supervisor process anyway, once pipe buffers were full.

@lightmare lightmare marked this pull request as draft August 24, 2021 08:56
@lightmare lightmare marked this pull request as ready for review August 24, 2021 10:20
@devsnek
Copy link
Member

devsnek commented Aug 24, 2021

so i gave this a few runs, and it seems to take an average of about seven seconds off my test run time. i also notice the ci time went up from 13 to 19 minutes. unless this is significantly faster for you i'd rather not merge it.

@lightmare
Copy link
Contributor Author

lightmare commented Aug 24, 2021

When you run it without slow tests, it sometimes finishes a few seconds sooner — in particular when the original distribution gives some workers less work (time-wise) they finish early and you lose parallelism near the end of a run.

With slow tests, for me the original takes ~58 mins (3+1 workers), whereas this PR takes ~40 mins (3 workers).

I'll need to investigate what I messed up, because something's not right about the timings.
First, the CI seems to have stalled for twice as long near the end than it did previously.
Second, this PR didn't even print correct final counters (72323 + 3412 < 75737).
Third, summing up total CPU time spent gives me <100 min for the original, and ~120 min for this PR.

Instead, run "slow" tests on all workers, after they're done with all
"fast" tests.
This has two effects:

1) Workers don't get assigned the same number of tests each, which could
    take different amounts of time, and so some workers ended too early,
    while others still had many tests queued up.

2) The supervisor process doesn't load almost all test sources at once,
    gobbling up 300MB of memory. Now it's less than 100MB.
@lightmare
Copy link
Contributor Author

Update: the issue with CI taking longer was probably caused by running 2 workers (i.e. 3 processes including the supervisor) on a VM with 2 processors. Trying to keep a short, but always nearly-full queue, introduced a lot of expensive context switches. I've fixed this by decreasing the queue length below which scheduleTests() loop in the supervisor wakes up.

The second issue (incorrect final counters) was a bug in this PR, fixed.

The third issue (more CPU time used) is in the engine or node itself, not in this PR. It happens on the main branch as well, which you can confirm by commenting this line in slowlist:

#built-ins/{encode,decode}URI?(Component)/**/*.js

and running:

time NUM_WORKERS=1 node --predictable test/test262/test262.js -- test/built-ins/encodeURI/ 
#######################
 engine262 Test Runner
 Detected 4 CPUs
 Not running on CI
#######################

[01:29|:   62|+   62|-    0|»    0] (0.00/s)

real	1m30.051s
user	1m29.123s
sys	0m1.185s

versus:

time NUM_WORKERS=3 node --predictable test/test262/test262.js -- test/built-ins/encodeURI/ 
#######################
 engine262 Test Runner
 Detected 4 CPUs
 Not running on CI
#######################

[00:50|:   62|+   62|-    0|»    0] (0.00/s)

real	0m51.201s
user	1m50.333s
sys	0m1.664s

When running the above I also measured time spent in some parts of the runner, and found that the 20-second increase in user time matches a 20-second increase in total process.hrtime spent in this call:

completion = realm.evaluateScript(test.contents, { specifier });

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants