-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve test runner parallelism #172
base: main
Are you sure you want to change the base?
Conversation
70f7497
to
20b2e2d
Compare
so i gave this a few runs, and it seems to take an average of about seven seconds off my test run time. i also notice the ci time went up from 13 to 19 minutes. unless this is significantly faster for you i'd rather not merge it. |
When you run it without slow tests, it sometimes finishes a few seconds sooner — in particular when the original distribution gives some workers less work (time-wise) they finish early and you lose parallelism near the end of a run. With slow tests, for me the original takes ~58 mins (3+1 workers), whereas this PR takes ~40 mins (3 workers). I'll need to investigate what I messed up, because something's not right about the timings. |
Instead, run "slow" tests on all workers, after they're done with all "fast" tests.
This has two effects: 1) Workers don't get assigned the same number of tests each, which could take different amounts of time, and so some workers ended too early, while others still had many tests queued up. 2) The supervisor process doesn't load almost all test sources at once, gobbling up 300MB of memory. Now it's less than 100MB.
5d64d94
to
950f338
Compare
Update: the issue with CI taking longer was probably caused by running 2 workers (i.e. 3 processes including the supervisor) on a VM with 2 processors. Trying to keep a short, but always nearly-full queue, introduced a lot of expensive context switches. I've fixed this by decreasing the queue length below which The second issue (incorrect final counters) was a bug in this PR, fixed. The third issue (more CPU time used) is in the engine or node itself, not in this PR. It happens on the #built-ins/{encode,decode}URI?(Component)/**/*.js and running: time NUM_WORKERS=1 node --predictable test/test262/test262.js -- test/built-ins/encodeURI/
versus: time NUM_WORKERS=3 node --predictable test/test262/test262.js -- test/built-ins/encodeURI/
When running the above I also measured time spent in some parts of the runner, and found that the 20-second increase in engine262/test/test262/test262.js Line 308 in 8993f98
|
This is an attempt to more evenly distribute testing work among child processes.
The first thing I did was get rid of the
longRunningWorker
. These tests take ages to complete, so I don't understand what's the point of running them all in one process.Then I limited the number of tasks assigned to each worker at a time. Most of the tests were loaded and then buffered in the supervisor process anyway, once pipe buffers were full.