Test Runner is slow #47663

egoroof · 2023-04-21T23:37:02Z

Version

18.16.0 same with 20.0.0

Platform

Ubuntu 22.04.2 and Windows 11

Subsystem

test_runner

What steps will reproduce the bug?

Run some simple tests with Nodejs Test Runner. For example:

git clone https://github.com/egoroof/browser-id3-writer.git
cd browser-id3-writer
npm ci
npm run build
node --test --test-reporter spec test/

How often does it reproduce? Is there a required condition?

always

What is the expected behavior? Why is that the expected behavior?

I expect native test runner to run at least as Mocha speed or even faster.

What do you see instead?

duration_ms 1977 when on same tests Mocha runs in 38ms (50x faster, yeah)

Additional information

I also tested it on nodejs 20.0.0 and got same results. Same on Windows 11 and Ubuntu 22.04.2

I was happy to drop Mocha as a dependency, but was not expect to such a slow down.

You can see the switch from Mocha to native Test Runner in this commit egoroof/browser-id3-writer@6d29a06

The text was updated successfully, but these errors were encountered:

meyfa · 2023-04-22T10:57:14Z

Just my educated guess as someone who's been watching test_runner development from the outside:

The overhead likely comes from Node.js spawning many child processes, one per test file ¹. Meanwhile, Mocha is running all tests in the same process.

I generated a performance profile ² like this (in Bash on Linux):

$ node --prof --test --test-reporter spec test/ &
$ echo $!
11456 # PID of main process
$ node --prof-process isolate-*-11456-v8.log > processed_main.txt
$ node --prof-process isolate-*-11616-v8.log > processed_test_file.txt

There were 20 isolate-*.log files generated in total, and your project has 19 test files. The main profile has the following summary:

 [Summary]:
   ticks  total  nonlib   name
      6    0.3%    0.4%  JavaScript
   1541   82.4%   97.5%  C++
    479   25.6%   30.3%  GC
    290   15.5%          Shared libraries
     34    1.8%          Unaccounted

and the one sample test file:

 [Summary]:
   ticks  total  nonlib   name
      0    0.0%    0.0%  JavaScript
    844   77.0%  100.0%  C++
      6    0.5%    0.7%  GC
    252   23.0%          Shared libraries

In both cases it's almost entirely C++. In the C++ section of the profiles, __write is always dominating (total 43% in main and total 62% in child process), followed by __lll_lock_wait. Unfortunately I have no idea how to interpret the results any further since I'm not familiar with the C++ code.

cjihrig · 2023-04-22T14:46:49Z

There is indeed some overhead here from using child processes. You can see mocha slow down on this particular test suite if you run it with --parallel, which is more similar to what Node's test runner does.

The more interesting thing is that there was a huge slowdown on this suite between Node 18.12.1 and 18.13.0. In 18.12.1, the performance is on par with mocha --parallel. v18.13.0 is visibly slower. 18.13.0 had a few test runner changes, but I would be willing to bet that the change that introduced the TAP parser is the cause of the slowdown.

I think we should confirm if that is indeed the culprit. If it is, we can try to optimize the existing code. It would also be worth looking at removing the TAP parser and lexer completely and using a different serialization format such as JSON or even V8's serializer. I think we could serialize the reporter events directly to the parent process (minus the Error object in the test:fail event) more quickly than TAP. Unfortunately, changing that format would be a breaking change because 334bb17 publicly tied us to using TAP in this scenario.

egoroof · 2023-04-22T18:13:53Z

some overhead here from using child processes

so, is there some option not to use child processes and run all tests in the main thread ?

MoLow · 2023-04-22T20:45:53Z

this might very well be a duplicate of #47365,
besides that, I am +1 on the suggestion made by @cjihrig to report using v8 serialization (perhaps by using IPC)

cjihrig · 2023-04-23T14:29:36Z

this might very well be a duplicate of #47365

Was the cause of that issue introduced in 18.13.0? If not, then there is at least one other performance issue at play.

perhaps by using IPC

I would be hesitant to spawn the child processes with an IPC channel. That changes how some APIs work, and could lead to different behavior when a file is run with --test vs. without it.

MoLow · 2023-04-23T16:12:39Z

Was the cause of that issue introduced in 18.13.0? If not, then there is at least one other performance issue at play.

Yep, usage of SafePromiseAllSettledReturnVoid was added here #45214, and it was released at 18.13.0

I would be hesitant to spawn the child processes with an IPC channel. That changes how some APIs work, and could lead to different behavior when a file is run with --test vs. without it.

I am curios, what does it change? the idea was to use IPC to avoid breaking TAP

cjihrig · 2023-04-23T16:25:58Z

Yep, usage of SafePromiseAllSettledReturnVoid was added here #45214, and it was released at 18.13.0

That's great to hear.

I am curios, what does it change?

process.channel, process.connected, process.disconnect(), process.send(), and possibly other things that I'm forgetting.

the idea was to use IPC to avoid breaking TAP

I'm not 100% sure what you mean here, but I don't think we would break TAP. This would just be the communication protocol between the CLI runner and the child processes. It should never be visible to users unless they were spawning child processes with NODE_TEST_CONTEXT=child.

It's only really a breaking change because we have documented that NODE_TEST_CONTEXT=child outputs TAP. We could make it a non-breaking change by changing the CLI to use NODE_TEST_CONTEXT=some-value-other-than-child (actual value to be determined). In that case we could document that the new communication protocol is an internal implementation detail that does not follow semver. We should probably do that for v21 no matter what so that we aren't bound to a specific format forever.

egoroof · 2023-05-03T21:04:12Z

node 20.1.0 runs the tests in ~830ms, when node 20.0.0 runs ~2400ms on my local machine. Better, yeah. But still far away from Mocha (~30ms)

cjihrig · 2023-05-03T21:37:57Z

I've noticed a few things when running your test suite locally with node 20.1.0:

The time elapsed as reported by the test runners seems to be pretty different. I ran the test suites with the time command, and the elapsed time reported by Node's test runner seems to be much closer to what is reported by time.
I added a three second timeout in the tests. Node properly accounted for this, while mocha completely ignored it when reporting elapsed time.
When mocha runs all of the tests in a single process, it is indeed faster. According to time, mocha clocks in around 131ms on my machine, versus 175ms for Node's test runner. However, I don't think running the tests in the same process is a good fundamental design. For example, when I add a process.exit() into one of the test files, mocha exits and does not report anything.
When running mocha with --parallel, which is more similar to how Node's test runner works, time reports mocha being around 261ms, while Node is still at 175ms.

There is definitely still room for improvement in Node's test runner, but I don't think the numbers are quite as drastic as reported in the previous comment.

egoroof · 2023-05-03T21:58:00Z

ok

egoroof · 2023-05-03T22:16:45Z

fyi

node: v20.1.0

VoltrexKeyva added test_runner performance Issues and PRs related to the performance of Node.js. labels Apr 22, 2023

egoroof closed this as completed May 3, 2023

MoLow mentioned this issue May 4, 2023

test_runner: use v8.serialize instead of TAP #47867

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test Runner is slow #47663

Test Runner is slow #47663

egoroof commented Apr 21, 2023 •

edited

meyfa commented Apr 22, 2023

cjihrig commented Apr 22, 2023

egoroof commented Apr 22, 2023 •

edited

MoLow commented Apr 22, 2023

cjihrig commented Apr 23, 2023

MoLow commented Apr 23, 2023

cjihrig commented Apr 23, 2023

egoroof commented May 3, 2023

cjihrig commented May 3, 2023

egoroof commented May 3, 2023

egoroof commented May 3, 2023

Test Runner is slow #47663

Test Runner is slow #47663

Comments

egoroof commented Apr 21, 2023 • edited

Version

Platform

Subsystem

What steps will reproduce the bug?

How often does it reproduce? Is there a required condition?

What is the expected behavior? Why is that the expected behavior?

What do you see instead?

Additional information

meyfa commented Apr 22, 2023

Footnotes

cjihrig commented Apr 22, 2023

egoroof commented Apr 22, 2023 • edited

MoLow commented Apr 22, 2023

cjihrig commented Apr 23, 2023

MoLow commented Apr 23, 2023

cjihrig commented Apr 23, 2023

egoroof commented May 3, 2023

cjihrig commented May 3, 2023

egoroof commented May 3, 2023

egoroof commented May 3, 2023

egoroof commented Apr 21, 2023 •

edited

egoroof commented Apr 22, 2023 •

edited