
test(perf): add benchmark for jest runner #2618

Merged

merged 15 commits into master from test/add-jest-runner-perf-test on Nov 26, 2020

Conversation

@nicojs (Member) commented Nov 16, 2020

This adds a performance test for a Jest project: a big one, named lighthouse. I want to focus on the jest runner in the near future, so I thought it made sense to add a benchmark first.

@nicojs (Member Author) commented Nov 16, 2020

@Lakitna Still keeping high a.t.m. 😅

> This adds a performance test for a Jest project: a big one, named lighthouse. I want to focus on the jest runner in the near future, so I thought it made sense to add a benchmark first.

I still agree with the benefits of "dropping down" as specified in #2434. I might add some lower-level benchmark tests when I implement the improvements.

@Lakitna (Contributor) commented Nov 17, 2020

Sounds good :)

Let me know when you're ready for a comprehensive baseline on multiple concurrencies.

@nicojs (Member Author) commented Nov 17, 2020

It runs on GH Actions now (took some time, because I had to use yarn to install this project; the joy of Node.js development).

It only mutates "lighthouse-core/audits/**/*.js", because the run already takes 2h 18min on GH Actions (that's with --concurrency 2). I did a test yesterday on my dev laptop with --concurrency 4; that took ~1h. This is a big one.
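For reference, a minimal sketch of how that glob might look in stryker.conf.json. The actual config file isn't shown in this thread, so the surrounding values are assumptions; `mutate`, `testRunner`, and `concurrency` are standard Stryker options:

```json
{
  "testRunner": "jest",
  "mutate": ["lighthouse-core/audits/**/*.js"],
  "concurrency": 2
}
```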

```
Running performance tests on lighthouse (matched with glob pattern "lighthouse")
(lighthouse) exec "/home/runner/work/stryker/stryker/packages/core/bin/stryker run"
lighthouse: 609.987ms last log message: 09:56:12 (3366) INFO ConfigReader Using stryker.conf.json
lighthouse: 1:00.878 (m:ss.mmm) last log message: Mutation testing 0% (elapsed: <1m, remaining: ~28m) 53/8988 tested (48 survived, 0 timed out)
lighthouse: 2:00.889 (m:ss.mmm) last log message: Mutation testing 1% (elapsed: ~1m, remaining: ~1h 8m) 150/8988 tested (108 survived, 0 timed out)
[...]
lighthouse: 2:15:43.324 (h:mm:ss.mmm) last log message: Mutation testing 96% (elapsed: ~2h 14m, remaining: ~4m) 8694/8988 tested (3547 survived, 13 timed out)
lighthouse: 2:16:53.325 (h:mm:ss.mmm) last log message: Mutation testing 98% (elapsed: ~2h 16m, remaining: ~2m) 8854/8988 tested (3597 survived, 13 timed out)
lighthouse: 2:18:01.703 (h:mm:ss.mmm) last log message:
lighthouse: 2:18:02.116 (h:mm:ss.mmm)
all tests: 2:18:02.116 (h:mm:ss.mmm)
```

This is what the report looks like:

[screenshot: mutation testing report]

@nicojs (Member Author) commented Nov 17, 2020

> Sounds good :)
>
> Let me know when you're ready for a comprehensive baseline on multiple concurrencies.

Yes, will do.

@nicojs (Member Author) commented Nov 18, 2020

@Lakitna it took me some work, but I think it's ready for benchmarking. You can run it locally with:

```
cross-env PERF_TEST_GLOB_PATTERN=lighthouse npm run perf
```

Here is a job that runs all performance tests: https://github.com/stryker-mutator/stryker/runs/1420244888?check_suite_focus=true

@nicojs (Member Author) commented Nov 19, 2020

Results with the GH workflow:

- angular-cli: 1:04.958
- express: 20:49.812
- lighthouse: 2:38:14.446

"noImplicitAny": true,
"noImplicitReturns": true,
"noImplicitThis": true,
"noImplicitAny": false,
A Member commented on this diff:

is it necessary to switch off these options?

A Member commented:

Otherwise I think it is good :)

@nicojs (Member Author) replied Nov 19, 2020:

Yeah, not really important, since Stryker disables type checking. I tried to run the tests locally and then the compilation failed, probably because it is really old TypeScript code (it stems from Angular 4 days).

@Lakitna (Contributor) commented Nov 19, 2020

I tried to run, but I got a short runtime and a mutation score of 0%...

[screenshot: short run ending with a 0% mutation score]

Something is going wrong here. Any ideas what's happening?

@ commit: c7d1ea2

Edit: Could it maybe be the plugin stuff? I'm on Windows after all.

```
Running performance tests on lighthouse (matched with glob pattern "lighthouse")
(lighthouse) exec "C:\Data\stryker\packages\core\bin\stryker run --plugins C:\Data\stryker\packages\mocha-runner\src\index.js,C:\Data\stryker\packages\karma-runner\src\index.js,C:\Data\stryker\packages\jest-runner\src\index.js,C:\Data\stryker\packages\jasmine-runner\src\index.js,C:\Data\stryker\packages\mocha-runner\src\index.js,C:\Data\stryker\packages\typescript-checker\dist\src\index.js"
```

Update: The initial test run does not fail when I introduce a deliberate error. I guess there is an error reporting issue.

Update: I keep updating, it seems. Turns out I introduced an error in the wrong test o.0 The initial test run does fail when it should.

@nicojs (Member Author) commented Nov 19, 2020

> Edit: Could it maybe be the plugin stuff? I'm on Windows after all.
>
> Running performance tests on lighthouse (matched with glob pattern "lighthouse")
> (lighthouse) exec "C:\Data\stryker\packages\core\bin\stryker run --plugins C:\Data\stryker\packages\mocha-runner\src\index.js,C:\Data\stryker\packages\karma-runner\src\index.js,C:\Data\stryker\packages\jest-runner\src\index.js,C:\Data\stryker\packages\jasmine-runner\src\index.js,C:\Data\stryker\packages\mocha-runner\src\index.js,C:\Data\stryker\packages\typescript-checker\dist\src\index.js"

Might be an issue. I can try on Windows as well in a few hours.

@nicojs (Member Author) commented Nov 19, 2020

Hmm, I've got the same result as you did, @Lakitna. I'm pretty sure this is related to the way Jest works on Windows, in combination with running the jest-runner from a different directory. I will look into it more; it would be great if we could run the perf tests on Windows as well.

@nicojs (Member Author) commented Nov 19, 2020

[screenshot: successful run on Windows]

Works now since #2623

Hope that didn't break anything for others 🤷‍♂️

@Lakitna (Contributor) commented Nov 19, 2020

> Hope that didn't break anything for others 🤷‍♂️

What could possibly go wrong 🤷‍♂️

@Lakitna (Contributor) commented Nov 19, 2020

I'm running concurrency 15, 12, 8, and 4 right now. I think that should do it.

I am doing it on my desktop this time, though; with the long runtimes, that's easier. It's still an 8-core/16-thread CPU, this time a Ryzen 3700X. Just note that it might cause slight differences compared with the previous Express bench results.

Update: Only slightly related result. The memory thing I mentioned before is very apparent with this test suite. I still think it's not a performance issue, but it is notable.

[screenshot: memory usage during the run]

Whelp, I spoke too soon. This most definitely is a performance issue. You're looking at CPU slowdowns because of a lack of memory. Probably an issue with the test suite, not Stryker.

[screenshot: CPU slowdown caused by lack of memory]

@nicojs (Member Author) commented Nov 20, 2020

> Whelp, I spoke too soon. This most definitely is a performance issue. You're looking at CPU slowdowns because of a lack of memory. Probably an issue with the test suite, not Stryker.

Maybe adding --maxTestRunnerReuse 20 would help here? If you see a big difference, then we know it has to do with something in the test suite.
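For context: maxTestRunnerReuse makes Stryker dispose and recreate a test runner process after the given number of runs, which caps memory build-up in a long-lived worker. A minimal sketch of where it would live in stryker.conf.json (the other values here are illustrative, taken from this thread's runs):

```json
{
  "testRunner": "jest",
  "concurrency": 15,
  "maxTestRunnerReuse": 20
}
```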

@Lakitna (Contributor) commented Nov 20, 2020

Here are the results of this night's run. Ran at 1416611.

I made a mistake causing all runs to be at concurrency 15... Almost as if people are not very perceptive at night 🤭 On the bright side, we can see how stable the default concurrency is.

| Concurrency | % score | # killed | # timeout | # survived | # no cov | # error | Avg tests/mutant | Duration |
|---|---|---|---|---|---|---|---|---|
| 15 (default) | 59.62 | 5185 | 172 | 3628 | 0 | 3 | 9.87 | 00:44:27 |
| 15 (default) | 59.73 | 5180 | 187 | 3618 | 0 | 3 | 9.86 | 00:43:57 |
| 15 (default) | 59.63 | 5194 | 164 | 3627 | 0 | 3 | 9.99 | 00:42:41 |
| 15 (default) | 59.71 | 5179 | 186 | 3620 | 0 | 3 | 9.88 | 06:59:20 (machine went to sleep) |

All in all, it looks to be pretty stable. There are some differences, but they are within 0.1 percentage point.

I'm running different concurrencies now. This time for real, I double-checked.

@Lakitna (Contributor) commented Nov 20, 2020

Here are the results we actually need:

| Concurrency | % score | # killed | # timeout | # survived | # no cov | # error | Avg tests/mutant | Duration |
|---|---|---|---|---|---|---|---|---|
| 15 (default) | 59.63 | 5195 | 163 | 3627 | 0 | 3 | 10.01 | 00:43:29 |
| 12 | 59.60 | 5199 | 156 | 3630 | 0 | 3 | 10.11 | 00:45:57 |
| 8 | 59.59 | 5212 | 142 | 3631 | 0 | 3 | 10.58 | 00:54:41 |
| 7 | 59.57 | 5215 | 137 | 3633 | 0 | 3 | 10.98 | 00:59:20 |
| 6 | 59.21 | 5310 | 10 | 3665 | 0 | 3 | 21.17 | 01:07:24 |
| 5 | 59.21 | 5312 | 8 | 3665 | 0 | 3 | 21.33 | 01:18:03 |
| 4 | 59.22 | 5309 | 12 | 3664 | 0 | 3 | 21.32 | 02:24:17 |
| 4 | 59.21 | 5312 | 8 | 3665 | 0 | 3 | 21.33 | 01:30:19 |
| 3 | 59.21 | 5312 | 8 | 3665 | 0 | 3 | 21.33 | 02:04:57 |

The durations and tests/mutant are interesting here. I'll run some missing concurrencies so I can make a duration graph like I did for Express. First impressions suggest a tipping point somewhere between concurrency 8 and 4; to that end, I'll run 7, 6, and 5 to fill in the gaps.

Scores seem very stable. Timeouts are manageable. All in all, it seems to be a lot more stable compared to the Express bench.

@Lakitna (Contributor) commented Nov 22, 2020

The results are in (previous comment), and stability is great! :) On all metrics but runtime...

[graph: duration per concurrency]

The graph shows the duration per concurrency. The concurrency 4 run is just such a weird outlier; it makes me think there was an issue during that run or something. I'll run 4 and 3 to get some more data points.

@Lakitna (Contributor) commented Nov 23, 2020

I've updated the comment above once more. The 02:24:17 appears to have been a fluke; I imagine something like the Windows antimalware thingy was running during the run.

I've updated the duration graph:

[graph: updated duration per concurrency]

That looks a lot better :) The trend line actually fits this time!

@nicojs (Member Author) commented Nov 24, 2020

Wow! This is amazing. Thanks so much for taking the time to run this. Do you want me to put this graph in the readme? Then we should update it once I've implemented some improvements 😅

@Lakitna (Contributor) commented Nov 25, 2020

We can definitely use the data here to find out how much of a performance delta there is between changes :)

It would also be neat to show the basic relationship between concurrency and runtime.

If you're interested, I have a similar graph for the Mocha runner in the Express bench. It has more data points but shows the same relation between the two metrics.

Edit: It would also be neat to make a similar graph for the relation between mutant count and duration. However, that would require a pretty specialized test setup. Currently, I do not have the time to create such a setup.

@nicojs merged commit 5964d55 into master on Nov 26, 2020
@nicojs deleted the test/add-jest-runner-perf-test branch on Nov 26, 2020, 12:24
@nicojs (Member Author) commented Nov 27, 2020

> However, that would require a pretty specialized test setup. Currently, I do not have the time to create such a setup.

Do you mean that it would require a lot of scripting to automate it? That's true, and it would take a dedicated server to run. Right now we're using the free GH Actions hardware, and that won't do.

@Lakitna (Contributor) commented Nov 27, 2020

Hmm, not necessarily specialized hardware. We can run it on my machine as far as I'm concerned. Unless you want to automatically test for performance regression.

What I would like for this is a setup where we can test the same code with a variable number of mutations. To do that, however, we need a variable-sized codebase with a variable number of tests tied to it, all so we can simulate a project growing. The metric for growth would be mutation count.

It's basically to find out how Stryker scales with the codebase it tests. If we want to find out how Stryker handles growth, all other variables must scale linearly.

Imagine results like:

| Mutation count | Duration |
|---|---|
| 100 | xx:xx:xx |
| 200 | xx:xx:xx |
| 300 | xx:xx:xx |
| 400 | xx:xx:xx |
| 500 | xx:xx:xx |
| 600 | xx:xx:xx |
| ... | ... |

It's not a simple implementation o.0

And we might even want to make it more complex by adding the mutation score variable:

| Mutation count | Mutation score | Duration |
|---|---|---|
| 100 | 50-ish% | xx:xx:xx |
| 100 | 70-ish% | xx:xx:xx |
| 100 | 90-ish% | xx:xx:xx |
| 200 | 50-ish% | xx:xx:xx |
| 200 | 70-ish% | xx:xx:xx |
| 200 | 90-ish% | xx:xx:xx |
| ... | ... | ... |

@bartekleon (Member) commented Nov 27, 2020

I started "funny, small experiment" about these tables. With simple code:

const vals = []; for(let i = 0; i < 10000; i++) {
    vals.push(`export const test${i} = (a: number, b: number) => {
  return a + b;
};
`)
}

I am generating 20000 mutants, so I can make 100/1000/10000/100000 mutants and check the speed :)
But I also don't know how we should store these functions/tests. In 1 file? In multiple files? 10k per file? 100 per file? All of these could have varying test lengths :/
(also, the test cases should be a little harder than what I have done, but for a simple test this should be enough :D)
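For illustration, a hypothetical generator that parameterizes that choice: it writes the same kind of functions in chunks of `perFile` per source file, with a matching Jest test file per chunk. The file names and directory layout are made up, not what bartekleon actually used:

```ts
import * as fs from 'fs';
import * as path from 'path';

// Writes `total` generated functions, `perFile` per source file, plus a
// matching Jest test file per chunk, so single-file vs. multi-file layouts
// can be benchmarked by changing one parameter.
function generate(total: number, perFile: number, outDir = 'generated'): void {
  fs.mkdirSync(path.join(outDir, 'src'), { recursive: true });
  fs.mkdirSync(path.join(outDir, 'test'), { recursive: true });
  for (let file = 0; file * perFile < total; file++) {
    const from = file * perFile;
    const to = Math.min(from + perFile, total);
    let src = '';
    let test = `import * as fns from '../src/funcs${file}';\n`;
    for (let i = from; i < to; i++) {
      src += `export const test${i} = (a: number, b: number) => {\n  return a + b;\n};\n`;
      test += `it('test${i}', () => expect(fns.test${i}(1, 2)).toBe(3));\n`;
    }
    fs.writeFileSync(path.join(outDir, 'src', `funcs${file}.ts`), src);
    fs.writeFileSync(path.join(outDir, 'test', `funcs${file}.test.ts`), test);
  }
}

generate(10000, 100); // e.g. 10000 functions, 100 per file
```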

EDIT: actually with 20000 mutants in 1 file (10000 test functions and tests), I managed to crash VSC 🗡️

@Lakitna (Contributor) commented Nov 30, 2020

That's a great start, but it also shows how many variables there are :) In an ideal world, you would isolate a single variable for these kinds of tests. That will take quite a lot of effort, I'm afraid.

That being said. I'm interested in running this to see what kind of results we get. Can you share it in such a way that I can set the number of mutations on the command line? (e.g. set MUTATIONS=100 or --mutations=100) That way I can easily queue the runs and make them output to a file for later processing.
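For what it's worth, a tiny sketch of the flag handling being asked for here; both the MUTATIONS env var and the --mutations flag are just the names from the example above, not an existing interface:

```ts
// Hypothetical CLI handling: prefer --mutations=100, fall back to the
// MUTATIONS env var, default to 100.
const flag = process.argv.find((arg) => arg.startsWith('--mutations='));
const mutations = Number(flag?.split('=')[1] ?? process.env.MUTATIONS ?? 100);
console.log(`Generating ${mutations} mutations`);
```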

> actually with 20000 mutants in 1 file (10000 test functions and tests), I managed to crash VSC 🗡️

Awesome, I'd also be interested to see what makes it crash! Lack of RAM, I assume. It'd be interesting to find out what that means for large codebases, and for (TypeScript) codebases that bundle during transpilation.

@bartekleon (Member) commented

> That's a great start, but it also shows how many variables there are :) In an ideal world, you would isolate a single variable for these kinds of tests. That will take quite a lot of effort, I'm afraid.

Yea, I am going to try a single source/test file (100-500-1000-2500-5000-10000-20000), multi-file (100 mutants/tests per file), and random sizes: basically Math.random() with some tweaks :P

> That being said. I'm interested in running this to see what kind of results we get. Can you share it in such a way that I can set the number of mutations on the command line? (e.g. set MUTATIONS=100 or --mutations=100) That way I can easily queue the runs and make them output to a file for later processing.

Sure; actually, I'm thinking of making a repository for it so I could run everything at once HEHEHEHEHEHE

> Awesome, I'd also be interested to see what makes it crash! Lack of RAM, I assume. It'd be interesting to find out what that means for large codebases, and for (TypeScript) codebases that bundle during transpilation.

Yea, it seems that if you run it from VSC, the VSC process gets more and more RAM usage (an actual leak, I think XD); I got over 8GB at the crash point. But from Git I managed to get a normal run without any significant RAM overflow: 11 runs, up to 400MB each.

@Lakitna (Contributor) commented Nov 30, 2020

> Yea, I am going to try a single source/test file (100-500-1000-2500-5000-10000-20000), multi-file (100 mutants/tests per file), and random sizes: basically Math.random() with some tweaks :P

It'd be great if we could make a scatter plot with a trend line for those results, like the one above.
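For anyone reproducing those graphs outside a spreadsheet: the trend line is just a least-squares fit. A minimal sketch, with made-up sample points:

```ts
// Ordinary least-squares fit of y = slope * x + intercept over (x, y) points.
function linearFit(points: Array<[number, number]>): { slope: number; intercept: number } {
  const n = points.length;
  const sx = points.reduce((s, [x]) => s + x, 0);
  const sy = points.reduce((s, [, y]) => s + y, 0);
  const sxy = points.reduce((s, [x, y]) => s + x * y, 0);
  const sxx = points.reduce((s, [x]) => s + x * x, 0);
  const slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  return { slope, intercept: (sy - slope * sx) / n };
}

// e.g. (mutant count, duration in minutes); these numbers are illustrative only:
console.log(linearFit([[100, 5], [200, 9], [300, 14]]));
```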
