Reuse domains rather than spawning them for each parallel test #457

Draft · wants to merge 12 commits into base: main

Conversation

@OlivierNicole (Contributor)

Given that domains are costly to spawn and join, and that parallel tests in multicoretests do so many thousands of times, I wanted to look at this more closely. A quick perf profile on src/array/lin_tests.ml shows that more than 50% of the time is spent in Domain.spawn. I wanted to try spawning domains only at the start of the test sequence and reusing them. Then I realised that this is exactly what Domainslib provides with domain pools!

This proof-of-concept PR simply replaces Domain.spawn with Domainslib.Task.async and Domain.join with Domainslib.Task.await. The approach requires setting up a domain pool and passing it to the tests. I have edited src/array/{lin,stm}_tests.ml to reflect this.
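
Roughly, the substitution looks like this. This is a simplified sketch rather than the exact diff, and `run_cmds_dom1`/`run_cmds_dom2` are made-up thunks standing in for the per-domain command interpreters:

```ocaml
open Domainslib

(* Each parallel triple submits its two command sequences to the pool as tasks
   instead of spawning and joining fresh Domains. *)
let run_parallel pool run_cmds_dom1 run_cmds_dom2 =
  Task.run pool (fun () ->
      (* previously: let d1 = Domain.spawn run_cmds_dom1 and d2 = ... *)
      let p1 = Task.async pool run_cmds_dom1 in
      let p2 = Task.async pool run_cmds_dom2 in
      (* previously: Domain.join d1, Domain.join d2 *)
      (Task.await pool p1, Task.await pool p2))

let () =
  (* the pool is created once and threaded through to the tests *)
  let pool = Task.setup_pool ~num_domains:2 () in
  (* ... run the QCheck tests, passing [pool] to run_parallel ... *)
  Task.teardown_pool pool
```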

The results are rather encouraging: src/array/lin_tests.ml is cut down from about 4.5 s to 0.3 s! (With extremely far-off statistical outliers due to bad luck in the interleaving search.) However, this is a favourable case, as the test’s commands are very cheap to run. The same comparison on my extensive Dynarray STM parallel test gives 24 s vs 12 s for the version with domain reuse, i.e. only a 2x speedup.

If depending on Domainslib is an issue, the mechanism can easily enough be reimplemented using atomics, mutexes, and condition variables. (A reimplementation would likely squeeze out some more performance by not requiring a work-stealing queue.)
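
For illustration, here is a rough sketch of what such a reimplementation could look like, using only Stdlib `Mutex` and `Condition`. The names (`Worker`, `submit`, `await`, ...) are made up for the sketch and are not the PR’s `Util.Domain_pair` API:

```ocaml
(* A worker Domain that is spawned once and then reused: it repeatedly waits
   for a job, runs it, and signals completion.  At most one job may be
   outstanding per worker at any time. *)
module Worker = struct
  type t = {
    m : Mutex.t;
    has_job : Condition.t;              (* signaled when a job is submitted *)
    job_done : Condition.t;             (* signaled when the job has finished *)
    mutable job : (unit -> unit) option;
    mutable finished : bool;
    mutable stop : bool;
  }

  let rec loop t =
    Mutex.lock t.m;
    while Option.is_none t.job && not t.stop do
      Condition.wait t.has_job t.m
    done;
    match t.job with
    | Some f ->
      t.job <- None;
      Mutex.unlock t.m;
      f ();                             (* run the job outside the lock *)
      Mutex.lock t.m;
      t.finished <- true;
      Condition.signal t.job_done;
      Mutex.unlock t.m;
      loop t
    | None ->
      Mutex.unlock t.m                  (* stop was requested *)

  let spawn () =
    let t = { m = Mutex.create (); has_job = Condition.create ();
              job_done = Condition.create ();
              job = None; finished = false; stop = false } in
    (t, Domain.spawn (fun () -> loop t))

  (* Hand [f] to the worker; the returned ref is filled in once it completes. *)
  let submit (t, _dom) f =
    let result = ref None in
    Mutex.lock t.m;
    t.finished <- false;
    t.job <- Some (fun () -> result := Some (f ()));
    Condition.signal t.has_job;
    Mutex.unlock t.m;
    result

  let await (t, _dom) result =
    Mutex.lock t.m;
    while not t.finished do Condition.wait t.job_done t.m done;
    Mutex.unlock t.m;
    Option.get !result

  let shutdown (t, dom) =
    Mutex.lock t.m;
    t.stop <- true;
    Condition.signal t.has_job;
    Mutex.unlock t.m;
    Domain.join dom
end
```

Two such workers give a reusable domain pair: each parallel triple submits its command sequences with `submit`, collects the results with `await`, and the workers are only shut down once at the very end of the test run.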

It looks like about 20% of the time is now spent in Sys.cpu_relax, so further improvement may still be possible.

@OlivierNicole (Contributor, Author)

I removed the dependency on Domainslib in favour of Stdlib mutexes. Since @jmid mentioned in a discussion that he wasn’t fond of the user having to wrap the test runner in a call to Util.Domain_pair.run (fun pair -> ...), I also tried reusing domains only for intra-test repetitions, as he suggested. I measured the following run times:

|  | src/array/lin_tests.ml | src/array/stm_tests.ml |
| --- | --- | --- |
| main (c65989d) | 4.796 s ± 2.306 | 2.640 s ± 5.203 |
| Intra-test reuse of domains | 2.543 s ± 0.751 (speedup 1.89) | 2.272 s ± 1.925 (speedup 1.16) |
| Maximal reuse of domains | 716.9 ms ± 725.0 (speedup 6.67) | 1.270 s ± 1.004 (speedup 2.08) |

@jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows testing them in a clean state. While it is true that this means an initially empty minor heap, I can’t think of tests for which an initially empty minor heap is a better test context.

@OlivierNicole (Contributor, Author)

(With some advice and help from @fabbing to switch to mutexes!)

@jmid (Collaborator) commented May 2, 2024

Thanks Olivier!
Quick observation: why does this cause so many `Failure("failed to allocate domain")` errors?
Overall, this should allocate fewer Domains, not more. Are there conditions under which the spawned Domains are not joined? (That could explain why several tests are running out of them.)

@OlivierNicole (Contributor, Author)

Strange, it should indeed spawn strictly fewer domains.

> Overall, this should allocate fewer Domains, not more. Are there conditions under which the spawned Domains are not joined?

Normally, no, the Util.Domain_pair.run function joins the domains before returning. I’ll try to take a look the next time I have some time on my hands.

@jmid (Collaborator) commented May 3, 2024

Thanks again for this! 🙏

  • From your numbers it seems we should be able to get a decent speed-up, even if we only retain the pool for the duration of the property.
  • I think I found the issue with running out of Domains: on failure, the property raises a QCheck exception to report the observations. When that happens, I believe the exception meant that the underlying Domains weren't joined in the end; a sketch of the kind of fix needed is below.
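
In other words, the teardown of the reused Domains needs to run even when the property raises. A minimal sketch of that shape (with made-up names `init_pair`, `takedown` and `check_property`) could use `Fun.protect`:

```ocaml
let agree_prop triple =
  let pair = init_pair () in
  Fun.protect
    (* runs both on normal return and when the property raises *)
    ~finally:(fun () -> takedown pair)
    (fun () -> check_property pair triple)
```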

@OlivierNicole (Contributor, Author)

Well spotted, thanks!

There still seem to be a lot of failures; I’ll have a look at it in my spare time at some point…

@jmid (Collaborator) commented May 15, 2024

I think I figured this one out too: the hint was STM tests with a non-trivial precond triggering `failed to allocate domain`.
Preconditions are checked before the Domains are run, and if a precondition fails, this is signaled to QCheck ... with an exception, thereby causing takedown not to run 🤷
With that in mind, it makes sense to run the precondition check first, before attempting pool init.
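
A sketch of that ordering, reusing the made-up names from the `Fun.protect` sketch above (`precond_ok` likewise stands in for the actual precondition check):

```ocaml
let agree_prop (seq, cmds1, cmds2) =
  (* check the preconditions before any Domain/pool allocation *)
  precond_ok seq cmds1 cmds2
  && begin
       let pair = init_pair () in
       Fun.protect ~finally:(fun () -> takedown pair)
         (fun () -> check_property pair (seq, cmds1, cmds2))
     end
```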

The opam install workflows will still fail, due to util.ml now requiring OCaml 5.

@jmid (Collaborator) commented May 15, 2024

> @jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows testing them in a clean state. While it is true that this means an initially empty minor heap, I can’t think of tests for which an initially empty minor heap is a better test context.

There are a few, e.g., tests of Weak and Ephemeron where the state of the heap may affect the behaviour of a test run. Another example where Domain-reuse may bite is in tests of Domain.DLS, where the state is attached to the Domain itself.
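
For instance, with Domain.DLS the difference is easy to see (illustrative sketch, not taken from an actual test):

```ocaml
(* DLS state lives for the lifetime of its Domain: on a freshly spawned Domain
   this observes the initial value 0, whereas a reused Domain may observe
   whatever an earlier test run left behind. *)
let key = Domain.DLS.new_key (fun () -> 0)

let observe_and_bump () =
  let v = Domain.DLS.get key in
  Domain.DLS.set key (v + 1);
  v
```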

I think we misunderstood each other, however:
Generally, I'm fond of self-contained properties, as they are central to reproducing errors and to having shrinking work well.
Starting to depend on the state of reused Domains challenges this, since the state of the underlying pool may start to influence things. From this POV it is preferable to decouple the testing of one triple from the next.

For end-users, and assuming the runtime is bug-free, Domain reuse may be less of a worry.
However, as we are first and foremost concerned with testing and helping to ensure the correctness of the OCaml 5 runtime, I'm wondering whether Domain reuse should be optional... 🤔
