Reuse domains rather than spawning them for each parallel test #457

Draft · wants to merge 12 commits into base: main

Conversation

@OlivierNicole (Contributor)

Given that domains are costly to spawn and join, and that parallel tests in multicoretests do so many thousands of times, I wanted to look at this more closely. A quick perf profile on src/array/lin_tests.ml shows that more than 50% of the time is spent in Domain.spawn. I wanted to try spawning domains only at the start of the test sequence and reusing them. Then I realised that this is exactly what Domainslib provides with domain pools!

This proof-of-concept PR simply replaces Domain.spawn with Domainslib.Task.async and Domain.join with Domainslib.Task.await. The approach requires setting up a domain pool and passing it to the tests. I have edited src/array/{lin,stm}_tests.ml to reflect this.
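
Roughly, the substitution looks like this. This is a simplified sketch rather than the exact diff, and `run_cmds_dom1`/`run_cmds_dom2` are made-up thunks standing in for the per-domain command interpreters:

```ocaml
open Domainslib

(* Each parallel triple submits its two command sequences to the pool as tasks
   instead of spawning and joining fresh Domains. *)
let run_parallel pool run_cmds_dom1 run_cmds_dom2 =
  Task.run pool (fun () ->
      (* previously: let d1 = Domain.spawn run_cmds_dom1 and d2 = ... *)
      let p1 = Task.async pool run_cmds_dom1 in
      let p2 = Task.async pool run_cmds_dom2 in
      (* previously: Domain.join d1, Domain.join d2 *)
      (Task.await pool p1, Task.await pool p2))

let () =
  (* the pool is created once and threaded through to the tests *)
  let pool = Task.setup_pool ~num_domains:2 () in
  (* ... run the QCheck tests, passing [pool] to run_parallel ... *)
  Task.teardown_pool pool
```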

The results are rather encouraging: src/array/lin_tests.ml is cut down from about 4.5 s to 0.3 s! (With extremely far-off statistical outliers due to bad luck in the interleaving search.) However, this is a favourable case, as the test’s commands are very cheap to run. The same comparison on my extensive Dynarray STM parallel test gives 24 s vs 12 s for the version with domain reuse, i.e. only a 2x speedup.

If depending on Domainslib is an issue, the mechanism can easily enough be reimplemented using atomics, mutexes, and condition variables. (A reimplementation would likely squeeze out some more performance by not requiring a work-stealing queue.)
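
For illustration, here is a rough sketch of what such a reimplementation could look like, using only Stdlib `Mutex` and `Condition`. The names (`Worker`, `submit`, `await`, ...) are made up for the sketch and are not the PR’s `Util.Domain_pair` API:

```ocaml
(* A worker Domain that is spawned once and then reused: it repeatedly waits
   for a job, runs it, and signals completion.  At most one job may be
   outstanding per worker at any time. *)
module Worker = struct
  type t = {
    m : Mutex.t;
    has_job : Condition.t;              (* signaled when a job is submitted *)
    job_done : Condition.t;             (* signaled when the job has finished *)
    mutable job : (unit -> unit) option;
    mutable finished : bool;
    mutable stop : bool;
  }

  let rec loop t =
    Mutex.lock t.m;
    while Option.is_none t.job && not t.stop do
      Condition.wait t.has_job t.m
    done;
    match t.job with
    | Some f ->
      t.job <- None;
      Mutex.unlock t.m;
      f ();                             (* run the job outside the lock *)
      Mutex.lock t.m;
      t.finished <- true;
      Condition.signal t.job_done;
      Mutex.unlock t.m;
      loop t
    | None ->
      Mutex.unlock t.m                  (* stop was requested *)

  let spawn () =
    let t = { m = Mutex.create (); has_job = Condition.create ();
              job_done = Condition.create ();
              job = None; finished = false; stop = false } in
    (t, Domain.spawn (fun () -> loop t))

  (* Hand [f] to the worker; the returned ref is filled in once it completes. *)
  let submit (t, _dom) f =
    let result = ref None in
    Mutex.lock t.m;
    t.finished <- false;
    t.job <- Some (fun () -> result := Some (f ()));
    Condition.signal t.has_job;
    Mutex.unlock t.m;
    result

  let await (t, _dom) result =
    Mutex.lock t.m;
    while not t.finished do Condition.wait t.job_done t.m done;
    Mutex.unlock t.m;
    Option.get !result

  let shutdown (t, dom) =
    Mutex.lock t.m;
    t.stop <- true;
    Condition.signal t.has_job;
    Mutex.unlock t.m;
    Domain.join dom
end
```

Two such workers give a reusable domain pair: each parallel triple submits its command sequences with `submit`, collects the results with `await`, and the workers are only shut down once at the very end of the test run.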

It looks like about 20% of the time is now spent in Sys.cpu_relax, so further improvement may still be possible.

@OlivierNicole (Contributor, Author)

I removed the dependency on Domainslib in favour of Stdlib mutexes. Since @jmid mentioned in a discussion that he wasn’t fond of the user having to wrap the test runner in a call to Util.Domain_pair.run (fun pair -> ...), I also tried reusing domains only for intra-test repetitions, as he suggested. I measured the following run times:

|  | src/array/lin_tests.ml | src/array/stm_tests.ml |
| --- | --- | --- |
| main (c65989d) | 4.796 s ± 2.306 | 2.640 s ± 5.203 |
| Intra-test reuse of domains | 2.543 s ± 0.751 (speedup 1.89) | 2.272 s ± 1.925 (speedup 1.16) |
| Maximal reuse of domains | 716.9 ms ± 725.0 (speedup 6.67) | 1.270 s ± 1.004 (speedup 2.08) |

@jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows testing them in a clean state. While it is true that this means an initially empty minor heap, I can’t think of tests for which an initially empty minor heap is a better test context.

@OlivierNicole (Contributor, Author)

(With some advice and help from @fabbing to switch to mutexes!)

@jmid (Collaborator) commented May 2, 2024

Thanks Olivier!
Quick observation: why does this cause so many `Failure("failed to allocate domain")` errors?
Overall, this should allocate fewer Domains, not more. Are there conditions under which the spawned Domains are not joined? (That could explain why several tests are running out of them.)

@OlivierNicole (Contributor, Author)

Strange, it should indeed spawn strictly fewer domains.

> Overall, this should allocate fewer Domains, not more. Are there conditions under which the spawned Domains are not joined?

Normally, no, the Util.Domain_pair.run function joins the domains before returning. I’ll try to take a look the next time I have some time on my hands.

@jmid (Collaborator) commented May 3, 2024

Thanks again for this! 🙏

  • From your numbers it seems we should be able to get a decent speed-up, even if we only retain the pool for the duration of the property.
  • I think I found the issue with running out of Domains: on failure, the property raises a QCheck exception to report the observations. When that happens, I believe the exception meant that the underlying Domains weren't joined in the end; a sketch of the kind of fix needed is below.
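
In other words, the teardown of the reused Domains needs to run even when the property raises. A minimal sketch of that shape (with made-up names `init_pair`, `takedown` and `check_property`) could use `Fun.protect`:

```ocaml
let agree_prop triple =
  let pair = init_pair () in
  Fun.protect
    (* runs both on normal return and when the property raises *)
    ~finally:(fun () -> takedown pair)
    (fun () -> check_property pair triple)
```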

@OlivierNicole (Contributor, Author)

Well spotted, thanks!

There still seem to be a lot of failures; I’ll have a look at it in my spare time at some point…

@jmid (Collaborator) commented May 15, 2024

I think I figured this one out too: the hint was STM tests with a non-trivial precond triggering `failed to allocate domain`.
Preconditions are checked before the Domains are run, and if a precondition fails, this is signaled to QCheck ... with an exception, thereby causing takedown not to run 🤷
With that in mind, it makes sense to run the precondition check first, before attempting pool init.
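
A sketch of that ordering, reusing the made-up names from the `Fun.protect` sketch above (`precond_ok` likewise stands in for the actual precondition check):

```ocaml
let agree_prop (seq, cmds1, cmds2) =
  (* check the preconditions before any Domain/pool allocation *)
  precond_ok seq cmds1 cmds2
  && begin
       let pair = init_pair () in
       Fun.protect ~finally:(fun () -> takedown pair)
         (fun () -> check_property pair (seq, cmds1, cmds2))
     end
```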

The opam install workflows will still fail, due to util.ml now requiring OCaml 5.

@jmid (Collaborator) commented May 15, 2024

> @jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows testing them in a clean state. While it is true that this means an initially empty minor heap, I can’t think of tests for which an initially empty minor heap is a better test context.

There are a few, e.g., tests of Weak and Ephemeron where the state of the heap may affect the behaviour of a test run. Another example where Domain-reuse may bite is in tests of Domain.DLS, where the state is attached to the Domain itself.
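
For instance, with Domain.DLS the difference is easy to see (illustrative sketch, not taken from an actual test):

```ocaml
(* DLS state lives for the lifetime of its Domain: on a freshly spawned Domain
   this observes the initial value 0, whereas a reused Domain may observe
   whatever an earlier test run left behind. *)
let key = Domain.DLS.new_key (fun () -> 0)

let observe_and_bump () =
  let v = Domain.DLS.get key in
  Domain.DLS.set key (v + 1);
  v
```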

I think we misunderstood each other, however:
Generally, I'm fond of self-contained properties, as they are central to reproducing errors and to having shrinking work well.
Starting to depend on the state of reused Domains challenges this, since the state of the underlying pool may start to influence things. From this POV it is preferable to decouple the testing of one triple from the next.

For end-users, and assuming the runtime is bug-free, Domain reuse may be less of a worry.
However, as we are first and foremost concerned with testing and helping to ensure the correctness of the OCaml 5 runtime, I'm wondering whether Domain reuse should be optional... 🤔
