Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests get run multiple times when using maxParallelForks #3973

Open
mightyguava opened this issue Apr 17, 2024 · 25 comments · May be fixed by #4017
Open

Tests get run multiple times when using maxParallelForks #3973

mightyguava opened this issue Apr 17, 2024 · 25 comments · May be fixed by #4017
Assignees
Labels
framework 🏗️ Pertains to the core structure and components of the Kotest framework. question ❓ Inquiries or clarifications needed about the project.

Comments

@mightyguava
Copy link

mightyguava commented Apr 17, 2024

Kotest version: 5.8.1

This is one of the weirder bugs I've seen.

Repro here, containing a single commit on top of the kotest-examples-multiplatform repo main.

If you run ./gradlew cleanJvmTest jvmTest, you'll see that every test within the module is run twice. This is also visible from build/reports/tests/jvmTest/index.html. If maxParallelForks is set to 1, this issue does not occur.

If you copy the test case in WhenTest, each new copy of the test case will cause all tests to be run an additional time.

Based on trial and error, I narrowed this down to when clauses being added to the test that operate on an enum. Looking at the compiled classes, I suspect it has something to do with this $WhenMappings class that is generated, e.g.

build/classes/kotlin/jvm/test/io/kotest/examples/mpp/whendupe/BarTests$1$1$WhenMappings.class

Each test case that uses a when on an enum seems to generate a extra inner class... and my hypothesis is that Kotest's runner (or JUnit framework? or Gradle?) isn't deduping this with the outer class properly.

@Kantis Kantis added bug 🐛 Issues that report a problem or error in the code. framework 🏗️ Pertains to the core structure and components of the Kotest framework. labels Apr 23, 2024
@mightyguava
Copy link
Author

@Kantis could you provide some pointers on how to fix this issue? I'm happy to contribute if it's not too outside of my abilities.

@OliverO2
Copy link
Contributor

I wonder if using Gradle's maxParallelForks with Kotest makes sense. Kotest has its own configuration for parallelism, and can also be configured to use multithreading on the test case level. Kotest also has a way of disabling parallelism per spec (@DoNotParallelize) and therefore needs full control over multithreading.

Would using Kotest mechanisms exclusively work for you? If not, what's your use case and how would you expect the different parallelism configurations in Gradle and Kotest to work together?

@OliverO2 OliverO2 added question ❓ Inquiries or clarifications needed about the project. waiting-for-clarification ❔ Issues with this tag are waiting on the user to provide more information and removed bug 🐛 Issues that report a problem or error in the code. labels May 10, 2024
@mightyguava
Copy link
Author

mightyguava commented May 10, 2024

I wonder if using Gradle's maxParallelForks with Kotest makes sense. Kotest has its own configuration for parallelism, and can also be configured to use multithreading on the test case level.

We tried using Kotest parallelism but we depend on some third party libraries and also native C/Rust libraries that seem to use global state that we can't get around. It's particularly an issue in our integration tests where we instantiate our entire application.

Also, we do build for the iOS target and it doesn't look like @DoNotParallelize would work on iOS? Unclear to me whether it works on Android too.

Would using Kotest mechanisms exclusively work for you?
I don't think we can take advantage of kotest parallelism without significant work

If not, what's your use case and how would you expect the different parallelism configurations in Gradle and Kotest to work together?

I would expect maxParallelForks to tell kotest to arbitrarily shard Specs to forks, without regard to how long the tests to run. I'd then expect Kotest's internal parallelism to schedule specs mapped to the local fork based on availability of the threads.

So far Kotest behaves as expected with maxParallelForks = CPUs/2 and default kotest parallelism. The only time we run into issues is with when (enum) .

@mightyguava
Copy link
Author

For clarity, I should add: the bug isn't that the all tests in the project are being run once for every fork. The bug is that all tests in the project are being run once for every when (enum) statement.

@OliverO2
Copy link
Contributor

Let's examine your observations and assumptions point by point:

I suspect it has something to do with this $WhenMappings class that is generated, e.g. build/classes/kotlin/jvm/test/io/kotest/examples/mpp/whendupe/BarTests$1$1$WhenMappings.class

That class file is generated by the Kotlin compiler.

Each test case that uses a when on an enum seems to generate a extra inner class... and my hypothesis is that Kotest's runner (or JUnit framework? or Gradle?) isn't deduping this with the outer class properly.

I don't know in which cases the Kotlin compiler generates such a class, but yes, this could interfere with Gradle's way of invoking test engines (see below).

The Gradle docs on maxParallelForks say:

A value of N means that up to N test processes will be started to execute test classes.

In our case, this means that Gradle can start multiple Kotest engines, each in its own process. Depending on the timing, it could be one or more. Let's assume it is 2. Gradle would then probably (the documentation does not tell) supply some test classes (Kotest specs) to engine 1, and others to engine 2.

I would expect maxParallelForks to tell kotest to arbitrarily shard Specs to forks

This seems to be in line with what Gradle does.

I'd then expect Kotest's internal parallelism to schedule specs mapped to the local fork based on availability of the threads.

In general, yes. To clarify on "available threads": By default, Kotest runs each test on a single thread. On the JVM, it can create more, if so configured. Finally, your code test can create more threads.

So what you're seeing is determined by factors not under Kotest's control. If, for some reason, additional classes are generated and Gradle picks them up, Gradle will produce additional invocations.

We tried using Kotest parallelism but we depend on some third party libraries and also native C/Rust libraries that seem to use global state that we can't get around. It's particularly an issue in our integration tests where we instantiate our entire application.

Yes, if you're invoking code which is not thread-safe, single-threaded tests are mandatory. For integration and end-to-end tests, it is usually desirable to disable testing framework parallelism (@DoNotParallelize) and let the test function or component under test launch coroutines or create threads as necessary.

Also, we do build for the iOS target and it doesn't look like @DoNotParallelize would work on iOS? Unclear to me whether it works on Android too.

It is not relevant there: Kotest provides parallel test scheduling on the JVM only.

To summarize:

What you're observing seems to be an unfortunate interaction between Gradle and the Kotlin compiler. I don't see how Kotest could prevent this from happening.

So one option would be to run parallel tests using the Kotest mechanisms for thread-safe parts of your code base, and use @DoNotParallelize for the others.

If you still need to run several parallel test processes, maybe you could configure Gradle to exclude certain class patterns like so (untested)?

tasks.withType<Test>().configureEach {
    useJUnitPlatform()
    filter {
        excludeTestsMatching("*$WhenMappings")
    }
}

@OliverO2 OliverO2 removed the waiting-for-clarification ❔ Issues with this tag are waiting on the user to provide more information label May 11, 2024
@mightyguava
Copy link
Author

I would expect maxParallelForks to tell kotest to arbitrarily shard Specs to forks

This seems to be in line with what Gradle does.

I think there's still something missing here. If the hypothesis is that because Gradle is passing *$WhenMappings classes as additional test classes to Kotest's runner

  1. I'd expect Kotest to not use an anonymous inner class like $WhenMappings as a test entry point
  2. Even if it did, I'd expect Kotest to either fail or to run the specs in the parent class an extra time

What is actually happening is that all test classes in the project get run an extra time

test log
❯ ./gradlew cleanJvmTest jvmTest

> Task :jvmTest

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] PASSED

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] PASSED

io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] PASSED

io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=3, b=4, c=5)[jvm] PASSED

io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=6, b=8, c=10)[jvm] PASSED

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] PASSED

io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=3, b=4, c=5)[jvm] PASSED

io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=6, b=8, c=10)[jvm] PASSED

io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] PASSED

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] PASSED

What you're observing seems to be an unfortunate interaction between Gradle and the Kotlin compiler. I don't see how Kotest could prevent this from happening.

Something doesn't add up still. Here's my understanding based on my read of the Gradle code, but I could be way off:

  • Gradle DefaultTestExecuter will use DefaultTestClassScanner to provide the test Framework a list of test classes to run.
  • Gradle will fork test processes up to maxParallelForks while the DefaultTestExecutor shards the test list and sends them to workers
  • Kotest within the sharded worker processes receives an explicit list of test classes that it needs to run
  • Kotest in the sharded worker can then parallelize its subset of tests how it wants

I think there's something Kotest-specific here happening and I'd love to understand more how Kotest receives test classes from Gradle / JUnit Platform and then figure out which tests are run. I suspect something is getting lost going from JUnit => Kotest.

This all seems to work when $WhenMapping is not around. We observe significant speedup with maxParallelForks within a project using Kotest. I'm assuming that's due to Gradle's parallelization working as intended.

When $WhenMappings are around, instead of sharding, all tests get run by every Kotest worker. That behavior is really confusing to me and why I'm hoping to get more info here.

For integration and end-to-end tests, it is usually desirable to disable testing framework parallelism (@DoNotParallelize) and let the test function or component under test launch coroutines or create threads as necessary

Unfortunately, we are exactly trying to parallelize those integration tests because they are slow. We have no need to parallelize unit tests. Using @DoNotParallelize would defeat the purpose.

Other leads

  • When I initially filed this report, I made an identical repro for JUnit but the issues did not occur, which led me to post here instead of in Gradle.[

  • This JUnit PR Exclude anonymous class in JUnit Platform gradle/gradle#4774 from years ago introduced ignore for numbered anonymous classes to fix failures with generated classes on JUnit. A potential fix may be to include $WhenMappings on that regex. It still feels like something wrong is happening between the Kotest <=> Gradle integration though, and that the issue does not lie directly in Gradle or Kotlin.

mightyguava added a commit to mightyguava/gradle that referenced this issue May 11, 2024
mightyguava added a commit to mightyguava/gradle that referenced this issue May 11, 2024
@mightyguava
Copy link
Author

Hmm mightyguava/gradle@a29b234 does indeed seem to fix the problem, but I still don't understand why the problem manifests in this way.

@OliverO2
Copy link
Contributor

Shortly after my post above, I did some additional research and found this note in the Gradle docs:

With JUnit Platform, only includes and excludes are used to filter test classes — scanForTestClasses has no effect.

So seemingly they are saying "we are doing something different for JUnitPlatform, but we won't tell what exactly".

Digging into the JUnit Platform Launcher API unveals a huge API surface with lots of indirection (don't call us, we'll call you). That makes it hard to determine causes and effects. But we could be lucky in that Kotest has some say in filtering test classes and I could try to find out more as soon as time permits (probably not this month).

I think there's still something missing here. If the hypothesis is that because Gradle is passing *$WhenMappings classes as additional test classes to Kotest's runner

I'd expect Kotest to not use an anonymous inner class like $WhenMappings as a test entry point

That seems a bit of a stretch to expect Kotest to filter out test classes explicitly passed to it by the surrounding framework. Kotest also should not depend on Kotlin compiler internals such as names of synthetic classes. While Kotest could be made to filter out anything that contains a $, I wonder if it is ultimately safe to do so.

Even if it did, I'd expect Kotest to either fail or to run the specs in the parent class an extra time

I'm not so sure. If it is indeed Kotest selecting the test classes, it should have some way of determining what constitutes a legitimate test class. If not, the surrounding framework should bear responsibility for passing correct test classes only. We need to find out what's the case here before acting on this.

If you'd like to speed things up, you could explore Kotest's JUnitPlatform test runner which resides under kotest-runner/kotest-runner-junit5 and set breakpoints there. However, if Gradle forks extra processes, it will probably become difficult to attach an additional debugger in time.

Also, how did you verify how Gradle actually distributes your test across processes? I have used this code (but in my tests, Kotlin did not create the extra $WhenMappings class file).

class BarTests : FunSpec({
    println("BarTests: pid=${ProcessHandle.current().pid()}, thread=${Thread.currentThread().id}")

    test("bad when") {
        println("BarTests.bad when: pid=${ProcessHandle.current().pid()}, thread=${Thread.currentThread().id}")
        val num = Hello.A

        when (num) {
            Hello.A -> {}
            else -> fail("oops")
        }
    }

    test("bad when 2") {
        println("BarTests.bad when2: pid=${ProcessHandle.current().pid()}, thread=${Thread.currentThread().id}")
        val num = Hello.A

        when (num) {
            Hello.A -> {}
            else -> fail("oops")
        }
    }
})

@mightyguava
Copy link
Author

mightyguava commented May 11, 2024

Thank you for digging into the issue!

Even if it did, I'd expect Kotest to either fail or to run the specs in the parent class an extra time

I'm not so sure. If it is indeed Kotest selecting the test classes, it should have some way of determining what constitutes a legitimate test class. If not, the surrounding framework should bear responsibility for passing correct test classes only. We need to find out what's the case here before acting on this.

I agree. I've also done some digging and the indirection makes things quite difficult to trace.

The behavior seems strange for either hypothesis so this is extra strange to me.

If Kotest is receiving explicit test classes from Gradle to run, and it gets one named BarTests$1$1$WhenMappings.class, how is it successfully running tests for that? It must be doing something to figure out that BarTests is the real class that contains the tests. And what makes Kotest instead just run every test in the project one more time...?

If Kotest is not receiving explicit test classes from Gradle to run, then how does it know which classes to run in which process? If Kotest is determining which tests to run though, then it should be filtering out the WhenMappings classes.

Also, how did you verify how Gradle actually distributes your test across processes? I have used this code (but in my tests, Kotlin did not create the extra $WhenMappings class file).

Pushed mightyguava/gradle@a29b234 to my exemplar. Here's what it prints with ./gradlew --info cleanJvmTest jvmTest. It ran twice with different PIDs. As before, calling out that UUIDJvmTest that doesn't have a when is also running an extra time...

logs
io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=826, thread=74

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=825, thread=98

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=828, thread=74

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=825, thread=98

Adding a second test case like you did makes each test run 3 times on 3 different PIDs.

logs
io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=237, thread=74

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when 2[jvm] STANDARD_OUT
    BarTests.bad when: pid=237, thread=74

Gradle Test Executor 49 finished executing tests.
Gradle Test Executor 54 finished executing tests.
Gradle Test Executor 51 finished executing tests.
Gradle Test Executor 53 finished executing tests.
Gradle Test Executor 52 finished executing tests.

> Task :jvmTest

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=602, thread=74

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=602, thread=74

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when 2[jvm] STANDARD_OUT
    BarTests.bad when: pid=602, thread=74

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=597, thread=98

Gradle Test Executor 56 finished executing tests.
Gradle Test Executor 57 finished executing tests.
Gradle Test Executor 55 finished executing tests.
Gradle Test Executor 60 finished executing tests.
Gradle Test Executor 59 finished executing tests.

> Task :jvmTest

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=597, thread=98

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when 2[jvm] STANDARD_OUT
    BarTests.bad when: pid=597, thread=98

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=600, thread=98

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] STANDARD_OUT
    BarTests.bad when: pid=600, thread=98

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when 2[jvm] STANDARD_OUT
    BarTests.bad when: pid=600, thread=98

If I remove the when (enum) though, and move all the commonTest tests to jvmTest (so I can add the PID lookup), you can see that they also get distributed across PIDs but they do not get run multiple times

logs
> Task :jvmTest

io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] STANDARD_OUT
    UUIDTestCommon: pid=1067, thread=74

io.kotest.examples.mpp.data.DataDrivenTest[jvm] STANDARD_OUT
    DataDrivenTest: pid=1062, thread=74

io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] STANDARD_OUT
    UUIDJvmTest: pid=1063, thread=74

io.kotest.examples.mpp.data.DataDrivenTest[jvm] STANDARD_OUT
    DataDrivenTest: pid=1062, thread=74
Finished generating test XML results (0.0 secs) into: /Users/yunchiluo/Development/kotest-examples-multiplatform/build/test-results/jvmTest

Are you using my example repo? I always get a build/classes/kotlin/jvm/test/io/kotest/examples/mpp/whendupe/BarTests$1$1$WhenMappings.class when I run tests there. It's probably easiest to use the my example repo rather than copying the test elsewhere in case there are environment differences.

@OliverO2
Copy link
Contributor

So far, I've tried to investigate the case with a little test project, which I use to create reproducers. I always try to keep such stuff minimal. I could use your example project with a local Kotest build and investigate further, but doing so requires more time than I can make available this month.

I think your use case is legitimate and should be supported. If we can figure out the root cause, we can either fix it in Kotest or create an issue in whichever project is responsible. In the latter case, we might be able to create a workaround in Kotest.

In the meantime, I understand that you can work around the problem locally, which is a good thing. If you have more information, feel free to post here or even create a PR.

@OliverO2 OliverO2 self-assigned this May 11, 2024
@OliverO2
Copy link
Contributor

OliverO2 commented May 12, 2024

One additional factor to explore: How do Gradle or JUnitPlatform distribute test classes to parallel processes?

Assume we try to parallelize 100 test classes across 4 worker processes, with 97 light-load classes requiring 20% of total CPU time and 3 heavy-load classes requiring the remaining 80% (not unusual for integration tests). Using a static (pre-determined) distribution, we could end up with the 3 heavy-load classes being distributed to the same test worker (process). In that case, we'd parallelize 20% of our total load, leaving 80% on a single process.

For a generally useful parallelization, we would need some scheduler managing a (test class) job queue and schedule each test class to a worker engine (actor) once that becomes idle.

In other words: A static distribution of test cases across processes, which is determined in advance, would serve only a fraction of use cases compared to a dynamic scheduler/actor-based distribution.

Inside Kotest, with coroutines, we of course have that kind of scheduling for thread-based parallelization. It remains to be seen for what use cases the upstream frameworks offer additional value with process-based parallelization.

@OliverO2
Copy link
Contributor

@mightyguava I've tried your repro, at 2fa1069070d7a0cb9282c02b3f075816e0a41f04. This is what I got:

./gradlew cleanJvmTest jvmTest

> Task :jvmTest
io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] PASSED
io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=3, b=4, c=5)[jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=6, b=8, c=10)[jvm] PASSED
io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] PASSED

So the problem did not reproduce in my environment (OpenJDK Runtime Environment (build 17.0.10+7-Ubuntu-120.04.1)).

@OliverO2 OliverO2 added the waiting-for-clarification ❔ Issues with this tag are waiting on the user to provide more information label May 12, 2024
@mightyguava
Copy link
Author

mightyguava commented May 12, 2024

Hmm that's strange, I would not have expected this to be an architecture-specific problem. My environment is an Apple Silicon Macbook running

❯ java -version
openjdk version "17.0.10" 2024-01-16 LTS
OpenJDK Runtime Environment Corretto-17.0.10.7.1 (build 17.0.10+7-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.10.7.1 (build 17.0.10+7-LTS, mixed mode, sharing)

I tried this in docker with to get as close as I can to your env on my mac, on the same commit... Ubuntu Jammy with Eclipse Build of OpenJDK

> docker run -ti --platform=linux/amd64 --rm -v ./kotest-examples-multiplatform:/project eclipse-temurin:17.0.10_7-jre-jammy bash
# java --version
openjdk 17.0.10 2024-01-16
OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode, sharing)
# cd project
# ./gradlew cleanJvmTest jvmTest

and still reproduce the issue

> Task :jvmTest

io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] PASSED
io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] PASSED
io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=3, b=4, c=5)[jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=6, b=8, c=10)[jvm] PASSED
io.kotest.examples.mpp.UUIDJvmTest[jvm] > uuids should be in be type 4 format[jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=3, b=4, c=5)[jvm] PASSED
io.kotest.examples.mpp.data.DataDrivenTest[jvm] > PythagTriple(a=6, b=8, c=10)[jvm] PASSED
io.kotest.examples.mpp.uuid.UUIDTestCommon[jvm] > uuids should be somewhat unique![jvm] PASSED
io.kotest.examples.mpp.whendupe.BarTests[jvm] > bad when[jvm] PASSED

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You wouldn't happen to have a dual-core CPU would you? 😅 Wondering if the CPU count is the difference.

@OliverO2
Copy link
Contributor

OliverO2 commented May 12, 2024

The above tests ran on a machine with an Intel Xeon 4-Core E3-1225 v5. The interesting difference is that I still can find no $WhenMappings class files (and it is unlikely that the Kotlin compiler would consider the CPU it is running on). I could try on a Mac M2 though (I would bail out if you insisted on Windows 😆).

BTW: What I found is that JUnitPlatform seems to call io.kotest.runner.junit.platform.KotestJunitPlatformTestEngine#discover() which then lets io.kotest.framework.discovery.Discovery#doDiscovery() do the class selection. I did not see anything suspicious there. Basically, Kotest test classes must be (non-abstract) subclasses of Kotest's Spec class, which is determined for a Kotlin class it via Spec::class.java.isAssignableFrom(it.java).

@OliverO2 OliverO2 removed the waiting-for-clarification ❔ Issues with this tag are waiting on the user to provide more information label May 12, 2024
@OliverO2
Copy link
Contributor

OliverO2 commented May 13, 2024

Some updates:

  1. I found the $WhenMappings class file. It was there all the time, just IntelliJ decided to hide it in the project view I had been using.
  2. The reason that the problem did not reproduce on a 4-core machine could be exactly the differing number of CPU cores (assuming you have 8 or more):
    • Gradle/JUnitPlatform invokes each Kotest launcher with a subset of classes, which are not necessarily legitimate test classes.
    • In my experiment, it was this:
      • Launcher 1: io.kotest.examples.mpp.UUIDJvmTest
      • Launcher 2: io.kotest.examples.mpp.uuid.UUIDTestCommon
      • Launcher 3: io.kotest.examples.mpp.data.DataDrivenTest, io.kotest.examples.mpp.whendupe.BarTests
      • Launcher 4: io.kotest.examples.mpp.data.PythagTriple, io.kotest.examples.mpp.whendupe.Hello
    • Kotest will then filter out classes which are not test specs, so this type of invocation makes little sense for parallelization: Some launcher invocations can end up without any test class to process. In the above example, launcher 4 does nothing.
  3. I did some cleanup in the Kotest class discovery code, which you could try via a local Kotest build from https://github.com/kotest/kotest/tree/oo2/spec-discovery.
    • Using publishToMavenLocal on that build will make Kotest version 5.9.0-LOCAL available in your mavenLocal() repository.
    • You could then add systemProperty("KOTEST_DEBUG", "TRUE") to your tasks.named<Test>("jvmTest") block to make Kotest produce log files "kotest-PID.log" separated by PID, so that parallel invocations create reasonable output.

@mightyguava
Copy link
Author

Here's the "kotest-PID.log" files. I'm not totally sure how to interpret these logs but I do see the same tests being assigned to multiple PIDs.

kotest-logs.zip

@OliverO2
Copy link
Contributor

What happened is this:

  • There were 6 Kotest invocations by the Gradle test task via the JUnitPlatform API.
  • 5 of these were invocations with an explicit list of test class selectors which show up at the top of the logs like this: classSelectors=[io.kotest.examples.mpp.uuid.UUIDTestCommon].
  • One invocation (PID 38317) was passing an empty list of test class selectors (which is complete nonsense given the other invocations). In this case, Kotest scans its class path for test classes, finds them all and executes the tests in that invocation.
  • So each test is discovered (and run) twice: Once via its explicitly specified class selector, once via Kotest's classpath scan.

So the Gradle test task does the initial class discovery. It just picks up classes from the classpath. There is some hardwired code for some older test frameworks (and even pattern matching), but in the JUnitPlatform case, Gradle does not know anything about what constitutes a test class.

Now in our parallel execution case, Gradle intends to invoke the JUnitPlatform launcher API repeatedly, each with a LauncherDiscoveryRequest specifying a list of ClassSelectors (basically, a list of classes).

And here's the bug: The Gradle test task invokes the launcher API with an empty list of class selectors. The JUnitPlatform launcher API does not specify what to do in this case. Kotest's behavior is to do a full test discovery by scanning the classpath.

Your observation that a $WhenMapping class made a difference is pure coincidence. It is probably the number of classes divided by the number of parallel cores to use which triggers the bug (off by one?).

Furthermore, the entire API seems completely unsuitable for parallelization. With a new commit for better logging, I have observed this:

[Discovery] Collected specs via 29 class discovery selectors in 720ms, found 3 specs

So if the classpath contains 29 classes, of which only 3 (just 10%) are actual test specs, the chances of having the actual tests distributed correctly across multiple runners are slim.

So if we changed Kotest to assume an intentional no-op invocation with JUnitPlatform, this would not solve the problem. For proper parallelization, a better approach would be to ignore the Gradle test task, use a sharding algorithm tailored to your needs, and directly invoke KotestJunitPlatformTestEngine as demonstrated here:

test("kotest should support multiple selected class names") {
val req = LauncherDiscoveryRequestBuilder.request()
.selectors(
DiscoverySelectors.selectClass("com.sksamuel.kotest.runner.junit5.mypackage.DummySpec1"),
DiscoverySelectors.selectClass("com.sksamuel.kotest.runner.junit5.mypackage.DummySpec2")
)
.build()
val engine = KotestJunitPlatformTestEngine()
val descriptor = engine.discover(req, UniqueId.forEngine("testengine"))
descriptor.classes.map { it.qualifiedName } shouldBe listOf(
com.sksamuel.kotest.runner.junit5.mypackage.DummySpec1::class.java.canonicalName,
com.sksamuel.kotest.runner.junit5.mypackage.DummySpec2::class.java.canonicalName,
)
descriptor.children.map { (it.source.get() as ClassSource).javaClass } shouldBe listOf(
com.sksamuel.kotest.runner.junit5.mypackage.DummySpec1::class.java,
com.sksamuel.kotest.runner.junit5.mypackage.DummySpec2::class.java,
)
}

@OliverO2 OliverO2 changed the title Tests get run multiple times when test contains when clause with enums AND using maxParallelForks Tests get run multiple times when using maxParallelForks May 13, 2024
@OliverO2
Copy link
Contributor

Related: gradle/gradle#2669

@mightyguava
Copy link
Author

Thanks for the thorough investigation, the bug makes much more sense now!

So if we changed Kotest to assume an intentional no-op invocation with JUnitPlatform, this would not solve the problem.

It sounds like if we change Kotest to assume an intentional no-op invocation with JUnitPlatform, we

  • would solve the problem of tests running multiple times
  • would not solve the problem of static parallelization distributing tests to processes in a nonsense way

If it doesn't break Kotest, making this change would still be valuable because

  1. Running test multiple times, particularly tests that you want to parallelize, can cause confusion and a pretty big impact on test run times
  2. In a real world scenario, a module would have many more tests than in this contrived example. We would still get gains with process-level parallelization compared to single-threaded tests. At the very least, it shouldn't be worse.

What do you think?

@OliverO2
Copy link
Contributor

I agree mostly with your conclusions, though loss of performance seems possible as well, depending on the scenario.

Factors to consider are:

  1. We have an edge case:
    • Resource-intensive tests requiring process isolation running on a low-quality parallelization scheme vs.
    • Kotest's primary target of coroutines with non-blocking, fine-grained concurrency with dynamic multi-threading.
  2. We cannot expect the Gradle side to change in the foreseeable future. The issue with low-quality parallelization has been open for 6+ years although approaches have been suggested.
  3. KotestJunitPlatformTestEngine is part of the public API, so changing its behavior may break other people's code. AFAIK, there is no way to express "use tests from the current classpath" in a LauncherDiscoveryRequest other than to omit all of its selectors.

So we'd have to make this configurable and not the default behavior. The launcher API is not my area of expertise and I may not be aware of all the implications, so we need to get some consensus from the team before merging such changes.

I could try to push some commit to my branch, like introducing an engine property kotest.framework.discovery.classpath.scanning with a default value of true, which you could then use with a false value for further exploration.

Does that make sense?

@mightyguava
Copy link
Author

Yes, those considerations make sense to me. I agree that adding a property instead of enabling it as a default would be a safer option. As well as getting consensus from the team.

I could try to push some commit to my branch, like introducing an engine property kotest.framework.discovery.classpath.scanning with a default value of true, which you could then use with a false value for further exploration.

What is the behavior with maxParallelTasks = 1 (parallelization disabled)? Does Gradle pass all tests it found to LauncherDiscoveryRequest or does it omit selectors?

@OliverO2
Copy link
Contributor

What is the behavior with maxParallelTasks = 1 (parallelization disabled)? Does Gradle pass all tests it found to LauncherDiscoveryRequest or does it omit selectors?

It always passes all potential classes on the classpath (it does not know what a test is) via class path selectors, one per class. And if there are none, nothing bad will happen as then the Kotest classpath scan will yield the same result (0 classes).

@OliverO2
Copy link
Contributor

OliverO2 commented May 13, 2024

Could you try a local build of https://github.com/kotest/kotest/tree/oo2/spec-discovery, along with the following?

tasks.named<Test>("jvmTest") {
   // ...
   systemProperty("KOTEST_DEBUG", "TRUE")
   systemProperty("kotest.framework.discovery.classpath.scanning.enabled", "false")
}

@mightyguava
Copy link
Author

Yes, that seems to have solved the problem of tests running twice!

kotest-logs.zip

@OliverO2
Copy link
Contributor

Also works for me. Could reproduce the doubling of test executions via

tasks.named<Test>("jvmTest") {
   // ...
   forkEvery = 1
}

With that, one process will always get an empty class list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
framework 🏗️ Pertains to the core structure and components of the Kotest framework. question ❓ Inquiries or clarifications needed about the project.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants