Consider lowering runTest default timeout #3270

Closed
JakeWharton opened this issue Apr 30, 2022 · 16 comments

@JakeWharton
Contributor

60 seconds is a really long time for nothing to happen. It's like an eternity for a CPU, especially one which is skipping delays. I would personally prefer something low like 2s (as is the default for JS's Mocha runner), but something like 10s seems a reasonable default that covers 99.9% of real-world usage without being a whole minute.

@dkhalanskyjb self-assigned this May 11, 2022
@dkhalanskyjb
Collaborator

The reason for adding a timeout at all was to make sure that the CI doesn't hang when it's obvious that the test won't be passing. If someone runs a test manually and observes it hanging, they can just interrupt it.

Note also that this is not a timeout for the whole test, it's only a timeout for nothing happening in dispatchers that can be observed by the test framework. We won't throw, for example, on something like this:

runTest {
    while (true) {
        // Each delay is visible to the test scheduler, so this counts as
        // observable activity and the timeout never fires, even though
        // the test never completes.
        delay(1000)
    }
}

So, this timeout is not an indication of the test taking too long; it's an indication of no observable activity, possibly indicating a deadlock. You're right that a whole minute is like an eternity, and that was the intention: if nothing observable happens a whole eternity later, the CI should fail.

10 seconds is also like an eternity, but only if we're CPU-bound. However, the things forked off to Dispatchers.IO (and thus hidden from the view of the framework) could be operations that rely on the real world, like connecting to a database and performing some operations there. Not the type of tests we endorse, but the type of tests that actually exist.
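A minimal sketch of that scenario, with Thread.sleep standing in for real-world I/O (the function name and the five-second figure are arbitrary): work running on Dispatchers.IO produces no activity the test scheduler can observe, so the inactivity timeout keeps ticking.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.test.runTest
import kotlinx.coroutines.withContext

fun databaseTest() = runTest {
    val rows = withContext(Dispatchers.IO) {
        // Simulates slow real-world I/O. Unlike delay(), Thread.sleep is
        // invisible to the virtual-time scheduler, so from the test
        // framework's point of view nothing is happening here.
        Thread.sleep(5_000)
        listOf("row")
    }
    check(rows.isNotEmpty())
}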

@JakeWharton
Contributor Author

I'm sympathetic to the idea that there exists someone writing a unit test which does >10 seconds of wall-clock work, but isn't that why the parameter exists? That outlier can specify a longer timeout since they know they're doing a huge batch DB operation.
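For illustration, opting into a longer limit for one slow test might look like the sketch below (written against the 1.6.x test API, where the parameter is dispatchTimeoutMs; later releases take a timeout Duration instead; the test name is hypothetical):

import kotlinx.coroutines.test.runTest

fun hugeBatchDbTest() = runTest(dispatchTimeoutMs = 120_000) {
    // a batch DB operation that legitimately needs more wall-clock time
    // would go here; the outlier opts into the longer limit explicitly
}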

@dkhalanskyjb
Collaborator

The worry was that this is an occasional outlier, which will fit in 10 seconds easily 99% of the time, but then will flake due to some connection being surprisingly slow. We expect 60 seconds to be enough for everyone.

This cutoff is necessarily arbitrary, so this is not a hard stance, of course. A question though: who would benefit from tests failing in 10 seconds instead of 60?

  • CI: ± a minute shouldn't be a big deal there, right?
  • Manually running a single test: (can only base this on my personal experience, as, again, this is arbitrary) if I see a test taking more than, say, 4-5 seconds, I will already be prompted to interrupt it.
  • Manually running a test batch: here, I would probably prefer the hanging tests to take a minute and not ten seconds. This way, even with the dispersed attention, I will without a doubt notice which test hangs.

@dkhalanskyjb
Collaborator

Now that I think about it, I was wrong about the "manually running a test batch" scenario: the test will fail and crash, which will be easy to notice. Is this the scenario that you think suffers due to a large timeout?

@hrach
Contributor

hrach commented May 11, 2022

Manual testing is not only about running the test but also about compilation, which may take a few tens of seconds (especially on a Gradle [configuration|build] cache miss). So I usually switch context and wait for the IDE notification with the test result. This has happened to me multiple times.

@JakeWharton
Contributor Author

JakeWharton commented May 13, 2022

I generally prefer a lower timeout for our CI. When I inadvertently break something in a way that omits an emission, each test will hang until the timeout. Turbine used to have a 1-second timeout, but we removed it in the latest version in favor of the one in runTest. However, that means we've 60x'd the time it takes for CI to fail and report the test results should a hang be introduced. We have classes with anywhere from 5-30 functions testing the same subject. That means our potential wall-clock time for a CI timeout has gone from 5-30 seconds to 5-30 minutes. Locally I'll manually stop the test suite (if I'm running it manually) when a single test hangs for more than a few seconds, but on CI we often need to see the test report, which is only generated if the tests complete normally (i.e., are not canceled).

@dkhalanskyjb
Collaborator

These are all good points: if we treat the timeout in runTest as an indicator that the test must have hung, lowering the timeout sounds like a good idea: a test that takes 10 seconds to run has probably hung, and if there's an option to raise this limit for the atypical case, everyone wins.

However, our timeout is for a very specific case when there are no emissions for a long time. For example, it will not catch this typical bug:

runTest {
    var x = 0
    launch {
        delay(1000)
        x = 0 // the bug: this was presumably meant to be x = 1
    }
    // The loop below never exits, but each delay(100) counts as
    // observable activity, so the inactivity timeout never fires.
    while (x == 0) {
        delay(100)
    }
}

Since the test always does something, runTest will not fail it. A more general and reliable way to catch hanging tests is to use something like https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-debug/kotlinx.coroutines.debug.junit4/-coroutines-timeout/index.html, designed specifically for this purpose, or at least to wrap the test body in a withTimeout, which, while crude, is still more reliable.
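As a sketch of the linked rule (a JUnit4 rule from kotlinx-coroutines-debug; the class name and the 5-second limit are arbitrary choices here): because it measures wall-clock time rather than scheduler activity, it does catch the activity-producing hang above.

import kotlinx.coroutines.debug.junit4.CoroutinesTimeout
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.runTest
import org.junit.Rule
import org.junit.Test

class HangingTestExample {
    // Fails any test in this class that exceeds 5 wall-clock seconds
    // and dumps the coroutines that were live at that moment.
    @get:Rule
    val timeout = CoroutinesTimeout.seconds(5)

    @Test
    fun activityProducingHang() = runTest {
        var x = 0
        launch {
            delay(1000)
            x = 0 // same bug as above: x never becomes non-zero
        }
        while (x == 0) {
            delay(100) // observable activity, so runTest's own timeout never fires
        }
    }
}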

What worries me is that, if we make the timeout low, people may get the impression that runTest provides some proper guarantees about timing out, whereas, in fact, its purpose is mostly that of a last-resort mechanism, which happens to cover a lot of typical cases of hanging tests.

@JakeWharton
Contributor Author

Yeah that makes sense.

Given how runTest is used, however, have you considered making it a proper wall-clock timeout? It won't cover 100% of tests, since some may be synchronous, but it will cover a lot of them, especially the ones that are prone to hanging.

@BraisGabin

I would be happy if I could configure it from a property or something similar. I'm moving from runBlockingTest to runTest and I'm getting really odd issues. I want a report from JUnit about which tests failed. If 100 tests hang, that means I need 100 minutes to get that report!!

And once I have that report I need to go to each one, lower their timeout, and start working on them so that, once they are fixed, I can remove the timeout parameter, because it just adds verbosity to my code.

I can understand that 60 seconds is a really safe default value. But I don't want that default value, because it slows my development flow a lot. And I don't want to add the timeout value to each runTest, because it adds a lot of verbosity. I could create my own runTest (see the sketch below) and forbid the usage of kotlinx.coroutines.test.runTest with something like detekt, but that adds complexity.
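A minimal sketch of that wrapper idea against the 1.6.x API, with the default driven by a system property (the property name and the 10-second fallback are hypothetical):

import kotlinx.coroutines.test.TestScope
import kotlinx.coroutines.test.runTest as coroutinesRunTest

// Hypothetical property name; falls back to 10 seconds if unset.
private val defaultTimeoutMs: Long =
    System.getProperty("my.project.test.timeout.ms")?.toLongOrNull() ?: 10_000L

// Project-wide replacement for kotlinx.coroutines.test.runTest, so
// individual tests don't need to repeat the timeout parameter.
fun runTest(
    dispatchTimeoutMs: Long = defaultTimeoutMs,
    testBody: suspend TestScope.() -> Unit
) = coroutinesRunTest(dispatchTimeoutMs = dispatchTimeoutMs, testBody = testBody)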

@dkhalanskyjb
Collaborator

@BraisGabin, maybe https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-debug/kotlinx.coroutines.debug.junit4/-coroutines-timeout/index.html would help? runBlockingTest, too, didn't prevent tests from hanging.

@amal

amal commented Jan 18, 2023

@dkhalanskyjb, @BraisGabin note that withTimeout right now will not work as expected inside the runTest (see #3588 for details).

@qwwdfsad
Member

Will be fixed with 3b22c27

@qwwdfsad
Member

The fix is available in 1.7.0-Beta

@mikekulasinski-kulsoft

mikekulasinski-kulsoft commented May 25, 2023

Hello, probably quite late to the conversation, but 10 seconds is an extremely low value. We have a mid-size project consisting of 20+ Gradle modules. When Gradle parallelism is turned on, the CPU is 100% busy compiling and running (CPU threads × number of tasks).


Our setup uses Koin and MockK. When we run tests on just a single module, bootstrapping (loading all the MockK classes) takes around 2 seconds for the first test. We have many modules, with roughly 0-20 tests in each of them.

When we run tests on multiple modules, the bootstrap time extends to even 10-25 seconds! To preempt the obvious question: the rest of the tests take under 500 ms. Note that the test report HTMLs are not in the order the tests ran, so the second test in the report is in reality the first test to run, meaning the 12 seconds includes the bootstrapping (setting up the testing env) under super-heavy CPU load. We observe the same problem on our local dev machines (Intel i9 12th gen, or Mac M2) as well as on the Azure CI env.


We recently updated to:
kotlin = "1.8.21"
kotlinx-coroutines = "1.7.1"

and had to create a custom runTest method that overrides the default 10-second timeout back to 60 seconds. Any chance we could revert the timeout to 60 seconds? I consider it a breaking change, since the timeout was 60 seconds for a very long time.

I am interested in your view.

Examples:

Simple test, with MockK and without: [screenshot]

Result with no CPU load: [screenshot]

Result with CPU load at 100% (using a stress test from s-tui): [screenshot]

@sampengilly

I've had the same issue pop up since this change occurred, but more flakily (i.e., it wouldn't happen on local dev machines, where the first test would only take maybe 5 seconds, but it would occasionally take longer than 10 seconds on CI).

There seems to be a discussion active about the slowness of ByteBuddy installation here: mockk/mockk#13

I'm electing to perform my mock setup outside of the call to runTest {} so that the first mock initialisation doesn't count towards the timeout (see the sketch below). It's a little disappointing not being able to use the expression body syntax safely when I'm using mocks, but it seems a bit better than changing the timeout for all calls to runTest.
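A minimal sketch of that arrangement, assuming MockK and JUnit4 (the UserRepository interface and the test names are hypothetical):

import io.mockk.every
import io.mockk.mockk
import kotlinx.coroutines.test.runTest
import org.junit.Test

class UserServiceTest {
    // Hypothetical collaborator, for illustration only.
    interface UserRepository { fun name(id: Long): String }

    // Created outside runTest: the one-off MockK/ByteBuddy bootstrap
    // happens here and does not count against runTest's timeout.
    private val repository = mockk<UserRepository> {
        every { name(1L) } returns "Jake"
    }

    @Test
    fun returnsUserName() = runTest {
        check(repository.name(1L) == "Jake")
    }
}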

@PaulWoitaschek
Contributor

I created a follow up ticket to solve the issues this brings here: #3800
