Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: navigationBar_multipleBackStackInterests might be flaky #1021

Open
3 tasks done
JoseAlcerreca opened this issue Nov 7, 2023 · 9 comments
Open
3 tasks done
Labels
bug Something isn't working

Comments

@JoseAlcerreca
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Is there a StackOverflow question about this issue?

  • I have searched StackOverflow

What happened?

navigationBar_multipleBackStackInterests might be flaky. It tends to fail in API 30.

Example failure:
https://github.com/android/nowinandroid/actions/runs/6774060182/job/18410826425

Related: #1011

Relevant logcat output

> Task :app:connectedDemoDebugAndroidTest
Starting 26 tests on test(AVD) - 11

> Task :core:database:connectedDemoDebugAndroidTest

> Task :app:connectedDemoDebugAndroidTest

com.google.samples.apps.nowinandroid.ui.NavigationTest > navigationBar_multipleBackStackInterests[test(AVD) - 11] FAILED 
	kotlinx.coroutines.test.UncompletedCoroutinesError: After waiting for 10s, the test coroutine is not completing
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$2$1$2$1.invoke(TestBuilders.kt:349)

[DDMLIB]: An unexpected packet was received before the handshake.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@JoseAlcerreca JoseAlcerreca added the bug Something isn't working label Nov 7, 2023
@SimonMarquis
Copy link
Contributor

I'm not able to reproduce this flakiness on an emulator with the same API level.
Do you have access to the test reports (to pinpoint the actual error)? It seems like uploaded artifacts are removed when a workflow is re-run.

@JoseAlcerreca
Copy link
Contributor Author

I tried a quick fix here because it was making working on that PR very difficult.

@dturner
Copy link
Collaborator

dturner commented Nov 9, 2023

@SimonMarquis
Copy link
Contributor

@JoseAlcerreca Can you/I extract your fix in a dedicated PR and try to merge it and see if it fixes the issue?

@JoseAlcerreca
Copy link
Contributor Author

Merged in #1019

@dturner
Copy link
Collaborator

dturner commented Nov 11, 2023

I did some investigation into this today. The flaky test navigates to 4 screens and performs a scroll. It takes ~10s to run on CI. runTest's default timeout is 10s so in cases where it took >10s it would fail.

The reason why it doesn't fail locally is because your local machine is likely much more powerful than the CI machine so the emulator responds faster. On my M1 this test took about 7 seconds.

The reason for the 10s timeout is given by Jake Wharton here: Kotlin/kotlinx.coroutines#3270 with a big follow up discussion about the issues it causes here: Kotlin/kotlinx.coroutines#3800.

@dturner dturner reopened this Nov 11, 2023
@dturner
Copy link
Collaborator

dturner commented Nov 11, 2023

@JoseAlcerreca's fix works because it moves the test execution out of the runTest block but I'd like us to consider all the possible options.

10 seconds is a really long time for a test so this should be treated as an anomaly. Possible solutions:

  1. Avoid using runTest in any instrumented tests (not keen on this as it increases the overall test timeout to 60 seconds which is far too long)
  2. Override the default timeout value for runTest on a per-test basis, maybe with a constant used to easily identify these long running tests. If left up to the test author we should provide some guidance on what's a reasonable timeout e.g. for each new screen you're navigating to, allow 3 seconds
  3. Something clever to do with annotations, like @LargeTest

Any other ideas or thoughts?

@SimonMarquis
Copy link
Contributor

Good catch!

  1. Avoid using runTest in any instrumented tests (not keen on this as it increases the overall test timeout to 60 seconds which is far too long)

This will indeed restore the default behavior of whatever the code does, which for instrumented tests is generally driven by espresso "idling" mechanism and uses a 60s timeout. For Compose API, there should probably be the same, but I don't find any reference.
But there are no "overall" 60s timeout as you say baked into the Android test framework. If the test takes 10min (idle or not), the test will continue until finished.

  1. Override the default timeout value for runTest on a per-test basis, maybe with a constant used to easily identify these long running tests. If left up to the test author we should provide some guidance on what's a reasonable timeout e.g. for each new screen you're navigating to, allow 3 seconds

This might not really improve the situation, if it is left to choose by the dev, because most of the devs will have a more performant machine than what GitHub Actions offers for free. And everything would look fine on their end, they probably won't add a custom timeout, but then it will continue to fail sometimes on CI.

  1. Something clever to do with annotations, like @LargeTest

This would be very neat, but I think these are only meant for test selection/groupping with the JUnit framework. Maybe a custom test runner could help with detecting that, but this would be way more involving.


My personal opinion on this would be to:

  • either provide a kind of runNavigationTest(timeout: Duration) (similar to your 2.) with a mandatory/explicit timeout parameter that would ensure the test does not reach a maximum value. The only caveats to this is, how to enforce it? (Lint?)
  • or not use runTest() at all. And instead use runBlocking() calls where we need to invoke suspending functions. In these scenarios, we generally don't care about the delay-skipping feature anyway (reading a local repository).

@dturner
Copy link
Collaborator

dturner commented Nov 13, 2023

Good points. runBlocking seems like the best solution here - easy to understand and no magic delay skipping under the hood.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants