Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: BatcherImplTest.testThrottlingBlocking to check actual wait time #2285

Merged
merged 9 commits into from
Dec 6, 2023

Conversation

suztomo
Copy link
Member

@suztomo suztomo commented Dec 4, 2023

Fixes #1931 . BatcherImplTest.testThrottlingBlocking to check actual wait time, because I observe that Future.wait does not always throw TimeoutException in a timely manner. With the additional logging, we observed the following failure:

[INFO] Running com.google.api.gax.batching.BatcherImplTest
2023 1:24:02.786 com.google.api.gax.batching.BatcherImplTest testThrottlingBlocking INFO: Calling future.get(10 ms)
2023 1:24:02.841 com.google.api.gax.batching.BatcherImplTest lambda$testThrottlingBlocking$0 INFO: Calling flowController.release
2023 1:24:02.841 com.google.api.gax.batching.BatcherImplTest lambda$testThrottlingBlocking$0 INFO: Called flowController.release
2023 1:24:02.843 com.google.api.gax.batching.BatcherImplTest testThrottlingBlocking INFO: future.get(10 ms) unexpectedly returned.
...
[ERROR] Failures: 
[ERROR]   BatcherImplTest.testThrottlingBlocking:905 adding elements to batcher should be blocked by FlowControlled

(See the future.get(10 ms) returned after 57 ms.)

It's probably because the main thread does not always get the CPU resource. This pull request fixes the flaky test failure by checking the actual wait time in the main thread. If it's taking more than 10 milliseconds, then it means the method future.get(10 ms) does not return within 10 milliseconds (Good).

@suztomo suztomo requested a review from a team as a code owner December 4, 2023 17:51
@product-auto-label product-auto-label bot added the size: s Pull request size is small. label Dec 4, 2023
@suztomo suztomo changed the title test: additional logs in BatcherImplTest test: BatcherImplTest.testThrottlingBlocking to check actual wait time Dec 5, 2023
@suztomo suztomo added the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 5, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 5, 2023
@suztomo suztomo added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Dec 5, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 5, 2023
@@ -93,6 +93,14 @@
</excludes>
</configuration>
</plugin>
<plugin>
<!-- Troubleshooting a flaky test in https://github.com/googleapis/sdk-platform-java/issues/1931 -->
Copy link
Member Author

@suztomo suztomo Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep this just in case the problem occurs again and we have the logging. This surefire (unit test) configuration does not affect the released libraries to users.

@suztomo suztomo requested a review from blakeli0 December 5, 2023 21:10
@blakeli0 blakeli0 requested a review from lqiu96 December 5, 2023 21:30
} catch (InterruptedException e) {
}
});

try {
logger.info("Calling future.get(10 ms)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with Batcher or this test, but what is the significance of 10ms here? If a TimeoutException is expected, why can't we lower the future.get() block time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the log (in the pull request description), the problem is that the TimeoutException isn't always thrown. Lowering the 10 ms to smaller number won't help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I read the original comment might not throw TimeoutException in a timely manner. which made me think that this was it wasn't thrown fast enough because .get(...) mau return in sub 10ms.

For my understanding, do you know what causes this: thread calling "future.get" was not being run in a timely manner. Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just potentially a scheduling issue on a machine that has a single CPU/ few CPUs?

Yes.

Copy link
Contributor

@lqiu96 lqiu96 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, from my understanding it would be either case:

  • future.get() takes more than 10ms to return -> TimeoutException
  • future.get() returns a result -> Check that the actual waitTime was at least 10ms

I think that should suffice due to the variance in scheduling. CC: @blakeli0 Do you know of another option to handle this flakiness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future.get() returns a result within 10ms

No, I didn't observe that. What I observed in #1931 was "future.get() took more than 10ms to return normally, without TimeoutException".

Copy link
Contributor

@lqiu96 lqiu96 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, updated the comment above. The docs for Future.get() seem to suggest that it will throw the exception once 10ms passes, but I'm guessing that is just best effort for limited CPU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct.

Copy link
Contributor

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Dec 6, 2023
@suztomo suztomo requested a review from lqiu96 December 6, 2023 15:25
@suztomo suztomo added the automerge Merge the pull request once unit tests and other checks pass. label Dec 6, 2023
Copy link

sonarcloud bot commented Dec 6, 2023

[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link

sonarcloud bot commented Dec 6, 2023

[java_showcase_integration_tests] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@suztomo suztomo added the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 6, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 6, 2023
@suztomo suztomo removed the automerge Merge the pull request once unit tests and other checks pass. label Dec 6, 2023
@suztomo suztomo merged commit c5d5386 into googleapis:main Dec 6, 2023
34 checks passed
@suztomo suztomo deleted the i1931_log branch December 6, 2023 19:38
lqiu96 pushed a commit that referenced this pull request Dec 8, 2023
#2285)

* test: additional logs in BatcherImplTest

* Replaced the assertion with actual wait time count
suztomo added a commit that referenced this pull request Feb 28, 2024
This is the follow-up of a pull request #2285, where we wanted to
catch any flaky test failure with logging, just in case the test
failed again.

https://github.com/googleapis/sdk-platform-java/pull/2285/files#r1416300479

It turned out that the fix worked well. We don't observe the test
failures any more. Therefore turning off the excessive logging.
lqiu96 pushed a commit that referenced this pull request Feb 28, 2024
This is the follow-up of a pull request #2285, where we wanted to
catch any flaky test failure with logging, just in case the test
failed again.

https://github.com/googleapis/sdk-platform-java/pull/2285/files#r1416300479

It turned out that the fix worked well. We don't observe the test
failures any more. Therefore turning off the excessive logging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: s Pull request size is small.
Projects
None yet
3 participants