Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Unary Callables Deadline values respect the TotalTimeout in RetrySettings #1603

Merged
merged 143 commits into from May 8, 2023

Conversation

lqiu96
Copy link
Contributor

@lqiu96 lqiu96 commented Apr 3, 2023

We should be able to cancel the runnable/ close the call exactly when the RPC timeout has elapsed.

Fixes: #1504

@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Apr 3, 2023
@lqiu96 lqiu96 changed the title Main showcase retries fix: Unary Callables Deadline values respect the TotalTimeout in RetrySettings Apr 3, 2023
@lqiu96 lqiu96 added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 3, 2023
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Apr 5, 2023
@lqiu96 lqiu96 removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 6, 2023
@lqiu96 lqiu96 requested a review from blakeli0 April 6, 2023 20:28
@lqiu96 lqiu96 marked this pull request as ready for review April 6, 2023 20:28
@lqiu96 lqiu96 requested a review from a team as a code owner April 6, 2023 20:28
@@ -88,6 +93,7 @@
private final ApiMethodDescriptor<RequestT, ResponseT> methodDescriptor;
private final HttpTransport httpTransport;
private final Executor executor;
private final ScheduledExecutorService deadlineCancellationExecutor;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to reuse the executor above? I know it's not a ScheduledExecutorService at this point, but the what's being passed in is from here, which is indeed a ScheduledExecutorService, so we should be able to change the type and reuse it. I did a quick search and I don't think most of the change would be breaking, there is one public builder method that could be considered breaking.
The reason we may want to reuse the existing executor is that we can leverage the existing client lifecycles for shutting down the executor, otherwise we need to explicitly shut down this new deadlineCancellationExecutor somewhere to prevent from resource leaking, which is missing from this PR currently. Reusing the existing executor may also have other issues like the ThreadPool is not large enough so new requests may have to wait for new thread. The bottom line is that we need to be careful with anything related to executors, so we have enough resource for large number of concurrent requests and we don't leak resource accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, makes sense. I'll take another look into how grpc handles the multiple executors. I'd imagine they would also possibly run into this issue.

// there is a timeout exception from this RPC call (DEADLINE_EXCEEDED)
private void closeAndNotifyListeners() {
synchronized (lock) {
close(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment in close method makes me worried that we may still have to wait for the server to fully send the response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the cancellation in the runnable doesn't seem like it's actually cancel's the runnable. I'll look into the possibility of being able to actually disconnect the connection instead of waiting for the socket timeout.

// will occur immediately). For any other value, the deadlineScheduler will
// terminate in the future (even if the timeout is small).
@InternalApi
protected boolean shouldRPCTerminate(long timeLeftMs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this new method now that we are using ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is here due the the very subtle difference between unary/streaming vs LRO retry logic. Polling from LRO's should retry even if timeLeftMs == 0 (as it would be the final polling attempt), but unary/streaming callables should not (and it would signify an rpc timeout attempt of 0)

@Test
public void testGRPC_LROSuccessfulResponse_NoDeadlineExceeded()
throws IOException, ExecutionException, InterruptedException {
EchoStubSettings.Builder grpcEchoSettingsBuilder = EchoStubSettings.newBuilder();
Copy link
Collaborator

@blakeli0 blakeli0 May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These setups are 40 lines long and made the tests hard to read, can we extract the common set ups to a private method or setup method or Util class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, will do.

public class ITLongRunningOperation {

@Test
public void testGRPC_LROSuccessfulResponse_NoDeadlineExceeded()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test name still reflect what we are testing? Or maybe we need to make it clearer?
I see NoDeadlineExceeded which assumes not retry, but I see an assertion in the end that asserts attemptCount to be at least 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, I can update the test names to better clarity. Some tests cases are testing that the deadline exceeded and that it throws a DEADLINE_EXCEEDED exception and it's very easy to confuse.

// Explicitly set retries as disabled (maxAttempts == 1)
.setMaxAttempts(1)
.build();
EchoStubSettings.Builder grpcEchoSettingsBuilder = EchoStubSettings.newBuilder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for the setups in this class, it would be great if we can simplify them. Maybe we can expose a method in TestClientInitializer that takes RetrySettings and RetryableCodes as parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep will do! That's a good idea.

* <p>Each test attempts to get the number of attempts done in each call. The attemptCount is
* incremented by 1 as the first attempt is zero indexed.
*/
public class ITUnaryDeadline {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this PR. It would be great if we can programmatically start/stop the showcase server so we can test things like connect/socket timeouts.

Copy link
Contributor Author

@lqiu96 lqiu96 May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can raise in the showcase repo and see what the others think (or see if that's possible / easily done).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant more like doing it in our own repo, but yeah asking others if it has been done before is a good idea.

if (newDeadline != null) {
builder.setDeadline(newDeadline);
if (inputOptions.getTimeout() != null) {
Duration newTimeout = java.time.Duration.ofNanos(inputOptions.getTimeout().getNano());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use ms instead of nanos now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sonarcloud
Copy link

sonarcloud bot commented May 8, 2023

[gapic-generator-java-root] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

82.4% 82.4% Coverage
0.0% 0.0% Duplication

@sonarcloud
Copy link

sonarcloud bot commented May 8, 2023

[java_showcase_integration_tests] SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

72.9% 72.9% Coverage
0.0% 0.0% Duplication

@sonarcloud
Copy link

sonarcloud bot commented May 8, 2023

[java_showcase_unit_tests] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

83.5% 83.5% Coverage
0.0% 0.0% Duplication

@lqiu96 lqiu96 merged commit d2fe520 into main May 8, 2023
18 of 19 checks passed
@lqiu96 lqiu96 deleted the main-showcase_retries branch May 8, 2023 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpJson Unary Callables not following RetrySetting's TotalTimeout value
2 participants