Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Ready event is missed in RetriableStream. Sublistener::onReady is called but masterListener.onReady() is not. #6817

Closed
bjxutj opened this issue Mar 11, 2020 · 7 comments · Fixed by #7090
Assignees
Labels
Milestone

Comments

@bjxutj
Copy link

bjxutj commented Mar 11, 2020

What version of gRPC-Java are you using?

I am using 1.26.0, and I have tried 1.27.2 and latest code 1.27.3 (compiled by myself).

What is your environment?

Windows 10, with openjdk 11.0.2

What did you expect to see?

I expect to see the onReady is called, then trigger masterListener.onReady().

What did you see instead?

onReady is called, but masterListener.onReady() is not called.

Steps to reproduce the bug

Sorry, it is hard to reproduce, but I can analyze the code to explain.
In drain(Substream) of RetriableStream, the new state was generated when all the entries were runWith the substream. but in the start(), a StartEntry was added, then the stream started after runWith. There is a possibility that the stream has sent the data successfully and call onReady to notify (another thread), but the new state which include the substream in the drainedSubstreams was not generated, then masterListener.onReady() was not called.

@bjxutj bjxutj added the bug label Mar 11, 2020
@sanjaypujare
Copy link
Contributor

@bjxutj since you are saying it's hard to reproduce I am assuming there is no immediate impact or not looking for an urgent fix. If you have any code snippet/example to share that will be good.

@sanjaypujare
Copy link
Contributor

@dapengzhang0 might have some comments...

@dapengzhang0
Copy link
Member

I can agree the current implementation (Sublistener.onReady()) does not handle onReady correctly in all cases, as the comment says

https://github.com/grpc/grpc-java/blob/v1.26.0/core/src/main/java/io/grpc/internal/RetriableStream.java#L934-L939

@bjxutj Is this causing failure for your application or any performance problem?

@bjxutj
Copy link
Author

bjxutj commented Mar 11, 2020

@dapengzhang0 Yes, in our test, some times the request was blocked because it didn't receive onReady event. I want to know if we have a solution to avoid this problem?

@bjxutj
Copy link
Author

bjxutj commented Mar 11, 2020

@dapengzhang0 I am confusing on Sublistener.onReady(), does it really need to check if (state.drainedSubstreams.contains(substream)) ? In which case, we should check this? Can we just pass through the onReady event to the master listener, they will handle if the stream is cancelled or closed, or retry ...

@ejona86 ejona86 added this to the 1.29 milestone Mar 24, 2020
@dapengzhang0 dapengzhang0 modified the milestones: 1.29, 1.30 Apr 7, 2020
@dapengzhang0 dapengzhang0 modified the milestones: 1.30, 1.31 May 20, 2020
@dapengzhang0
Copy link
Member

@dapengzhang0 I am confusing on Sublistener.onReady(), does it really need to check if (state.drainedSubstreams.contains(substream)) ? In which case, we should check this? Can we just pass through the onReady event to the master listener, they will handle if the stream is cancelled or closed, or retry ...

@bjxutj Are you just doing the normal retry (without hedging)? If that's the case, calling master listener is fare, but it will break hedging more. In the hedging case, we just know that onReady() has already broken concurrency. So we might be okay with that in short term. We just want to make sure that will fix your problem.

@bjxutj
Copy link
Author

bjxutj commented Jun 3, 2020 via email

dapengzhang0 added a commit that referenced this issue Jun 4, 2020
This fixes #6817 for the normal retry case, although it makes the hedging issue #7089 more broken, and there is still space of optimization for normal retry.
dapengzhang0 added a commit to dapengzhang0/grpc-java that referenced this issue Jun 10, 2020
This fixes grpc#6817 for the normal retry case, although it makes the hedging issue grpc#7089 more broken, and there is still space of optimization for normal retry.
dapengzhang0 added a commit to dapengzhang0/grpc-java that referenced this issue Jun 10, 2020
This fixes grpc#6817 for the normal retry case, although it makes the hedging issue grpc#7089 more broken, and there is still space of optimization for normal retry.
dapengzhang0 added a commit that referenced this issue Jun 10, 2020
This fixes #6817 for the normal retry case, although it makes the hedging issue #7089 more broken, and there is still space of optimization for normal retry.
dfawley pushed a commit to dfawley/grpc-java that referenced this issue Jan 15, 2021
This fixes grpc#6817 for the normal retry case, although it makes the hedging issue grpc#7089 more broken, and there is still space of optimization for normal retry.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants