Fix client call close race condition message leak #7106

Closed
wants to merge 4 commits into from

njhill
Contributor

@njhill njhill commented Jun 9, 2020

As reported in #7105. Not sure if this is how you want it done, but it does fix the problem.

Fixes #7105
Fixes #3557

@linux-foundation-easycla

linux-foundation-easycla bot commented Jun 9, 2020

CLA Check
The committers are authorized under a signed CLA.

@njhill
Contributor Author

njhill commented Jun 10, 2020

I guess another option instead of throwing/handling RejectedExecutionException might be to just submit to the channel's executor instead (exactly where the REE is thrown in the changes as they are now), though it doesn't look like there's currently a way to access it.

That should be safe since the runnable is always a SerializingExecutor here.

@voidzcy voidzcy self-assigned this Jun 11, 2020
Member

@ejona86 ejona86 left a comment

I don't think we can assume that a RejectedExecutionException is due to the close race. We keep seeing users prematurely closing executors.

In #636 (comment) I had the idea of a "fallback" executor, which would be great in general. I'm not wild about using it for this case, since I do generally think we should be avoiding scheduling work on the executor after the call is complete, but I guess technically we could. I'd much rather fix the ordering problem.
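
For readers unfamiliar with the "fallback" executor idea mentioned above, here is a minimal sketch of what such a wrapper might look like. It is not grpc-java code: the class name `FallbackExecutor` and its constructor are hypothetical, and the sketch assumes that a rejection from the primary executor surfaces as a `RejectedExecutionException` thrown from `execute`.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical wrapper (not part of grpc-java): tries a primary executor, then a fallback. */
final class FallbackExecutor implements Executor {
  private final Executor primary;
  private final Executor fallback;

  FallbackExecutor(Executor primary, Executor fallback) {
    this.primary = Objects.requireNonNull(primary, "primary");
    this.fallback = Objects.requireNonNull(fallback, "fallback");
  }

  @Override
  public void execute(Runnable task) {
    try {
      primary.execute(task);
    } catch (RejectedExecutionException e) {
      // The primary refused the task, for example because the application
      // shut it down early. Hand the task to the fallback so that it is
      // not silently dropped.
      fallback.execute(task);
    }
  }
}
```

Note that this only moves the work elsewhere; it does not address the ordering problem discussed above, and tasks may end up running on a thread the application did not expect.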

@njhill
Contributor Author

njhill commented Jun 15, 2020

Thanks @ejona86, I see that the more general part of this is a bit tricky. I wonder if a fallback executor could be avoided. There are two realistic cases I think - temporary thread/queue exhaustion and (permanent) premature executor shutdown.

The latter case constitutes an app bug where all bets are off since the channel is then "broken" at least for async usage. It seems reasonable in this case to just log the error and not run any call close callbacks.

To accommodate the former case, you could schedule a repeating task that re-attempts scheduling the SerializingExecutor onto the user's executor at some interval (say 2 seconds), after cancelling the call and running any non-callback cleanup logic inline (including releasing messages after an onMessage is rejected). This has the advantages of not requiring a new executor and of never making callbacks from an unexpected executor, which could have other implications for the app.
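
The retry idea described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `RetryingSubmitter`, its parameters, and the `inlineCleanup` hook are all hypothetical names, the sketch uses one-shot rescheduling rather than a fixed-rate repeating task, and it bounds the number of retries so that a permanently shut down executor does not cause retries forever.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not grpc-java code) that retries a rejected submission. */
final class RetryingSubmitter {
  private final ScheduledExecutorService timer;
  private final long retryDelayMillis;
  private final int maxRetries;

  RetryingSubmitter(ScheduledExecutorService timer, long retryDelayMillis, int maxRetries) {
    this.timer = timer;
    this.retryDelayMillis = retryDelayMillis;
    this.maxRetries = maxRetries;
  }

  /**
   * Hands {@code callbacks} to {@code userExecutor}. If the executor rejects
   * them, runs {@code inlineCleanup} once on the calling thread (for example
   * to release buffered messages) and retries the submission later.
   */
  void submit(Executor userExecutor, Runnable callbacks, Runnable inlineCleanup) {
    try {
      userExecutor.execute(callbacks);
    } catch (RejectedExecutionException e) {
      inlineCleanup.run();
      scheduleRetry(userExecutor, callbacks, maxRetries);
    }
  }

  private void scheduleRetry(Executor userExecutor, Runnable callbacks, int attemptsLeft) {
    if (attemptsLeft <= 0) {
      // Likely a permanently shut down executor: give up. A real
      // implementation would log the error here.
      return;
    }
    timer.schedule(() -> {
      try {
        // Callbacks only ever run on the user's executor, never on the timer thread.
        userExecutor.execute(callbacks);
      } catch (RejectedExecutionException e) {
        scheduleRetry(userExecutor, callbacks, attemptsLeft - 1);
      }
    }, retryDelayMillis, TimeUnit.MILLISECONDS);
  }
}
```

The timer thread is used only to re-attempt the hand-off; the callbacks themselves still run on the user's executor, which is the property the comment above is after.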

Irrespective of that, would a subset of this PR still be worthwhile (specifically the changes to ThreadlessExecutor) since they address #3557 and do so in a non-racy way? Of course the original bug would need to be either fixed or reverted first!

@ejona86
Member

ejona86 commented Dec 10, 2020

@njhill, since #7105 was resolved by reverting the problematic commit, what should we do with this PR? I do think the ThreadlessExecutor changes look good, and given my comment on #3557 it looks like that is safe to do now. Are there other things we want to save?

@njhill
Contributor Author

njhill commented Jan 1, 2021

@ejona86 shall I open another PR with just the ThreadlessExecutor changes?

@ejona86
Member

ejona86 commented Jan 5, 2021

@njhill, yeah, that sounds good.

njhill added a commit to njhill/grpc-java that referenced this pull request Jan 11, 2021
@njhill
Contributor Author

njhill commented Jan 11, 2021

@ejona86 PTAL #7798

@njhill njhill closed this Jan 11, 2021
@njhill njhill deleted the threadless branch January 11, 2021 21:41
njhill added a commit to njhill/grpc-java that referenced this pull request Jan 11, 2021
ejona86 pushed a commit that referenced this pull request Feb 23, 2021
…of RPC

Changes originally proposed as part of #7106.

Fixes #3557
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2021