Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix race between connection error and subconn shutdown #6494

Merged
merged 1 commit into from
Aug 3, 2023

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Aug 1, 2023

This is something I noticed when adding the SubConn.Shutdown method in #6493, but never observed. If we fail to check ac.ctx while holding ac.mu, then the following race is technically possible:

  1. In resetTransport, tryAllAddrs is called and returns an error. acCtx (AKA ac.ctx) .Err() == nil as the addrConn is still in use.
  2. From the LB policy, RemoveSubConn (or SubConn.Shutdown) is called. This sets the state to Shutdown and cancels ac.ctx.
  3. Back in resetTransport, we grab ac.mu and update the state to TransientFailure.

The net result is the LB policy's state listener (or UpdateSubConnState) will receive Shutdown followed by TransientFailure. All LB policies should be ignoring updates after Shutdown anyway, but this is still an incorrect transition.

I don't believe this case is worth trying to stimulate with a test. The race window is extremely tiny, and such bugs could have easily existed in several other places in our code. I confirmed via manual inspection that everything either checks ac.ctx or ac.transport while holding ac.mu before calling ac.updateConnectivityState.

RELEASE NOTES: none

@dfawley dfawley added this to the 1.58 Release milestone Aug 1, 2023
@dfawley dfawley requested a review from easwars August 1, 2023 21:14
@easwars easwars assigned dfawley and unassigned easwars Aug 3, 2023
@dfawley dfawley merged commit b9356e3 into grpc:master Aug 3, 2023
11 checks passed
@dfawley dfawley deleted the updateStateRace branch August 3, 2023 18:04
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants