grpc: perform graceful switching of LB policies in the ClientConn by default #5285
Conversation
// With the use of gracefulswitch.Balancer in ccBalancerWrapper, handling this
// asynchronously is probably not required anymore since the switchTo() method
// handles the balancer switch by pushing the update onto the channel.
// TODO(easwars): Handle this inline.
Is it worthwhile? This would be the only thing done inline, then? Seems better to keep everything going through the channel unless there are compelling reasons to skip it for this case.
This is the only call from the underlying balancer which goes through the channel instead of being handled inline. And this was being handled inline before we discovered a couple of deadlocks which forced us to release cc.mu before calling ccb.close(). But now, ccb.close() will only be called from cc.Close() and without holding cc.mu. So, it should be safe to handle this inline.
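To make the deadlock shape concrete, here is a minimal, hypothetical Go sketch (not grpc-go code; the type and method names are made up): a close routine that waits on another goroutine deadlocks if it is invoked while holding a mutex that the other goroutine also needs, which is why cc.mu had to be released first. Once close is only ever called without the lock held, handling the call inline is safe.

```go
package sketch

import "sync"

// conn is a toy stand-in for the ClientConn; the names are illustrative only.
type conn struct {
	mu   sync.Mutex
	done chan struct{} // closed by worker() when it finishes
}

// close waits for the background goroutine to finish, like ccb.close()
// waiting for the balancer wrapper's goroutine to wind down.
func (c *conn) close() {
	<-c.done
}

// worker needs c.mu for its final piece of work before signalling completion.
func (c *conn) worker() {
	c.mu.Lock()
	c.mu.Unlock()
	close(c.done)
}

// badClose deadlocks: it holds c.mu while close() waits on worker(), and
// worker() is itself blocked trying to acquire c.mu.
func (c *conn) badClose() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.close()
}

// goodClose calls close() without holding c.mu, so worker() can finish.
func (c *conn) goodClose() {
	c.close()
}
```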
We might be able to move all the outgoing calls to the goroutine, and do all the incoming calls inline.
I have already moved all outgoing calls to the goroutine. Do you want me to switch this to be done inline in this PR?
clientconn.go
Outdated
	cc.curBalancerName = builder.Name()
	cc.balancerWrapper = newCCBalancerWrapper(cc, builder, cc.balancerBuildOpts)
}

func (cc *ClientConn) handleSubConnStateChange(sc balancer.SubConn, s connectivity.State, err error) {
	cc.mu.Lock()
	if cc.conns == nil {
Maybe this is unnecessary too? Add a TODO to remove if you agree, perhaps.
Removed this.
Thanks for your review.
I was able to run the test 1000 times with the race detector without failures. Ready to be looked at again.
@@ -178,6 +178,10 @@ func (gsb *Balancer) ResolverError(err error) {
}

// ExitIdle forwards the call to the latest balancer created.
//
// If the latest balancer does not support ExitIdle, the subConns need to be
// re-connected manually. This used to be done in the ClientConn earlier, but
I would delete the sentence about how it used to be. Or at least it doesn't belong here.
Done.
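As a rough illustration of the behavior that doc comment describes, the forwarding logic might look like the sketch below. This is a hedged, simplified stand-in, not the gracefulswitch.Balancer implementation; the function name and the map type are made up, but balancer.ExitIdler, balancer.Balancer, and SubConn.Connect are the real public grpc-go APIs.

```go
package sketch

import "google.golang.org/grpc/balancer"

// exitIdleOrReconnect is a hypothetical sketch of the behavior described in
// the doc comment above: forward ExitIdle if the latest balancer supports it,
// otherwise re-connect the tracked SubConns manually.
func exitIdleOrReconnect(latest balancer.Balancer, subConns map[balancer.SubConn]bool) {
	if ei, ok := latest.(balancer.ExitIdler); ok {
		// The latest balancer supports ExitIdle; just forward the call.
		ei.ExitIdle()
		return
	}
	// The balancer does not implement balancer.ExitIdler, so ask each tracked
	// SubConn to connect instead.
	for sc := range subConns {
		sc.Connect()
	}
}
```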
balancer_conn_wrappers.go
Outdated
type exitIdle struct{}

// ccBalancerWrapper is a wrapper on top of cc for balancers.
// ccBalancerWrapper is a wrapper on top of cc for balancers. It ensures that
Hmm it's actually both a cc wrapper and a balancer wrapper.
How about now?
balancer_conn_wrappers.go
Outdated
updateCh *buffer.Unbounded // Updates written on this channel are processed by watcher().
resultCh *buffer.Unbounded // Results of calls to UpdateClientConnState() are pushed here.
closed   *grpcsync.Event   // Indicates if close has been called.
done     *grpcsync.Event   // Indicates if close has completed its work.

mu       sync.Mutex
subConns map[*acBalancerWrapper]struct{}
Do we still need this? Doesn't look like we use it for anything now.
And the gracefulswitch balancer also keeps maps of SubConns. Is that good enough?
Yes, you are right. I was able to get rid of this.
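To visualize the outcome of this thread, here is a rough sketch of what the wrapper's state can shrink to once the SubConn map and its mutex are gone. The type name and the use of plain channels in place of grpc-go's internal buffer.Unbounded and grpcsync.Event are stand-ins for illustration only.

```go
package sketch

// ccWrapperSketch is a hypothetical, simplified stand-in for the balancer
// wrapper after this cleanup: no mutex and no SubConn map, since the
// graceful-switch balancer already tracks which SubConns belong to which
// child policy.
type ccWrapperSketch struct {
	updateCh chan interface{} // updates consumed by the watcher goroutine
	resultCh chan error       // results of UpdateClientConnState calls
	closed   chan struct{}    // closed once close() has been called
	done     chan struct{}    // closed once close() has finished its work
}
```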
balancer_conn_wrappers.go
Outdated
// back to grpc which propagates that to the resolver.
func (ccb *ccBalancerWrapper) updateClientConnState(ccs *balancer.ClientConnState) error {
	ccb.updateCh.Put(watcherUpdate{typ: updateTypeClientConnState, update: ccs})
	res := <-ccb.resultCh.Get()
Any chance this will block forever? When things are closing?
Thanks. I added a select and made it exit if close is called.
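A minimal sketch of the fix described here, reusing the hypothetical ccWrapperSketch type from the earlier sketch (again illustrative, not grpc-go code): the caller waits for the result of the update, but also selects on the closed signal so it cannot block forever if the wrapper is shut down before the result is produced.

```go
// updateClientConnState pushes the update for the watcher goroutine and waits
// for the result, bailing out early if close() has been called.
func (w *ccWrapperSketch) updateClientConnState(update interface{}) error {
	w.updateCh <- update
	select {
	case err := <-w.resultCh:
		return err
	case <-w.closed:
		// The wrapper was closed while waiting; return instead of blocking
		// forever on a result that will never arrive.
		return nil
	}
}
```

The design trade-off is that a caller racing with shutdown gets no meaningful result, which is acceptable because the channel is being torn down anyway.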
balancer_conn_wrappers.go
Outdated
// and the selected LB policy is not "grpclb", these addresses will be filtered
// out and ccs will be modified with the updated address list.
func (ccb *ccBalancerWrapper) handleClientConnStateChange(ccs *balancer.ClientConnState) {
	ccb.balancerMu.Lock()
Why does this hold balancerMu?
Because it calls ccb.balancer.UpdateClientConnState in the last line. Would you rather that I grab the lock around that statement?
// forward to the resolver.
func (ccb *ccBalancerWrapper) updateClientConnState(ccs *balancer.ClientConnState) error {
// handleSubConnStateChange handles a SubConnState update from the update
// channel and invokes the appropriate method on the underlying balancer.
What's the reason to not push this to the channel?
And document that.
And if we do all the outgoing calls in the watcher goroutine, maybe we don't need the balancerMu?
There were two methods that needed to happen inline: Close() and ClientConnStateChange() (because of the return error). But both are handled now.
"What's the reason to not push this to the channel?"
The SubConn state change is pushed onto the channel in updateSubConnState(). This is the method which is called when the watcher reads that update from the channel.
"And if we do all the outgoing calls in the watcher goroutine, maybe we don't need the balancerMu?"
Woohoo !!
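To illustrate the conclusion of this thread, here is a hedged, self-contained sketch of the pattern (hypothetical types, not grpc-go's internals): incoming calls only enqueue updates, and a single watcher goroutine performs every outgoing call into the balancer, which is what makes a balancerMu unnecessary.

```go
package sketch

// balancerLike is a toy stand-in for balancer.Balancer, just enough for this
// sketch to compile on its own.
type balancerLike interface {
	UpdateClientConnState(state string) error
	UpdateSubConnState(sc, state string)
	Close()
}

type ccStateUpdate struct{ state string }
type scStateUpdate struct{ sc, state string }

type watcherWrapper struct {
	balancer balancerLike
	updateCh chan interface{}
	closed   chan struct{}
}

// updateSubConnState is an incoming call: it only enqueues the update and
// returns, never touching the balancer directly.
func (w *watcherWrapper) updateSubConnState(sc, state string) {
	w.updateCh <- scStateUpdate{sc: sc, state: state}
}

// watcher is the single goroutine that makes all outgoing calls into the
// balancer, one at a time and in order, so no mutex is needed around them.
func (w *watcherWrapper) watcher() {
	for {
		select {
		case u := <-w.updateCh:
			switch u := u.(type) {
			case ccStateUpdate:
				_ = w.balancer.UpdateClientConnState(u.state) // handleClientConnStateChange
			case scStateUpdate:
				w.balancer.UpdateSubConnState(u.sc, u.state) // handleSubConnStateChange
			}
		case <-w.closed:
			w.balancer.Close()
			return
		}
	}
}
```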
This PR changes the default behavior of ClientConn when it ends up switching the LB policy.

Summary of code changes:
- Use gracefulswitch.Balancer in ccBalancerWrapper instead of balancer.Balancer.
- Add a switchTo method to the balancer wrapper which calls SwitchTo on the gracefulswitch.Balancer from the watcher goroutine.
- The watcher goroutine is now merely a skeleton, with functionality moved to appropriate handlers.
- Move some functionality from the ClientConn to the ccBalancerWrapper:
  - Filtering of GRPCLB addresses is now handled by the ccBalancerWrapper.
  - Calling Connect on the subConns if the underlying balancer does not support ExitIdle is moved to the gracefulswitch.Balancer.
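For readers new to the approach the summary above refers to, here is a rough, hypothetical sketch of the graceful-switch idea (the types and method names are stand-ins, not the actual gracefulswitch.Balancer): a new policy is built as "pending" while the "current" one keeps serving, and it is promoted only once it is ready, so RPCs are not disrupted by the switch.

```go
package sketch

import "google.golang.org/grpc/connectivity"

// lbPolicy and buildFunc are toy stand-ins for balancer.Balancer and
// balancer.Builder, used only to keep this sketch self-contained.
type lbPolicy interface{ Close() }
type buildFunc func() lbPolicy

type gracefulSwitcher struct {
	current, pending lbPolicy
}

// switchTo builds the new policy as pending; current keeps serving RPCs.
func (g *gracefulSwitcher) switchTo(build buildFunc) {
	if g.pending != nil {
		// A newer switch supersedes a pending one that never became ready.
		g.pending.Close()
	}
	g.pending = build()
}

// updateState is called when a child policy reports its connectivity state.
// The pending policy replaces the current one only once it is ready to serve.
func (g *gracefulSwitcher) updateState(from lbPolicy, state connectivity.State) {
	if from == g.pending && state == connectivity.Ready {
		if g.current != nil {
			g.current.Close()
		}
		g.current, g.pending = g.pending, nil
	}
}
```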
Summary of test changes:
- Added tests in test/balancer_switching_test.go to exercise the graceful switching functionality e2e.
- Made the stubBalancer wrap a real balancer. Updated an existing test in test/resolver_update_test.go which was using a custom balancer for the same to use the stubBalancer instead.
- Added ParseConfig functionality to the stubBalancer.
- Fixed a test in clientconn_test.go to fail within a reasonable amount of time. (This test was timing out initially when I was making my changes, so had to fix it.)

Fixes #5270
RELEASE NOTES:
- grpc: perform graceful switching of LB policies in the ClientConn by default