Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpclb: recover after receiving an empty server list #4879

Merged
merged 3 commits into from Oct 20, 2021

Conversation

menghanl
Copy link
Contributor

When grpclb receives an empty address list, it calls sc.UpdateAddress([]), which closes the underlying addrConn. It won't recover if a new addr list is received later.

RELEASE NOTES:

  • grpclb: recover if addresses are received after an empty server list was received previously

@menghanl
Copy link
Contributor Author

The root cause is SubConn not handling empty addr list correctly, filed #4881

}}

creds := serverNameCheckCreds{}
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have a defaultTestTimeout const in this file. You can use that instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1325 to 1330
for i := 0; i < 100 && !gotUnavailable; i++ {
func() {
ctx, cancel := context.WithTimeout(ctx, time.Second*1)
defer cancel()
_, err := testC.EmptyCall(ctx, &testpb.Empty{})
if status.Code(err) == codes.Unavailable {
gotUnavailable = true
}
}()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we check clientConn state instead of this? Are we sure this will not be flaky?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from Alex's reproduction code.
I don't think this will be flaky. But I simplified the test a bit.

break
}
if sc != nil {
if len(backendAddrs) == 0 {
lb.cc.cc.RemoveSubConn(sc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to handle this UpdateAddresses() so that every balancer need not handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's what #4881 is tracking.

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor nits.

creds := serverNameCheckCreds{}
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
cc, err := grpc.DialContext(ctx, r.Scheme()+":///"+beServerName, grpc.WithResolvers(r),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: Could we not split these lines in a random way. In a recent PR, I cleaned up this test to instead do

	cc, err := grpc.Dial(r.Scheme()+":///"+beServerName,
		grpc.WithResolvers(r),
		grpc.WithTransportCredentials(&serverNameCheckCreds{}),
		grpc.WithContextDialer(fakeNameDialer))
	if err != nil {
		t.Fatalf("Failed to dial to the backend %v", err)
	}

which is a tad more readable I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 1310 to 1317
r.UpdateState(resolver.State{
Addresses: []resolver.Address{{
Addr: tss.lbAddr,
Type: resolver.GRPCLB,
ServerName: lbServerName,
}},
ServiceConfig: scpr,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: similarly I cleaned up calls to UpdateState to include balancer addresses through an attribute instead of the deprecated Type field.

	rs = grpclbstate.Set(resolver.State{ServiceConfig: r.CC.ParseServiceConfig(grpclbConfig)},
		&grpclbstate.State{BalancerAddresses: []resolver.Address{{
			Addr:       tss.lbAddr,
			ServerName: lbServerName,
		}}})
	r.UpdateState(rs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@menghanl menghanl merged commit bd0f881 into grpc:master Oct 20, 2021
@menghanl menghanl deleted the grpclb_empty_address branch October 20, 2021 17:52
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants