Please see the FAQ in our main README.md, then answer the questions below before
submitting your issue.
What version of gRPC are you using?
1.33.2
What version of Go are you using (go version)?
1.14.2
What operating system (Linux, Windows, …) and version?
Linux
What did you do?
Use a DNS server to enable grpclb by serving both SRV and TXT records. Run a grpclb server to respond to LB calls. Then use a client to connect to a localhost hello service.
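For reference, a minimal zone-file sketch of the kind of SRV record the DNS resolver looks up to enable grpclb. The names, TTLs, and ports here are illustrative placeholders, not my actual setup:

```
; hypothetical entries advertising a grpclb balancer for hello.example.com
_grpclb._tcp.hello.example.com. 300 IN SRV 0 0 1234 lb.example.com.
lb.example.com.                 300 IN A   127.0.0.1
```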
What did you expect to see?
Call succeeds with no special dial/call options.
What did you see instead?
Call fails with "all SubConns are in TransientFailure". The only way to make it work is to either dial with grpc.WithBlock() or call with grpc.WaitForReady().
This is certainly a regression.
Looking at the code, the function (lb *lbBalancer) aggregateSubConnStates() connectivity.State goes through the failure path. Tracing the call: initially the SubConn state is set to Idle and that state is placed in the cache before Connect() is called. However, aggregateSubConnStates() reads the cached state, which is still Idle, causing the channel to go into TransientFailure. I've confirmed the discrepancy with a debugger (cached state: Idle, actual SubConn state: Connecting).
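To illustrate the failure mode, here is a simplified, self-contained sketch of the kind of aggregation aggregateSubConnStates() performs. This is my own reduction, not the actual grpc-go code: any Ready SubConn wins, then any Connecting, and everything else, including a stale cached Idle, falls through to TransientFailure.

```go
package main

import "fmt"

// State mirrors gRPC's connectivity.State values (simplified).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// aggregate folds per-SubConn states into one channel state:
// any Ready wins, then any Connecting, otherwise TransientFailure.
// Note that Idle has no branch of its own and falls through.
func aggregate(states map[string]State) State {
	numConnecting := 0
	for _, s := range states {
		switch s {
		case Ready:
			return Ready
		case Connecting:
			numConnecting++
		}
	}
	if numConnecting > 0 {
		return Connecting
	}
	return TransientFailure
}

func main() {
	// The cache still says Idle even though the real SubConn has
	// already moved to Connecting after Connect() was called.
	cached := map[string]State{"sc1": Idle}
	fmt.Println(aggregate(cached)) // stale Idle aggregates to TransientFailure

	// Once the cache reflects the real state, aggregation is correct.
	cached["sc1"] = Connecting
	fmt.Println(aggregate(cached))
}
```

This matches the observed symptom: with the cache one step behind the real SubConn state, the channel briefly reports TRANSIENT_FAILURE while the SubConn is actually CONNECTING.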
Setting either grpc.WithBlock() or grpc.WaitForReady() forces the call to wait until the channel is ready, which is why those options work around the bug.
The following logs show the incorrect sequence: the channel goes into TRANSIENT_FAILURE before [grpclb] lbBalancer: handle SubConn state change: 0xc0001ba3b0, CONNECTING.
Logs:
INFO: 2020/11/11 14:46:33 [core] parsed scheme: "dns"
INFO: 2020/11/11 14:46:33 [core] ccResolverWrapper: sending update to cc: {[] 0xc0001ec7e0 0xc00070c5e0}
INFO: 2020/11/11 14:46:33 [core] ClientConn switching balancer to "grpclb"
INFO: 2020/11/11 14:46:33 [core] Channel switches to new LB policy "grpclb"
INFO: 2020/11/11 14:46:33 [grpclb] lbBalancer: UpdateClientConnState: {ResolverState:{Addresses:[] ServiceConfig:0xc0001ec7e0 Attributes:0xc00070c5e0} BalancerConfig:0xc0001ec660}
INFO: 2020/11/11 14:46:33 [core] parsed scheme: "grpclb-internal"
INFO: 2020/11/11 14:46:33 [core] ccResolverWrapper: sending update to cc: {[{127.0.0.1:1234 localhost. <nil> 0 <nil>}] <nil> <nil>}
INFO: 2020/11/11 14:46:33 [core] ClientConn switching balancer to "pick_first"
INFO: 2020/11/11 14:46:33 [core] Channel switches to new LB policy "pick_first"
INFO: 2020/11/11 14:46:33 [core] Subchannel Connectivity change to CONNECTING
INFO: 2020/11/11 14:46:33 [core] blockingPicker: the picked transport is not ready, loop back to repick
INFO: 2020/11/11 14:46:33 [core] pickfirstBalancer: UpdateSubConnState: 0xc000402450, {CONNECTING <nil>}
INFO: 2020/11/11 14:46:33 [core] Channel Connectivity change to CONNECTING
INFO: 2020/11/11 14:46:33 [core] Subchannel picks a new address "127.0.0.1:1234" to connect
INFO: 2020/11/11 14:46:33 [core] Subchannel Connectivity change to READY
INFO: 2020/11/11 14:46:33 [core] pickfirstBalancer: UpdateSubConnState: 0xc000402450, {READY <nil>}
INFO: 2020/11/11 14:46:33 [core] Channel Connectivity change to READY
INFO: 2020/11/11 14:46:33 [grpclb] lbBalancer: processing server list: servers:{ip_address:"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x01" port:12340 load_balance_token:"hereisatoken"}
INFO: 2020/11/11 14:46:33 [grpclb] lbBalancer: server list entry[0]: ipStr:|127.0.0.1|, port:|12340|, load balancer token:|token|
INFO: 2020/11/11 14:46:33 [core] Subchannel Connectivity change to CONNECTING
INFO: 2020/11/11 14:46:33 [core] Channel Connectivity change to TRANSIENT_FAILURE
INFO: 2020/11/11 14:46:33 [core] Subchannel picks a new address "127.0.0.1:12340" to connect
INFO: 2020/11/11 14:46:33 [grpclb] lbBalancer: handle SubConn state change: 0xc0001ba3b0, CONNECTING
INFO: 2020/11/11 14:46:33 [core] Subchannel Connectivity change to READY
INFO: 2020/11/11 14:46:33 [core] Channel Connectivity change to CONNECTING
INFO: 2020/11/11 14:46:33 [grpclb] lbBalancer: handle SubConn state change: 0xc0001ba3b0, READY
INFO: 2020/11/11 14:46:33 [core] Channel Connectivity change to READY