Skip to content

Commit

Permalink
grpclb: should not ignore subchannels with CONNECTING state in aggreg…
Browse files Browse the repository at this point in the history
…ating the overall LB state (#7959)

We should treat both IDLE and CONNECTING subchannels as "connection in progress" when aggregating for the overall load balancing state. Otherwise, RPCs could fail prematurely if one subchannel enters TF while all the others are still in CONNECTING.

23d2796 made each individual subchannel stay in TF until READY if it previously was in TF. So subchannels with CONNECTING state are those in first time connecting. We should give them time to connect.
  • Loading branch information
voidzcy committed Mar 12, 2021
1 parent afe8831 commit 9c562c8
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 6 deletions.
8 changes: 4 additions & 4 deletions grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java
Expand Up @@ -776,7 +776,7 @@ private void maybeUpdatePicker() {
case ROUND_ROBIN:
pickList = new ArrayList<>(backendList.size());
Status error = null;
boolean hasIdle = false;
boolean hasPending = false;
for (BackendEntry entry : backendList) {
Subchannel subchannel = entry.subchannel;
Attributes attrs = subchannel.getAttributes();
Expand All @@ -785,12 +785,12 @@ private void maybeUpdatePicker() {
pickList.add(entry);
} else if (stateInfo.getState() == TRANSIENT_FAILURE) {
error = stateInfo.getStatus();
} else if (stateInfo.getState() == IDLE) {
hasIdle = true;
} else {
hasPending = true;
}
}
if (pickList.isEmpty()) {
if (error != null && !hasIdle) {
if (error != null && !hasPending) {
pickList.add(new ErrorEntry(error));
state = TRANSIENT_FAILURE;
} else {
Expand Down
Expand Up @@ -1175,9 +1175,10 @@ public void roundRobinMode_subchannelStayTransientFailureUntilReady() {
deliverSubchannelState(subchannel2, ConnectivityStateInfo.forNonError(CONNECTING));
inOrder.verify(helper).updateBalancingState(eq(CONNECTING), any(SubchannelPicker.class));

// Switch subchannel1 to TRANSIENT_FAILURE, making the general state TRANSIENT_FAILURE too.
Status error = Status.UNAVAILABLE.withDescription("error1");
// Switch all subchannels to TRANSIENT_FAILURE, making the general state TRANSIENT_FAILURE too.
Status error = Status.UNAVAILABLE.withDescription("error");
deliverSubchannelState(subchannel1, ConnectivityStateInfo.forTransientFailure(error));
deliverSubchannelState(subchannel2, ConnectivityStateInfo.forTransientFailure(error));
inOrder.verify(helper).updateBalancingState(eq(TRANSIENT_FAILURE), pickerCaptor.capture());
assertThat(((RoundRobinPicker) pickerCaptor.getValue()).pickList)
.containsExactly(new ErrorEntry(error));
Expand Down

0 comments on commit 9c562c8

Please sign in to comment.