Skip to content

Commit

Permalink
Fix bug of ignoring subchannels with CONNECTING state when aggregatin…
Browse files Browse the repository at this point in the history
…g the overall load balancing state.

We should treat both IDLE and CONNECTING subchannels as "connection in progress" when aggregating for the overall load balancing state. Otherwise, RPCs could fail prematurely if one subchannel enters TF while all the others are still in CONNECTING.

23d2796 made each individual subchannel stay in TF until READY if it previously was in TF. So subchannels with CONNECTING state are those in first time conncting. We should give time for them to connect.
  • Loading branch information
voidzcy committed Mar 12, 2021
1 parent 6a9c990 commit ede7e9a
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 6 deletions.
8 changes: 4 additions & 4 deletions grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java
Expand Up @@ -776,7 +776,7 @@ private void maybeUpdatePicker() {
case ROUND_ROBIN:
pickList = new ArrayList<>(backendList.size());
Status error = null;
boolean hasIdle = false;
boolean hasPending = false;
for (BackendEntry entry : backendList) {
Subchannel subchannel = entry.subchannel;
Attributes attrs = subchannel.getAttributes();
Expand All @@ -785,12 +785,12 @@ private void maybeUpdatePicker() {
pickList.add(entry);
} else if (stateInfo.getState() == TRANSIENT_FAILURE) {
error = stateInfo.getStatus();
} else if (stateInfo.getState() == IDLE) {
hasIdle = true;
} else {
hasPending = true;
}
}
if (pickList.isEmpty()) {
if (error != null && !hasIdle) {
if (error != null && !hasPending) {
pickList.add(new ErrorEntry(error));
state = TRANSIENT_FAILURE;
} else {
Expand Down
Expand Up @@ -1175,9 +1175,10 @@ public void roundRobinMode_subchannelStayTransientFailureUntilReady() {
deliverSubchannelState(subchannel2, ConnectivityStateInfo.forNonError(CONNECTING));
inOrder.verify(helper).updateBalancingState(eq(CONNECTING), any(SubchannelPicker.class));

// Switch subchannel1 to TRANSIENT_FAILURE, making the general state TRANSIENT_FAILURE too.
Status error = Status.UNAVAILABLE.withDescription("error1");
// Switch all subchannels to TRANSIENT_FAILURE, making the general state TRANSIENT_FAILURE too.
Status error = Status.UNAVAILABLE.withDescription("error");
deliverSubchannelState(subchannel1, ConnectivityStateInfo.forTransientFailure(error));
deliverSubchannelState(subchannel2, ConnectivityStateInfo.forTransientFailure(error));
inOrder.verify(helper).updateBalancingState(eq(TRANSIENT_FAILURE), pickerCaptor.capture());
assertThat(((RoundRobinPicker) pickerCaptor.getValue()).pickList)
.containsExactly(new ErrorEntry(error));
Expand Down

0 comments on commit ede7e9a

Please sign in to comment.