Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throwing an exception from some StreamTracer methods hangs the RPC #7373

Open
groakley opened this issue Aug 27, 2020 · 8 comments
Open

Throwing an exception from some StreamTracer methods hangs the RPC #7373

groakley opened this issue Aug 27, 2020 · 8 comments
Labels
Milestone

Comments

@groakley
Copy link
Contributor

What version of gRPC-Java are you using?

Head

What is your environment?

Android

What did you expect to see?

If a client-implemented ClientStreamTracer method throws an exception, the RPC should fail.

What did you see instead?

The RPC never completes (blocks forever or the future never completes, depending on stub type).

Steps to reproduce the bug

I was able to reproduce this bug with both the InProcessChannel and Cronet transport implementations.

From grpc's GrpcServerRuleTest:


@Test
public void testTracer() throws Exception {
  TestServiceImpl testService = new TestServiceImpl();
  
  grpcServerRule1.getServiceRegistry().addService(testService);

  SimpleServiceGrpc.SimpleServiceBlockingStub stub =
      SimpleServiceGrpc.newBlockingStub(grpcServerRule1.getChannel());

  SimpleRequest request = SimpleRequest.getDefaultInstance();

  SimpleResponse resp =
      stub.withInterceptors(
              new ClientInterceptor() {
                @Override
                public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
                    MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
                  return next.newCall(
                      method,
                      callOptions.withStreamTracerFactory(
                          new ClientStreamTracer.Factory() {
                            @Override
                            public ClientStreamTracer newClientStreamTracer(
                                ClientStreamTracer.StreamInfo info, Metadata headers) {
                              return new ClientStreamTracer() {
                                @Override
                                public void streamClosed(Status status) {
                                  throw new RuntimeException();
                                }
                              };
                            }
                          }));
                }
              })
          .unaryRpc(request);
  assertThat(resp).isEqualTo(SimpleResponse.getDefaultInstance());
}
@ericgribkoff ericgribkoff added this to the Next milestone Aug 28, 2020
@ericgribkoff
Copy link
Contributor

Spoke with @groakley and he mentioned that @wanyingd1996 might be interested in taking this on. But apparently I can't directly assign them... (previous contributor in #7242)

@wanyingd1996
Copy link
Contributor

wanyingd1996 commented Sep 3, 2020

I reproduced the bug using InProcessChannel, and the trace shows the problem is inside of notifyClientClose method in InProcessTransport.java. The streamClosed method is called by clientStream.statsTraceCtx.streamClosed(clientStatus);, and if this fails, notifyClientClose will fail, then client will never get notified to be closed. Same problem can be reproduced by throwing RuntTimeException from any methods that is called by notifyClientClose. Unless there is other way to notify client, It seems the only way to avoid hanging RPC in this situation is to catch and do not throw in these StreamTracer related methods.

@groakley
Copy link
Contributor Author

groakley commented Sep 3, 2020

At a minimum, we'd want to make sure that any client-implemented method that throws doesn't hang the stub or prevent resources from being released.

Looking at InProcessTransport, one way to achieve that might be to modify

private void notifyClientClose(Status status, Metadata trailers) {
  Status clientStatus = cleanStatus(status, includeCauseWithStatus);
  synchronized (this) {
    if (closed) {
      return;
    }
    if (clientReceiveQueue.isEmpty()) {
      closed = true;
      clientStream.statsTraceCtx.clientInboundTrailers(trailers);
      clientStream.statsTraceCtx.streamClosed(clientStatus);
      clientStreamListener.closed(clientStatus, trailers);
    } else {
      clientNotifyStatus = clientStatus;
      clientNotifyTrailers = trailers;
    }
  }
  streamClosed();
}

to be

private void notifyClientClose(Status status, Metadata trailers) {
  Status clientStatus = cleanStatus(status, includeCauseWithStatus);
  synchronized (this) {
    if (closed) {
      return;
    }
    if (clientReceiveQueue.isEmpty()) {
      closed = true;
      clientStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStream.statsTraceCtx.clientInboundTrailers(trailers);});
      clientStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStream.statsTraceCtx.streamClosed(clientStatus);});
      clientNotifyStatus = runUpdatingStatusOnFailure(clientStatus, () -> {clientStreamListener.closed(clientStatus, trailers);});
    } else {
      clientNotifyStatus = clientStatus;
      clientNotifyTrailers = trailers;
    }
  }

  streamClosed();
}

private Status runUpdatingStatusOnFailure(Status status, Runnable command) {
  try {
    command.run();
  } catch (Throwable t) {
    return cleanStatus(Status.fromThrowable(t), includeCauseWithStatus);
  }
  return status;
}

@wanyingd1996
Copy link
Contributor

@ericgribkoff , do you think we should go with the fix suggested by @groakley ?

@ericgribkoff
Copy link
Contributor

At a minimum, we'd want to make sure that any client-implemented method that throws doesn't hang the stub or prevent resources from being released.

Yes. We also will want to update the javadoc for StreamTracer to say that implementations must not throw.

Regarding the runUpdatingStatusOnFailure approach, that would fix the hanging but we can/should actually be more stringent here about dealing with an exception if one is thrown: we should at the least shutdown the transport if any StreamTracer throws (moving the entire channel into a panic state might be even better, but might be tricky to actually accomplish).

@wanyingd1996 Wrapping the calls to statsTraceCtx methods in try/catch blocks should work (I think bad things can happen if an exception is thrown by a stream tracer even outside of the notifyClientClose call?). There are 18 calls to statsTraceCtx methods in InProcessTransport, plus some more in AbstractClientStream and a few in Netty/OkHttp specific code, so that approach might be somewhat unwieldy...catching any exception somewhere higher up in the call stack (whether it comes from a stream tracer or a bug in grpc code itself) and responding by shutting down the transport or panic'ing the channel with a clear error message could be a cleaner approach to this, but I haven't looked into details to see where/how this could best be done.

@wanyingd1996
Copy link
Contributor

wanyingd1996 commented Sep 9, 2020

@ericgribkoff I think catching the exception in StatsTraceContext instead of in notifyClientClose would be clear, but it seems I can't shutdown the transport in StatsTraceContext. I am fairly new to grpc, do you have any suggestions?

@larry-safran
Copy link
Contributor

@wanyingd1996 Are you still interested in working on this issue?

@wanyingd1996
Copy link
Contributor

@wanyingd1996 Are you still interested in working on this issue?

Hi, Larry, I'm not working on grpc anymore, and I don't have the bandwidth for the bug now. feel free to take this. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants