Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GrpcClientProviderTest is flaky #1853

Open
swankjesse opened this issue Apr 14, 2021 · 3 comments
Open

GrpcClientProviderTest is flaky #1853

swankjesse opened this issue Apr 14, 2021 · 3 comments
Labels

Comments

@swankjesse
Copy link
Collaborator

GrpcClientProviderTest > happyPath() FAILED
    java.io.IOException: gRPC transport failure (HTTP status=200, grpc-status=null, grpc-message=null)
        at com.squareup.wire.internal.GrpcKt.grpcResponseToException(grpc.kt:166)
        at com.squareup.wire.internal.RealGrpcCall.readExactlyOneAndClose(RealGrpcCall.kt:104)
        at com.squareup.wire.internal.RealGrpcCall.executeBlocking(RealGrpcCall.kt:77)
        at misk.client.GrpcClientProviderTest.happyPath(GrpcClientProviderTest.kt:80)

        Caused by:
        okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
            at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.kt:358)
            at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:276)
            at okio.RealBufferedSource.exhausted(RealBufferedSource.kt:197)
            at com.squareup.wire.internal.GrpcMessageSource.read(GrpcMessageSource.kt:37)
            at com.squareup.wire.internal.GrpcMessageSource.readExactlyOneAndClose(GrpcMessageSource.kt:66)
            at com.squareup.wire.internal.RealGrpcCall.readExactlyOneAndClose(RealGrpcCall.kt:102)
            ... 2 more
@carterhudson
Copy link

I'm encountering this in a real environment; any ideas what could be causing it? My current theory is that it's a keep-alive configuration issue.

java.io.IOException: gRPC transport failure (HTTP status=200, grpc-status=null, grpc-message=null)

@r3mariano
Copy link
Collaborator

r3mariano commented May 5, 2021

Some more logs from the CI build with extra okhttp logging added:

INFO: [0 ms] callStart: Request{method=POST, url=https://127.0.0.1:39961/test/GetDestinationWarehouse, headers=[te:trailers, grpc-trace-bin:, grpc-accept-encoding:gzip, grpc-encoding:gzip], tags={class com.squareup.wire.GrpcMethod=com.squareup.wire.GrpcMethod@20376e8d}}
INFO: [0 ms] proxySelectStart: https://127.0.0.1:39961/
INFO: [0 ms] proxySelectEnd: [DIRECT]
INFO: [0 ms] dnsStart: 127.0.0.1
INFO: [0 ms] dnsEnd: [/127.0.0.1]
INFO: [0 ms] connectStart: /127.0.0.1:39961 DIRECT
INFO: [1 ms] secureConnectStart
INFO: [17 ms] secureConnectEnd: Handshake{tlsVersion=TLS_1_2 cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 peerCertificates=[CN=misk-server, OU=Server, O=Misk, L=San Francisco, ST=CA, C=US] localCertificates=[]}
INFO: [23 ms] connectEnd: h2
INFO: [23 ms] connectionAcquired: Connection{127.0.0.1:39961, proxy=DIRECT hostAddress=/127.0.0.1:39961 cipherSuite=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 protocol=h2}
INFO: [24 ms] requestHeadersStart
INFO: [24 ms] requestHeadersEnd
INFO: [24 ms] requestBodyStart
INFO: [27 ms] requestBodyEnd: byteCount=30
INFO: [27 ms] responseHeadersStart
INFO: [27 ms] responseHeadersEnd: Response{protocol=h2, code=200, message=, url=https://127.0.0.1:39961/test/GetDestinationWarehouse}
INFO: [27 ms] responseBodyStart
INFO: [27 ms] responseFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
INFO: [30 ms] connectionReleased
INFO: [30 ms] callFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
INFO: [0 ms] callStart: Request{method=POST, url=http://127.0.0.1:43027/RobotLocator/SayHello, headers=[te:trailers, grpc-trace-bin:, grpc-accept-encoding:gzip, grpc-encoding:gzip], tags={class com.squareup.wire.GrpcMethod=com.squareup.wire.GrpcMethod@2421a0a5}}
INFO: [5 ms] proxySelectStart: http://127.0.0.1:43027/
INFO: [10 ms] proxySelectEnd: [DIRECT]
INFO: [11 ms] dnsStart: 127.0.0.1
INFO: [11 ms] dnsEnd: [/127.0.0.1]
INFO: [12 ms] connectStart: /127.0.0.1:43027 DIRECT
[2021-05-05 05:37:30] >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a 
[2021-05-05 05:37:30] >> 0x00000000     6 SETTINGS       
[2021-05-05 05:37:30] >> 0x00000000     4 WINDOW_UPDATE  
[2021-05-05 05:37:30] Q10035 scheduled after   0 µs: OkHttp 127.0.0.1 
[2021-05-05 05:37:30] Q10035 starting              : OkHttp 127.0.0.1 
INFO: [14 ms] connectEnd: h2_prior_knowledge
[2021-05-05 05:37:30] Q10029 scheduled after   0 µs: OkHttp ConnectionPool 
[2021-05-05 05:37:30] Q10029 starting              : OkHttp ConnectionPool 
[2021-05-05 05:37:30] Q10029 run again after 300 s : OkHttp ConnectionPool 
[2021-05-05 05:37:30] Q10029 finished run in 489 µs: OkHttp ConnectionPool 
INFO: [18 ms] connectionAcquired: Connection{127.0.0.1:43027, proxy=DIRECT hostAddress=/127.0.0.1:43027 cipherSuite=none protocol=h2_prior_knowledge}
INFO: [19 ms] requestHeadersStart
[2021-05-05 05:37:30] >> 0x00000003   122 HEADERS       END_HEADERS 
INFO: [20 ms] requestHeadersEnd
[2021-05-05 05:37:30] << 0x00000000    24 SETTINGS       
INFO: [21 ms] requestBodyStart
[2021-05-05 05:37:30] Q10032 scheduled after   0 µs: OkHttp 127.0.0.1 applyAndAckSettings 
[2021-05-05 05:37:30] << 0x00000000     4 WINDOW_UPDATE  
[2021-05-05 05:37:30] Q10032 starting              : OkHttp 127.0.0.1 applyAndAckSettings 
[2021-05-05 05:37:30] << 0x00000000     0 SETTINGS      ACK 
[2021-05-05 05:37:30] Q10034 scheduled after   0 µs: OkHttp 127.0.0.1 onSettings 
[2021-05-05 05:37:30] >> 0x00000000     0 SETTINGS      ACK 
[2021-05-05 05:37:30] Q10034 starting              : OkHttp 127.0.0.1 onSettings 
[2021-05-05 05:37:30] Q10034 finished run in  84 µs: OkHttp 127.0.0.1 onSettings 
[2021-05-05 05:37:30] Q10032 finished run in   5 ms: OkHttp 127.0.0.1 applyAndAckSettings 
[2021-05-05 05:37:30] >> 0x00000003    31 DATA           
[2021-05-05 05:37:30] << 0x00000003    77 HEADERS       END_HEADERS 
[2021-05-05 05:37:30] << 0x00000003    16 DATA           
[2021-05-05 05:37:30] << 0x00000003    12 HEADERS       END_STREAM|END_HEADERS 
[2021-05-05 05:37:30] << 0x00000003     4 RST_STREAM     
INFO: [41 ms] requestBodyEnd: byteCount=31
INFO: [41 ms] responseHeadersStart
INFO: [42 ms] responseHeadersEnd: Response{protocol=h2_prior_knowledge, code=200, message=, url=http://127.0.0.1:43027/RobotLocator/SayHello}
INFO: [46 ms] responseBodyStart
INFO: [51 ms] responseFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL
[2021-05-05 05:37:30] Q10029 canceled              : OkHttp ConnectionPool 
[2021-05-05 05:37:30] >> 0x00000000     8 GOAWAY         
[2021-05-05 05:37:30] Q10035 finished run in  44 ms: OkHttp 127.0.0.1 
INFO: [59 ms] connectionReleased
INFO: [59 ms] callFailed: okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL

I don't see anything useful here, unfortunately, but I'll put the logs in anyway.

@ewolak-sq
Copy link
Collaborator

I looked into this error a bit, and when the tests flake, it's because the server explicitly cancelled the stream. This is happening in Http2Stream#cancelStreamIfNecessary(). Sometimes it thinks it needs to cancel the stream, but when it does, the client gets an unexpected CANCEL and fails the test. The cancel happens because source.finished == false but source.closed == true. When the tests pass, we get source.finished == false && source.closed == false then source.finished == true && source.closed == true, neither of which triggers the CANCEL.

My current theory is that there's a race between FramingSource#close() and Http2Stream#receiveHeaders(). If I so much as add a non-suspending breakpoint to log some state at the top of FramingSource#close(), then it loses the race and everything is fine. In the failure case, #cancelStreamIfNecessary() executes before receiveHeaders(), so closed == true but finished == false and then we try to CANCEL.

This is super-subtle, and I'm getting tired, so I'll revisit later in the week.

Success:

GrpcConnectivityTest > serviceThrowsException() STARTED
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=false source.closed=false source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream$FramingSource.close(Http2Stream.kt:478)
source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream.receiveHeaders(Http2Stream.kt:295)
inFinished=true source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=true source.closed=true source=okhttp3.internal.http2.Http2Stream$FramingSource@67b5e90e
Disconnected from the target VM, address: 'localhost:55361', transport: 'socket'
GrpcConnectivityTest > serviceThrowsException() PASSED

Failure:

GrpcConnectivityTest > serviceThrowsException() STARTED
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=false source.finished=false source.closed=false source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Breakpoint reached at okhttp3.internal.http2.Http2Stream.receiveHeaders(Http2Stream.kt:295)
inFinished=true source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Breakpoint reached at okhttp3.internal.http2.Http2Stream.cancelStreamIfNecessary$okhttp(Http2Stream.kt:502)
cancel=true source.finished=false source.closed=true source=okhttp3.internal.http2.Http2Stream$FramingSource@690cfa45
Disconnected from the target VM, address: 'localhost:55713', transport: 'socket'
GrpcConnectivityTest > serviceThrowsException() FAILED

7f4249625960b7f10dc61aae4d1896c2e126f538 introduced an optimization that may be connected to this issue.

Stack trace:

close:240, Http2Stream (okhttp3.internal.http2)
cancelStreamIfNecessary$okhttp:506, Http2Stream (okhttp3.internal.http2)
close:488, Http2Stream$FramingSource (okhttp3.internal.http2)
close:34, ForwardingSource (okio)
close:309, Exchange$ResponseBodySource (okhttp3.internal.connection)
close:392, RealBufferedSource (okio)
closeQuietly:491, Util (okhttp3.internal)
close:192, ResponseBody (okhttp3)
serviceThrowsException:125, GrpcConnectivityTest (misk.grpc)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
invokeMethod:686, ReflectionUtils (org.junit.platform.commons.util)
proceed:60, MethodInvocation (org.junit.jupiter.engine.execution)
proceed:131, InvocationInterceptorChain$ValidatingInvocation (org.junit.jupiter.engine.execution)
intercept:149, TimeoutExtension (org.junit.jupiter.engine.extension)
interceptTestableMethod:140, TimeoutExtension (org.junit.jupiter.engine.extension)
interceptTestMethod:84, TimeoutExtension (org.junit.jupiter.engine.extension)
apply:-1, 134748208 (org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$157)
lambda$ofVoidMethod$0:115, ExecutableInvoker$ReflectiveInterceptorCall (org.junit.jupiter.engine.execution)
apply:-1, 302813371 (org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall$$Lambda$158)
lambda$invoke$0:105, ExecutableInvoker (org.junit.jupiter.engine.execution)
apply:-1, 1743561709 (org.junit.jupiter.engine.execution.ExecutableInvoker$$Lambda$298)
proceed:106, InvocationInterceptorChain$InterceptedInvocation (org.junit.jupiter.engine.execution)
proceed:64, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
chainAndInvoke:45, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
invoke:37, InvocationInterceptorChain (org.junit.jupiter.engine.execution)
invoke:104, ExecutableInvoker (org.junit.jupiter.engine.execution)
invoke:98, ExecutableInvoker (org.junit.jupiter.engine.execution)
lambda$invokeTestMethod$6:205, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:-1, 224454165 (org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor$$Lambda$444)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
invokeTestMethod:201, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:137, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
execute:71, TestMethodTestDescriptor (org.junit.jupiter.engine.descriptor)
lambda$executeRecursively$5:135, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
accept:-1, 756519367 (org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$234)
forEach:1540, ArrayList (java.util)
invokeAll:38, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$5:139, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
accept:-1, 756519367 (org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService$$Lambda$234)
forEach:1540, ArrayList (java.util)
invokeAll:38, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$5:139, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 1332801233 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$230)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$7:125, NodeTestTask (org.junit.platform.engine.support.hierarchical)
invoke:-1, 1146843363 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$229)
around:135, Node (org.junit.platform.engine.support.hierarchical)
lambda$executeRecursively$8:123, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:-1, 270036154 (org.junit.platform.engine.support.hierarchical.NodeTestTask$$Lambda$228)
execute:73, ThrowableCollector (org.junit.platform.engine.support.hierarchical)
executeRecursively:122, NodeTestTask (org.junit.platform.engine.support.hierarchical)
execute:80, NodeTestTask (org.junit.platform.engine.support.hierarchical)
submit:32, SameThreadHierarchicalTestExecutorService (org.junit.platform.engine.support.hierarchical)
execute:57, HierarchicalTestExecutor (org.junit.platform.engine.support.hierarchical)
execute:51, HierarchicalTestEngine (org.junit.platform.engine.support.hierarchical)
execute:108, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:88, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
lambda$execute$0:54, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
accept:-1, 430960721 (org.junit.platform.launcher.core.EngineExecutionOrchestrator$$Lambda$194)
withInterceptedStreams:67, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:52, EngineExecutionOrchestrator (org.junit.platform.launcher.core)
execute:96, DefaultLauncher (org.junit.platform.launcher.core)
execute:75, DefaultLauncher (org.junit.platform.launcher.core)
processAllTestClasses:99, JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor (org.gradle.api.internal.tasks.testing.junitplatform)
access$000:79, JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor (org.gradle.api.internal.tasks.testing.junitplatform)
stop:75, JUnitPlatformTestClassProcessor (org.gradle.api.internal.tasks.testing.junitplatform)
stop:61, SuiteTestClassProcessor (org.gradle.api.internal.tasks.testing)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
dispatch:36, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:24, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:33, ContextClassLoaderDispatch (org.gradle.internal.dispatch)
invoke:94, ProxyDispatchAdapter$DispatchingInvocationHandler (org.gradle.internal.dispatch)
stop:-1, $Proxy2 (com.sun.proxy)
stop:133, TestWorker (org.gradle.api.internal.tasks.testing.worker)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
dispatch:36, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:24, ReflectionDispatch (org.gradle.internal.dispatch)
dispatch:182, MessageHubBackedObjectConnection$DispatchWrapper (org.gradle.internal.remote.internal.hub)
dispatch:164, MessageHubBackedObjectConnection$DispatchWrapper (org.gradle.internal.remote.internal.hub)
run:414, MessageHub$Handler (org.gradle.internal.remote.internal.hub)
onExecute:64, ExecutorPolicy$CatchAndRecordFailures (org.gradle.internal.concurrent)
run:48, ManagedExecutorImpl$1 (org.gradle.internal.concurrent)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:56, ThreadFactoryImpl$ManagedThreadRunnable (org.gradle.internal.concurrent)
run:834, Thread (java.lang)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants