Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the flakiness of the OkHttp tests (though not eliminate). #10973

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

larry-safran
Copy link
Contributor

@larry-safran larry-safran commented Feb 29, 2024

Doing server.shutdown() after the TlsTest tests, because OkHttpServer.acceptConnections is looping creating transports which TSAN gets unhappy when they conflict with cleanup. Doing a shutdown at the end of each test, so that loop can break before the rule's cleanup, pretty much eliminated those incidents.

WARNING: ThreadSanitizer: data race (pid=3197)
  Read of size 1 at 0x00008dff1726 by thread T71:
    #0 sun.security.ssl.SSLSocketImpl$AppOutputStream.write([BII)V SSLSocketImpl.java:1300 
    #1 okio.OutputStreamSink.write(Lokio/Buffer;J)V JvmOkio.kt:56 
    #2 okio.AsyncTimeout$sink$1.write(Lokio/Buffer;J)V AsyncTimeout.kt:102 
    #3 io.grpc.okhttp.AsyncSink$2.doRun()V AsyncSink.java:177 
    #4 io.grpc.okhttp.AsyncSink$WriteRunnable.run()V AsyncSink.java:232 
    #5 io.grpc.internal.SerializingExecutor.run()V SerializingExecutor.java:133 

  Previous write of size 1 at 0x00008dff1726 by thread T74:
    #0 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/String;ZLjava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:433 
    #1 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/String;Ljava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:303 
    #2 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:298 
    #3 sun.security.ssl.SSLSocketImpl.handleException(Ljava/lang/Exception;)V SSLSocketImpl.java:1686 
    #4 sun.security.ssl.SSLSocketImpl$AppInputStream.read([BII)I SSLSocketImpl.java:1086 
    #5 okio.InputStreamSource.read(Lokio/Buffer;J)J JvmOkio.kt:93 
    #6 okio.AsyncTimeout$source$1.read(Lokio/Buffer;J)J AsyncTimeout.kt:128 
    #7 okio.RealBufferedSource.request(J)Z RealBufferedSource.kt:209 
    #8 okio.RealBufferedSource.require(J)V RealBufferedSource.kt:202 
    #9 okio.RealBufferedSource.readByteString(J)Lokio/ByteString; RealBufferedSource.kt:218 
    #10 io.grpc.okhttp.internal.framed.Http2$Reader.readConnectionPreface()V Http2.java:111 
    #11 io.grpc.okhttp.OkHttpServerTransport$FrameHandler.run()V OkHttpServerTransport.java:569 
  Write of size 1 at 0x0000cdc2a7d9 by thread T94:
    #0 java.net.Socket.shutdownOutput()V Socket.java:1575 
    #1 sun.security.ssl.BaseSSLSocketImpl.shutdownOutput()V BaseSSLSocketImpl.java:234 
    #2 sun.security.ssl.SSLSocketImpl.handleClosedNotifyAlert(Z)V SSLSocketImpl.java:762 
    #3 sun.security.ssl.SSLSocketImpl.closeNotify(Z)V SSLSocketImpl.java:738 
    #4 sun.security.ssl.TransportContext.closeNotify(Z)V TransportContext.java:269 
    #5 sun.security.ssl.TransportContext.passiveInboundClose()V TransportContext.java:544 
    #6 sun.security.ssl.TransportContext.closeInbound()V TransportContext.java:506 
    #7 sun.security.ssl.Alert$AlertConsumer.consume(Lsun/security/ssl/ConnectionContext;Ljava/nio/ByteBuffer;)V Alert.java:245 
    #8 sun.security.ssl.TransportContext.dispatch(Lsun/security/ssl/Plaintext;)V TransportContext.java:186 
    #9 sun.security.ssl.SSLTransport.decode(Lsun/security/ssl/TransportContext;[Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Lsun/security/ssl/Plaintext; SSLTransport.java:172 
    #10 sun.security.ssl.SSLSocketImpl.decode(Ljava/nio/ByteBuffer;)Lsun/security/ssl/Plaintext; SSLSocketImpl.java:1514 
    #11 sun.security.ssl.SSLSocketImpl.readApplicationRecord(Ljava/nio/ByteBuffer;)Ljava/nio/ByteBuffer; SSLSocketImpl.java:1481 
    #12 sun.security.ssl.SSLSocketImpl$AppInputStream.read([BII)I SSLSocketImpl.java:1070 
    #13 okio.InputStreamSource.read(Lokio/Buffer;J)J JvmOkio.kt:93 
    #14 okio.AsyncTimeout$source$1.read(Lokio/Buffer;J)J AsyncTimeout.kt:128 
    #15 okio.RealBufferedSource.request(J)Z RealBufferedSource.kt:209 
    #16 okio.RealBufferedSource.require(J)V RealBufferedSource.kt:202 
    #17 io.grpc.okhttp.internal.framed.Http2$Reader.nextFrame(Lio/grpc/okhttp/internal/framed/FrameReader$Handler;)Z Http2.java:120 
    #18 io.grpc.okhttp.OkHttpServerTransport$FrameHandler.run()V OkHttpServerTransport.java:583 

  Previous read of size 1 at 0x0000cdc2a7d9 by thread T92:
    #0 java.net.Socket.isOutputShutdown()Z Socket.java:1659 
    #1 sun.security.ssl.BaseSSLSocketImpl.isOutputShutdown()Z BaseSSLSocketImpl.java:260 
    #2 sun.security.ssl.SSLSocketImpl.isOutputShutdown()Z SSLSocketImpl.java:886 
    #3 sun.security.ssl.SSLSocketImpl.close()V SSLSocketImpl.java:584 
    #4 sun.security.ssl.SSLSocketImpl$AppOutputStream.close()V SSLSocketImpl.java:1334 
    #5 okio.OutputStreamSink.close()V JvmOkio.kt:71 
    #6 okio.AsyncTimeout$sink$1.close()V AsyncTimeout.kt:112 
    #7 io.grpc.okhttp.AsyncSink$3.run()V AsyncSink.java:209 

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are making some of these changes because of TSAN failures, it is very helpful to include the failure report in the commit message in case we ever second-guess ourselves.

synchronized (lock) {
socket.setTcpNoDelay(true);
}
HandshakerSocketFactory.HandshakeResult result =
config.handshakerSocketFactory.handshake(socket, Attributes.EMPTY);
synchronized (handShakeLock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does nothing. This method is only called once ever per handShakeLock instance, and the lock isn't used anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  Write of size 4 at 0x00008dc081dc by thread T73 (mutexes: write M0):
    #0 sun.security.ssl.SSLSessionImpl.invalidate()V SSLSessionImpl.java:781 
    #1 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/String;ZLjava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:386 
    #2 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/String;Ljava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:303 
    #3 sun.security.ssl.TransportContext.fatal(Lsun/security/ssl/Alert;Ljava/lang/Throwable;)Ljavax/net/ssl/SSLException; TransportContext.java:298 
    #4 sun.security.ssl.SSLSocketImpl.handleException(Ljava/lang/Exception;)V SSLSocketImpl.java:1686 
    #5 sun.security.ssl.SSLSocketImpl.startHandshake(Z)V SSLSocketImpl.java:466 
    #6 sun.security.ssl.SSLSocketImpl.startHandshake()V SSLSocketImpl.java:427 
    #7 io.grpc.okhttp.OkHttpProtocolNegotiator.negotiate(Ljavax/net/ssl/SSLSocket;Ljava/lang/String;Ljava/util/List;)Ljava/lang/String; OkHttpProtocolNegotiator.java:100 
    #8 io.grpc.okhttp.TlsServerHandshakerSocketFactory.handshake(Ljava/net/Socket;Lio/grpc/Attributes;)Lio/grpc/okhttp/HandshakerSocketFactory$HandshakeResult; TlsServerHandshakerSocketFactory.java:58 
    #9 io.grpc.okhttp.OkHttpServerTransport.startIo(Lio/grpc/internal/SerializingExecutor;)V OkHttpServerTransport.java:171 
    #10 io.grpc.okhttp.OkHttpServerTransport.lambda$start$0(Lio/grpc/internal/SerializingExecutor;)V OkHttpServerTransport.java:159 
    #11 io.grpc.okhttp.OkHttpServerTransport$$Lambda$90.run()V ?? 

  Previous read of size 4 at 0x00008dc081dc by thread T76:
    #0 sun.security.ssl.SSLSessionImpl.setContext(Lsun/security/ssl/SSLSessionContextImpl;)V SSLSessionImpl.java:768 
    #1 sun.security.ssl.SSLSessionContextImpl.put(Lsun/security/ssl/SSLSessionImpl;)V SSLSessionContextImpl.java:196 
    #2 sun.security.ssl.Finished$T12FinishedConsumer.onConsumeFinished(Lsun/security/ssl/ServerHandshakeContext;Ljava/nio/ByteBuffer;)V Finished.java:606 
    #3 sun.security.ssl.Finished$T12FinishedConsumer.consume(Lsun/security/ssl/ConnectionContext;Ljava/nio/ByteBuffer;)V Finished.java:527 
    #4 sun.security.ssl.SSLHandshake.consume(Lsun/security/ssl/ConnectionContext;Ljava/nio/ByteBuffer;)V SSLHandshake.java:392 
    #5 sun.security.ssl.HandshakeContext.dispatch(BLjava/nio/ByteBuffer;)V HandshakeContext.java:443 
    #6 sun.security.ssl.HandshakeContext.dispatch(BLsun/security/ssl/Plaintext;)V HandshakeContext.java:421 
    #7 sun.security.ssl.TransportContext.dispatch(Lsun/security/ssl/Plaintext;)V TransportContext.java:183 
    #8 sun.security.ssl.SSLTransport.decode(Lsun/security/ssl/TransportContext;[Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Lsun/security/ssl/Plaintext; SSLTransport.java:172 
    #9 sun.security.ssl.SSLSocketImpl.decode(Ljava/nio/ByteBuffer;)Lsun/security/ssl/Plaintext; SSLSocketImpl.java:1511 
    #10 sun.security.ssl.SSLSocketImpl.readHandshakeRecord()I SSLSocketImpl.java:1421 
    #11 sun.security.ssl.SSLSocketImpl.startHandshake(Z)V SSLSocketImpl.java:456 
    #12 sun.security.ssl.SSLSocketImpl.startHandshake()V SSLSocketImpl.java:427 
    #13 io.grpc.okhttp.OkHttpProtocolNegotiator.negotiate(Ljavax/net/ssl/SSLSocket;Ljava/lang/String;Ljava/util/List;)Ljava/lang/String; OkHttpProtocolNegotiator.java:100 
    #14 io.grpc.okhttp.TlsServerHandshakerSocketFactory.handshake(Ljava/net/Socket;Lio/grpc/Attributes;)Lio/grpc/okhttp/HandshakerSocketFactory$HandshakeResult; TlsServerHandshakerSocketFactory.java:58 
    #15 io.grpc.okhttp.OkHttpServerTransport.startIo(Lio/grpc/internal/SerializingExecutor;)V OkHttpServerTransport.java:171 
    #16 io.grpc.okhttp.OkHttpServerTransport.lambda$start$0(Lio/grpc/internal/SerializingExecutor;)V OkHttpServerTransport.java:159 
    #17 io.grpc.okhttp.OkHttpServerTransport$$Lambda$90.run()V ?? 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a JDK bug. I don't know what JDK was used when TSAN failed, but based on the line numbers it looks to be JDK 11 or earlier. And it adds the new session to the session cache before setting the context, which does seem racy.
https://github.com/openjdk/jdk/blob/da75f3c4ad5bdf25167a3ed80e51f567ab3dbd01/src/java.base/share/classes/sun/security/ssl/SSLSessionContextImpl.java#L175

JDK master does the same, without any extra synchronization. So I think we need to report this and get it fixed.

@@ -117,6 +117,7 @@ public void mtls_succeeds() throws Exception {
ManagedChannel channel = grpcCleanupRule.register(clientChannel(server, channelCreds));

SimpleServiceGrpc.newBlockingStub(channel).unaryRpc(SimpleRequest.getDefaultInstance());
server.shutdown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these do? The Cleanup rule will shut down the server just after this line. Do we have to shut down the server before the client?

It isn't like OkHttpServer.acceptConnecitons() is spinning tightly in its loop. accept() blocks until there is a connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But because the client shuts down before the server, after it falls out of the accept it goes through the logic that is skipped when the shutdown flag is set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and I saw no change in failure rate. (1000 runs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn reality not matching the theory .... :(

@@ -293,7 +294,7 @@ public void close() throws SecurityException {
Buffer buffer = createMessageFrame(message);
frameHandler().data(false, 3, buffer, (int) buffer.size(),
(int) buffer.size());
assertThat(logs).hasSize(1);
assertWithMessage("log messages: " + logs).that(logs).hasSize(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this help? I tried it out and it seems to change:

io.grpc.okhttp.OkHttpClientTransportTest > testClientHandlerFrameLogger FAILED
    value of    : iterable.size()
    expected    : 2
    but was     : 1
    iterable was: [java.util.logging.LogRecord@7f12d356]

into:

io.grpc.okhttp.OkHttpClientTransportTest > testClientHandlerFrameLogger FAILED
    log messages: [java.util.logging.LogRecord@45026a28]
    value of    : iterable.size()
    expected    : 2
    but was     : 1
    iterable was: [java.util.logging.LogRecord@45026a28]

The point of Truth is that it prints the value of what your testing. Why print it twice?

synchronized (lock) {
this.socket = result.socket;
this.attributes = result.attributes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did TSAN show this was needed? This field shouldn't be looked at until listener.transportReady(). I'd be interested to see what TSAN found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants