New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce opportunistic option for TLS #2
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern here is that this code is supposed to be merged upstream and we may not actually need it by the time we will be able to actually use it.
Is my understanding correct that the main reason why this is an upstream PR is that we want to reuse ServerTlsHandler? We also re-implement netty.OptionalSslHandler as well as protocol negotiator, so our custom implementations for those classes don't strictly need to be upstreamed. I guess we should consider doing something similar for ServerTlsHandler, as upstreaming this code this might be the major blocker (I don't see , for example, how we can add experiments to CustomOptionalSslHandler in the current implementation)
Another thing, that is not related to PR itself, is that I don't understand how netty events are used by grpc, which to me is the major source of complexity. I am going to look more into upstream code to better understand it.
if (executor != null) { | ||
this.executor = executor; | ||
} else { | ||
this.executor = ImmediateExecutor.INSTANCE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases do we need anything other than ImmediateExecutor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's typically a ThreadPoolExecutor
under the hood, in all my testing I haven't seen it fallback to an ImmediateExecutor
. Again this is to maintain the behavior of direct TLS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joybestourous do you know how/where the executor is used in the direct TLS flow? I would expect this sort of execution policy to live outside of the pipelines and, in most cases, the pipelines are oblivious to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used to create the SslHandler, and then used here in the SslHandler
* behaves the same as the traditional plaintext pipeline, and we need to call | ||
* {@link ChannelHandlerContext#fireUserEventTriggered(Object)} to continue along the pipeline | ||
*/ | ||
public class CustomOptionalSslHandler extends ByteToMessageDecoder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a subclass of netty.OptionalSslHandler? It is public and if we need to use a custom executor (which I don't understand why) we can also overwrite protected newSslHandler
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is only not a subclass because of that executor constructor. If we decide we dont need that we can simplify this.
// Since the SslHandler was not inserted into the pipeline the ownership of the SSLEngine was not | ||
// transferred to the SslHandler. | ||
if (sslHandler != null) { | ||
ReferenceCountUtil.safeRelease(sslHandler.engine()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is copied from OptionalSslHandler, but I don't understand why we need refCounts here. Are we expected to call this every time we create a new engine and we are done with using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all SslEngines are ReferenceCounted meaning they need explicit deallocation. If there was an sslHandler that took ownership of the lifecycle here, it would handle release when handlerRemoved
is called, but because there is not we must do so here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this complexity comes from the fact netty leverages off-heap memory extensively. To manage that safely, netty uses explicit ref counting.
} else { | ||
context.pipeline().remove(this); | ||
context.pipeline().replace(ProtocolNegotiators.ServerOpportunisticTlsHandler.class, null, next); | ||
context.fireUserEventTriggered(ProtocolNegotiationEvent.DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we firing this even here and why we don't do the same in case when we handle ssl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we handle Ssl, this is called when the SslHandshake is completed. Without this, the pipeline gets stuck here in the same way I mentioned above; this asks the pipeline to move on.
context.pipeline().replace(this, newNonSslHandlerName(), handler); | ||
} else { | ||
context.pipeline().remove(this); | ||
context.pipeline().replace(ProtocolNegotiators.ServerOpportunisticTlsHandler.class, null, next); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not what original OptionalSslHandler was doing, why do we need this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ProtocolNegotiators.ServerOpportunisticTlsHandler
is not a netty thing it's a gRPC thing, so this wouldn't belong in the original.
The idea here is we want the context.pipeline to match that of the plaintext case. In the tls case the pipeline looks like this:
GrpcNegotiationHandler -> ServerTlsHandler -> WaitForActiveHandler
In the plaintext case, the middle is skipped
GrpcNegotiationHandler -> WaitForActiveHandler
So this restores the latter pipeline. Without this line, we keep hitting the ServerOpportunisticTlsHandler
and thereby keep adding this handler when we don't need to, which results in us repeating this pathway until we get a io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
@Override | ||
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception { | ||
if (evt instanceof ProtocolNegotiationEvent) { | ||
pne = (ProtocolNegotiationEvent) evt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this typecast? I don't see how we are using pne object in this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java 🙃 This is considered an Object
without the typecast.
pne is used below in fireProtocolNegotiationEvent
Yes, this is definitely a concern, especially with grpc-java upgrades blocked due to grpc#11108 . That said, I'm not sure if there is a better way of doing this. I could've missed something though, do you have ideas?
Yes, this is correct, in the TLS case we want our implementation to match that of upstream, and we currently can't do that on our end (a lot of functionality is private, so we cant even copy-pasta it). Without that I think it makes it risky to move off of opp TLS to pure TLS. On top of this the
Experiments would not go here. I still need to consider how we'll do this on the client side, but this is specifically for the server, and would happen on our end at configuration time, not gRPC's at runtime. If oppTLS works correctly we really should never need a runtime update to turn it off, so I think that is acceptable. EDIT: I did take a look at the client side, which is implemented in pretty much the same way as the server side; if we want experiments on the client side, we need the pathway to be to make
Happy to chat about it if you'd like. |
For context here is the summary of your offline discussion:
|
Closing given comments on grpc#11117 |
No description provided.