Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce opportunistic option for TLS #2

Closed
wants to merge 7 commits into from

Conversation

joybestourous
Copy link
Owner

No description provided.

@joybestourous joybestourous changed the title testing introduce opportunistic option for TLS Apr 22, 2024
Copy link

@s-matyukevich s-matyukevich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that this code is supposed to be merged upstream and we may not actually need it by the time we will be able to actually use it.

Is my understanding correct that the main reason why this is an upstream PR is that we want to reuse ServerTlsHandler? We also re-implement netty.OptionalSslHandler as well as protocol negotiator, so our custom implementations for those classes don't strictly need to be upstreamed. I guess we should consider doing something similar for ServerTlsHandler, as upstreaming this code this might be the major blocker (I don't see , for example, how we can add experiments to CustomOptionalSslHandler in the current implementation)

Another thing, that is not related to PR itself, is that I don't understand how netty events are used by grpc, which to me is the major source of complexity. I am going to look more into upstream code to better understand it.

if (executor != null) {
this.executor = executor;
} else {
this.executor = ImmediateExecutor.INSTANCE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases do we need anything other than ImmediateExecutor?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's typically a ThreadPoolExecutor under the hood, in all my testing I haven't seen it fallback to an ImmediateExecutor. Again this is to maintain the behavior of direct TLS

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joybestourous do you know how/where the executor is used in the direct TLS flow? I would expect this sort of execution policy to live outside of the pipelines and, in most cases, the pipelines are oblivious to it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used to create the SslHandler, and then used here in the SslHandler

* behaves the same as the traditional plaintext pipeline, and we need to call
* {@link ChannelHandlerContext#fireUserEventTriggered(Object)} to continue along the pipeline
*/
public class CustomOptionalSslHandler extends ByteToMessageDecoder {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a subclass of netty.OptionalSslHandler? It is public and if we need to use a custom executor (which I don't understand why) we can also overwrite protected newSslHandler method.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is only not a subclass because of that executor constructor. If we decide we dont need that we can simplify this.

// Since the SslHandler was not inserted into the pipeline the ownership of the SSLEngine was not
// transferred to the SslHandler.
if (sslHandler != null) {
ReferenceCountUtil.safeRelease(sslHandler.engine());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied from OptionalSslHandler, but I don't understand why we need refCounts here. Are we expected to call this every time we create a new engine and we are done with using it?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all SslEngines are ReferenceCounted meaning they need explicit deallocation. If there was an sslHandler that took ownership of the lifecycle here, it would handle release when handlerRemoved is called, but because there is not we must do so here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this complexity comes from the fact netty leverages off-heap memory extensively. To manage that safely, netty uses explicit ref counting.

} else {
context.pipeline().remove(this);
context.pipeline().replace(ProtocolNegotiators.ServerOpportunisticTlsHandler.class, null, next);
context.fireUserEventTriggered(ProtocolNegotiationEvent.DEFAULT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we firing this even here and why we don't do the same in case when we handle ssl?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we handle Ssl, this is called when the SslHandshake is completed. Without this, the pipeline gets stuck here in the same way I mentioned above; this asks the pipeline to move on.

context.pipeline().replace(this, newNonSslHandlerName(), handler);
} else {
context.pipeline().remove(this);
context.pipeline().replace(ProtocolNegotiators.ServerOpportunisticTlsHandler.class, null, next);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what original OptionalSslHandler was doing, why do we need this line?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ProtocolNegotiators.ServerOpportunisticTlsHandler is not a netty thing it's a gRPC thing, so this wouldn't belong in the original.

The idea here is we want the context.pipeline to match that of the plaintext case. In the tls case the pipeline looks like this:
GrpcNegotiationHandler -> ServerTlsHandler -> WaitForActiveHandler
In the plaintext case, the middle is skipped
GrpcNegotiationHandler -> WaitForActiveHandler
So this restores the latter pipeline. Without this line, we keep hitting the ServerOpportunisticTlsHandler and thereby keep adding this handler when we don't need to, which results in us repeating this pathway until we get a io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason

@Override
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
if (evt instanceof ProtocolNegotiationEvent) {
pne = (ProtocolNegotiationEvent) evt;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this typecast? I don't see how we are using pne object in this function

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java 🙃 This is considered an Object without the typecast.
pne is used below in fireProtocolNegotiationEvent

@joybestourous
Copy link
Owner Author

joybestourous commented Apr 22, 2024

My main concern here is that this code is supposed to be merged upstream and we may not actually need it by the time we will be able to actually use it.

Yes, this is definitely a concern, especially with grpc-java upgrades blocked due to grpc#11108 . That said, I'm not sure if there is a better way of doing this. I could've missed something though, do you have ideas?

Is my understanding correct that the main reason why this is an upstream PR is that we want to reuse ServerTlsHandler?

Yes, this is correct, in the TLS case we want our implementation to match that of upstream, and we currently can't do that on our end (a lot of functionality is private, so we cant even copy-pasta it). Without that I think it makes it risky to move off of opp TLS to pure TLS. On top of this the NettyServerCredentials they advertise for plugging in your own handshake control is private and use of their internal version of it is discouraged.

(I don't see , for example, how we can add experiments to CustomOptionalSslHandler in the current implementation)

Experiments would not go here. I still need to consider how we'll do this on the client side, but this is specifically for the server, and would happen on our end at configuration time, not gRPC's at runtime. If oppTLS works correctly we really should never need a runtime update to turn it off, so I think that is acceptable.

EDIT: I did take a look at the client side, which is implemented in pretty much the same way as the server side; if we want experiments on the client side, we need the pathway to be to make NettyChannelCredentials public (or use the internal one) and make GrpcNegotiationHandler and WaitUntilActiveHandler public.

Another thing, that is not related to PR itself, is that I don't understand how netty events are used by grpc, which to me is the major source of complexity. I am going to look more into upstream code to better understand it.

Happy to chat about it if you'd like.

@s-matyukevich
Copy link

For context here is the summary of your offline discussion:

  • We need to look more into options that won't require upstream changes, mostly because we'll need to plug-in experiments into netty tls handler when we start switching clients to TLS at runtime. Also forking grpc without clear exist strategy is another thing we should avoid.
  • In order to do that my proposal was to write custom protocol negotiator and server tls handler. I think that we can simply skip setting any security attributes on the resulting channel, like we do here If we do that we also don't need to process any channel events, like we do here The result should be that grpc will consider the channel insecure, which is not a bad thing IMO (This is only used to migrate from insecure channels so we shouldn't break anything) I think that if we do that the resulting code should become way simpler as netty handles all aspects of creating secure channels, and the rest of the code will be just boilerplate to fit netty OptionalSslHandler into grpc abstractions.

@joybestourous
Copy link
Owner Author

Closing given comments on grpc#11117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants