New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ALTS can leak handshake RPCs #7622
Comments
IIUC, the ALTS handshake won't be triggered until |
@jiangtaoli2016, creating a stub normally would be fine and no problem. But the Yes, no payload will be sent on that RPC until the TCP connection is established, but the RPC still exists, even if no messages are being exchanged. |
I see. Thanks @ejona86 for clarifying. Any suggestion for the fix? |
When we say that the RPC exists, it's meant that the handshake RPC has actually been initiated e.g. with initial metadata sent to the handshake service, right? |
@apolcyn, yes. But no messages have been sent; just the initial metadata. |
@jiangtaoli2016, it should not create the RPC until the TsiHandshakeHandler starts the handshake. There's two ways I see: 1) lazily create the RPC in (1) is probably less invasive for tests. (2) would make me more comfortable. Although creating the RPC in the constructor is a code smell and surprising, so doing both might be best. |
@ejona86 Can we just do (1), so that RPC will not be created in ctor. |
(1) sounds fine. |
Fixed by #7630 |
The OnHandshake RPCs start too early. They are started before the TCP connection is even established (when the stub is created, which is part of newHandshaker). If the connection is never established (e.g., the server can't be reached), the TsiHandshakeHandler won't be added to the Channel and so the RPC cleanup will never fire. Before the timeout added in #7589, this would be very bad.
We need to fix the bug, and evaluate how exposed our existing clients are. The v1.34 release is semi-okay, since it has a timeout in place so these busted RPCs won't live forever. But in the case that all backend connections are failing, it might still produce ugly results.
CC @veblush, @apolcyn
The text was updated successfully, but these errors were encountered: