Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ALTS can leak handshake RPCs #7622

Closed
ejona86 opened this issue Nov 14, 2020 · 9 comments
Closed

ALTS can leak handshake RPCs #7622

ejona86 opened this issue Nov 14, 2020 · 9 comments
Assignees
Labels
Milestone

Comments

@ejona86
Copy link
Member

ejona86 commented Nov 14, 2020

The OnHandshake RPCs start too early. They are started before the TCP connection is even established (when the stub is created, which is part of newHandshaker). If the connection is never established (e.g., the server can't be reached), the TsiHandshakeHandler won't be added to the Channel and so the RPC cleanup will never fire. Before the timeout added in #7589, this would be very bad.

We need to fix the bug, and evaluate how exposed our existing clients are. The v1.34 release is semi-okay, since it has a timeout in place so these busted RPCs won't live forever. But in the case that all backend connections are failing, it might still produce ugly results.

CC @veblush, @apolcyn

@ejona86 ejona86 added the bug label Nov 14, 2020
@ejona86 ejona86 added this to the Next milestone Nov 14, 2020
@jiangtaoli2016
Copy link
Contributor

IIUC, the ALTS handshake won't be triggered until waitUntilActiveHandler has been triggered. Yes, a stub has been created, but it is a shared channel to the handshaker service and there won't be any payload sent to the handshaker service until the connection to peer has been established. Is this still an issue?

@ejona86
Copy link
Member Author

ejona86 commented Nov 16, 2020

@jiangtaoli2016, creating a stub normally would be fine and no problem. But the AltsHandshakerStub isn't a gRPC stub. It is its own class that creates an RPC in its constructor. That is the RPC that will leak.

Yes, no payload will be sent on that RPC until the TCP connection is established, but the RPC still exists, even if no messages are being exchanged.

@jiangtaoli2016
Copy link
Contributor

I see. Thanks @ejona86 for clarifying. Any suggestion for the fix?

@apolcyn
Copy link
Contributor

apolcyn commented Nov 16, 2020

When we say that the RPC exists, it's meant that the handshake RPC has actually been initiated e.g. with initial metadata sent to the handshake service, right?

@ejona86
Copy link
Member Author

ejona86 commented Nov 16, 2020

@apolcyn, yes. But no messages have been sent; just the initial metadata.

@ejona86
Copy link
Member Author

ejona86 commented Nov 16, 2020

@jiangtaoli2016, it should not create the RPC until the TsiHandshakeHandler starts the handshake. There's two ways I see: 1) lazily create the RPC in AltsHandshakerStub.send() or 2) pass a factory to TsiHandshakeHandler and have it create the Stub when appropriate.

(1) is probably less invasive for tests. (2) would make me more comfortable. Although creating the RPC in the constructor is a code smell and surprising, so doing both might be best.

@jiangtaoli2016
Copy link
Contributor

@ejona86 Can we just do (1), so that RPC will not be created in ctor.

@ejona86
Copy link
Member Author

ejona86 commented Nov 16, 2020

(1) sounds fine.

@ejona86
Copy link
Member Author

ejona86 commented Nov 30, 2020

Fixed by #7630

@ejona86 ejona86 closed this as completed Nov 30, 2020
@ejona86 ejona86 modified the milestones: Next, 1.34 Nov 30, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants