
Leaking net.Conn #4833

Closed
aaronbee opened this issue Oct 1, 2021 · 9 comments · Fixed by #4837

Comments

aaronbee commented Oct 1, 2021

What version of gRPC are you using?

v1.41.0

What version of Go are you using (go version)?

v1.17.1

What operating system (Linux, Windows, …) and version?

GOOS=linux, GOARCH=386

What did you do?

Wrapped the TCP listener passed to grpc.Server.Serve with netutil.LimitListener (golang.org/x/net/netutil).

What did you expect to see?

As client connections are closed, the net.Conn returned from Accept() gets closed by the server.

What did you see instead?

Some net.Conn connections are not being closed. Combined with netutil.LimitListener, this means the server eventually exhausts the connection limit and never accepts any new connections.

The symptoms I'm seeing are exactly as those described in this other bug report: #4632

My hunch is that the gRPC server is dropping some net.Conn values without calling Close on them. The stdlib net.Conn implementations typically have a finalizer set on them so that they are closed before being garbage collected, but this is not the case for connections accepted from a netutil.LimitListener.
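
For reference, a minimal sketch of the setup described above, assuming golang.org/x/net/netutil for LimitListener (the address and limit are illustrative):

package main

import (
    "log"
    "net"

    "golang.org/x/net/netutil"
    "google.golang.org/grpc"
)

func main() {
    lis, err := net.Listen("tcp", ":50051") // illustrative address
    if err != nil {
        log.Fatal(err)
    }
    // Cap concurrent connections. Each net.Conn the server drops without
    // closing permanently consumes a slot, so the server eventually stops
    // accepting new connections.
    lis = netutil.LimitListener(lis, 100)

    s := grpc.NewServer()
    if err := s.Serve(lis); err != nil {
        log.Fatal(err)
    }
}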

aaronbee (Author) commented Oct 1, 2021

This was working as expected in v1.39.1 and v1.40.0; the bug showed up in v1.41.0.

dfawley (Member) commented Oct 1, 2021

Do you have a reproducible test case for this @aaronbee? I'd like to determine whether it could be #4692, which moved around some of the server-side connection handling, by testing before/after that change.

aaronbee (Author) commented Oct 1, 2021

Sorry, I don't have one I can easily share. We saw this issue come up in a test with a Python gRPC client that made several connections and disconnections over a period of time: 250/350 connections were closed correctly, 100/350 were not.

Some ideas for tests: wrap a listener with netutil.LimitListener using a low limit, just enough to support the number of concurrent connections expected in the test; if any of the connections leak, later connections won't complete. Or you could write a custom net.Listener that keeps track of live connections and fails the test if any are still alive when the test ends (see the sketch below).
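
A rough sketch of the second idea, as a hypothetical test-only wrapper (not from the gRPC codebase; assumes the "net" and "sync" packages):

// trackingListener counts connections that were accepted but never closed.
type trackingListener struct {
    net.Listener
    mu   sync.Mutex
    live int
}

func (l *trackingListener) Accept() (net.Conn, error) {
    c, err := l.Listener.Accept()
    if err != nil {
        return nil, err
    }
    l.mu.Lock()
    l.live++
    l.mu.Unlock()
    return &trackedConn{Conn: c, l: l}, nil
}

// Live reports how many accepted connections are still open; a test can
// fail if this is non-zero after the server has shut down.
func (l *trackingListener) Live() int {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.live
}

type trackedConn struct {
    net.Conn
    l    *trackingListener
    once sync.Once
}

func (c *trackedConn) Close() error {
    c.once.Do(func() {
        c.l.mu.Lock()
        c.l.live--
        c.l.mu.Unlock()
    })
    return c.Conn.Close()
}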

aaronbee (Author) commented Oct 1, 2021

I can run our test before and after #4692 to see if it is the cause.

dfawley (Member) commented Oct 1, 2021

Are all of your 350 connections being created/disconnected in exactly the same way (same client, same credentials, etc, etc)? Is the client doing anything unusual, like disconnecting before handshaking is complete? Can you enable INFO logging on the server and post the results or let us know if you see anything unusual in them? Thanks!

aaronbee (Author) commented Oct 1, 2021

I figured out how to reproduce this. Our test creates a TCP socket and then closes it immediately (i.e., before any bytes are sent) to check whether a server is alive. This function reproduces the issue:

// network and address point at the gRPC server under test.
func leakConn() {
    conn, err := net.Dial(network, address)
    if err != nil {
        panic(err)
    }
    conn.Close() // closed immediately, before any bytes are sent
}

I believe I enabled the INFO logging, but nothing got logged.
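
For completeness, a hedged sketch of how this repro could drive the LimitListener scenario end to end (limit, network, and address are the placeholders from above; imports for context, log, time, and grpc are assumed):

// Leak `limit` connections, then try a blocking gRPC dial. If the server
// never closes the leaked conns, every accept slot stays occupied and the
// handshake below should time out.
for i := 0; i < limit; i++ {
    leakConn()
}
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
cc, err := grpc.DialContext(ctx, address, grpc.WithInsecure(), grpc.WithBlock())
if err != nil {
    log.Printf("dial failed, consistent with leaked connections: %v", err)
} else {
    cc.Close() // server still accepting; no leak observed
}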

aaronbee (Author) commented Oct 1, 2021

It does look like #4692 is the culprit. I verified the leak doesn't happen just before that change, and does happen once it is applied.

dfawley (Member) commented Oct 1, 2021

Thanks for the help in isolating the cause; I have a fix for this in #4837.

aaronbee (Author) commented Oct 1, 2021

Thank you for the fix and the test! The changes are working on my end.
