
Fix attach goroutine/fd leak when no I/O is ready #45052

Open · wants to merge 3 commits into master from attach_fd_leak

Conversation

@cpuguy83 (Member) commented Feb 21, 2023

When only stdin is attached, the copy goroutine can only ever exit if:

  1. The container pipe is closed while the goroutine is trying to write to it
  2. The client closes the stdin read pipe

This is because `io.Copy` performs a read on the read side, then a write on the write side.
If reading from the client's stdin pipe blocks, the goroutine is never notified that the container pipe has been closed.

Another, similar case is when the client disconnects but `io.Copy` is blocked waiting for data from the container's stdout/stderr streams.
This is significantly trickier to resolve.
The second commit fixes this for Linux only.

Fixes #37182

@cpuguy83 cpuguy83 changed the title Attach fd leak Fix attach goroutine/fd leak when no I/O is ready Feb 21, 2023
@cpuguy83 cpuguy83 force-pushed the attach_fd_leak branch 4 times, most recently from 19e39b2 to 53b4d3d Compare February 22, 2023 01:54
@thaJeztah (Member)

/cc @corhere

@corhere (Contributor) left a comment

I wonder if the client-disconnects solution is more clever than it needs to be. In the case of TCP, the lowest common denominator for connection-state tracking, each peer can either gracefully close its write stream (FIN) or forcefully abort the whole connection (RST), but there is no affordance in the TCP state machine for a peer to signal gracefully that it is no longer interested in receiving data. And the only portable way to check whether a socket is still open is to try reading or writing at least one byte and see if it succeeds.

What if we always used the client-closed-for-writing signal as our cue to stop copying stdout/stderr from the container? That is, have a goroutine which calls conn.Read() even when stdin isn't attached to the container, and closes stdout/stderr when the read returns io.EOF or similar. The same code path would work for UNIX domain sockets, TCP sockets, WebSockets, and Windows named pipes; no platform-specific event handling required! And all the bases would be covered for TCP: FIN, RST, and keep-alive timeout all result in EOF on read.

api/server/router/container/notify_linux.go (comment outdated, resolved)
@cpuguy83 (Member, Author)

@corhere You're right, I think this could work.
I've added a third commit with the changes.

@cpuguy83 (Member, Author) commented Mar 3, 2023

So it seems like there is a problem with that third commit, possibly because the read side is shut down when there's nothing on stdin.

@corhere (Contributor) commented Mar 31, 2023

Sigh, the CLI half-closes the connection to signal EOF on stdin. So much for that idea. It's looking like the sort of OS-specific close notifications you had implemented initially are really the only thing we can do to fix the leaks within the current API versions.

Using socket half-close to signal stdin-closed rather than attach-done is arguably a design flaw in our container-attach protocol. With the WebSocket attach protocol also busted (ws provides framing but no multiplexing, so stdout can't be distinguished from stderr), I see an opportunity to fix all these flaws... the third time around. I'm imagining a new WebSocket attach protocol (paired with new endpoints?) for attach and exec which:

  • multiplexes output streams in a way which takes advantage of WebSocket framing (i.e. not stdcopy)
  • unconditionally multiplexes the output streams, even when there is only a single output stream, so that the daemon can signal the client out-of-band
  • also multiplexes stdin with out-of-band signaling for stdin-EOF and perhaps other things (SIGWINCH without making a separate request?)

But I digress...

@cpuguy83 (Member, Author) commented Jul 5, 2023

Rebased this and removed the 3rd commit which didn't work as hoped.

@corhere (Contributor) left a comment

I'm not super thrilled about the un-idiomatic use of callbacks in notifyClosed but I can understand why: the more idiomatic designs (synchronous blocking call returning an error, return a child context in the same vein as "os/signal".NotifyContext) would take a nontrivial amount of additional plumbing and refactoring. I have to concede that you have gone with the most pragmatic implementation.

LGTM!

api/server/router/container/notify_unsupported.go (comment outdated, resolved)
api/server/router/container/notify_linux.go (comment outdated, resolved)
@corhere (Contributor) left a comment

:shipit:

@cpuguy83 (Member, Author)

Rebased and moved EINTR handlers to the unix_noeintr package.

@cpuguy83 cpuguy83 force-pushed the attach_fd_leak branch 2 times, most recently from 00b1448 to f9468e1 Compare March 21, 2024 19:23
@neersighted (Member) left a comment

LGTM (with a couple of nits), and a bit of sadness about not being able to solve this on Windows, but I don't begrudge not wanting to figure out a separate async I/O API.

daemon/attach.go (comment outdated, resolved)
api/server/router/container/notify_unsupported.go (comment outdated, resolved)
@thaJeztah (Member)

Failure on rootless looks related: https://github.com/moby/moby/actions/runs/8428203898/job/23080313424?pr=45052#step:8:1254

=== FAIL: amd64.integration.container TestAttachDisconnectLeak (0.07s)
    attach_test.go:102: assertion failed: infoAttach.NGoroutines is <= info.NGoroutines

I kicked CI again to see if it's flaky

When only stdin is attached the goroutine can only ever exit if:

1. The container pipe is closed while trying to write to it
2. The client closes the stdin read pipe

This is because `io.Copy` does a read on the read side then a write to
the write side.
If reading from the client's stdin pipe blocks, the goroutine will never
get notified that the container pipe is closed.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 (Member, Author)

Updated the integration test to hopefully make it less likely to flake.

@cpuguy83 cpuguy83 force-pushed the attach_fd_leak branch 2 times, most recently from 2df3ff1 to ac05457 Compare May 23, 2024 18:13
@cpuguy83 cpuguy83 force-pushed the attach_fd_leak branch 4 times, most recently from 2fb7062 to e87ce9d Compare May 24, 2024 21:31
In cases where the client disconnects and there is nothing to read from
a stdio stream after that disconnect, the copy goroutines and file
descriptors are leaked because `io.Copy` is just blocked waiting for
data from the container's I/O stream.

This fix only applies to Linux.
Windows will need a separate fix.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This var is for the incoming attach request.
Just within this one function we also have `cfg` and `ctr`, so `c` just makes things more confusing.
Not to mention `c` usually references a container object in other parts of the code.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 (Member, Author)

Ok, I think the test issue was that it was using the main integration daemon, and the goroutine count was fluctuating at just the right times to break it.
I've switched it to use its own daemon to help reduce noise, and that seems to have done the trick.

@cpuguy83 (Member, Author)

Ping, does this LGTY?

@cpuguy83 (Member, Author)

btw, I've opened a proposal on Go to extend the syscall.RawConn interface (or add an additional interface) to receive close notifications.
golang/go#67432

@cpuguy83 (Member, Author) commented Jun 4, 2024

Ping

integration/container/attach_test.go (comment resolved)
Comment on lines +34 to +44
```go
err := unix_noeintr.EpollCtl(epFd, unix.EPOLL_CTL_ADD, int(fd), &unix.EpollEvent{
	Events: unix.EPOLLHUP,
	Fd:     int32(fd),
})
if err != nil {
	log.G(ctx).WithError(err).Warn("notifyClosed: failed to register fd for close notifications")
	return
}

events := make([]unix.EpollEvent, 1)
if _, err := unix_noeintr.EpollWait(epFd, events, -1); err != nil {
```
A contributor left a comment

I don't think EPOLLHUP is emitted on a graceful close? I'm not 100% sure, but if that's the case, wouldn't it cause this goroutine to block indefinitely?

Should this also poll for EPOLLRDHUP?

@cpuguy83 (Member, Author)

If anything, passing EPOLLHUP to EPOLL_CTL_ADD is unnecessary, because epoll always reports EPOLLHUP events regardless of what was requested.
EPOLLRDHUP is sent when the peer closes the write side.
EPOLLHUP, as I recall, should mean the connection is gone entirely.

A contributor left a comment

I did a quick test with pure syscalls: https://gist.github.com/vvoland/cc9d31f0202cd233c2c2affd74294e6f

And it looks like shutdown + close on the client side doesn't trigger EPOLLHUP.

This might not actually be the same in our case since we're much higher up in the stack and there might be other things actually triggering the EPOLLHUP along the way.

A contributor left a comment

But I must admit - I don't know exactly how EPOLLHUP works, so I'll just trust you if you say that it works as expected 👍🏻

@cpuguy83 (Member, Author)

I think EPOLLRDHUP corresponds to the TCP half-close situation, which is why a simpler implementation that doesn't require epoll wouldn't work, as discussed above.


Successfully merging this pull request may close these issues.

Docker daemon is leaking file descriptors (with reproduction)
5 participants