
Vapor forgets about an active request if the user returns an error whilst streaming the body (leading to file descriptor leaks) #3005

Open
weissi opened this issue Apr 26, 2023 · 0 comments
Labels
bug Something isn't working


Description

With Vapor 4.76.0 (and presumably all(?) earlier versions), it's quite easy to leak a NIO Channel and a file descriptor because Vapor essentially forgets about the active HTTP request. This could be used as a denial-of-service vector if an attacker can find a route that consistently errors asynchronously.

Problem in more detail

The problem is this code in HTTPServerRequestDecoder.swift:

    mutating func didError(_ error: Error) -> Result {
        switch self.state {
        case .idle:
            self.state = .error(error)
            return .init(action: .close(error), callRead: false)
        case .writing:
            self.state = .error(error)
            return .init(action: .nothing, callRead: false)
        case .error:
            return .init(action: .nothing, callRead: false)
        }
    }

When didError happens, it decides that we are not going to call read (callRead: false), and when in .writing (which means we are streaming data to request.body.drain) it also sets the action to .nothing.

The result is that, because we are in the error state, we won't run any more user code, and we are neither reading nor closing.

This however means that the only way for us to get rid of that Channel/socket is if NIO itself can detect that the connection went away. That's not reliably possible because the remote might have just abandoned us, so no more packets will ever arrive. On macOS / iOS, even in the face of a FIN/RST, NIO won't be able to learn about this because it can only do so reliably if it's registered for reads (we aren't reading) or writes (we aren't writing).

Long story short: Vapor forgets about this request and NIO can't help it in all situations. Or, one could say that Vapor wants a kick from NIO (some read/write error or connection closure), but NIO can't necessarily deliver one, because 1) the remote peer might just abandon us (without RST/FIN), and 2) on Darwin NIO can't reliably detect RST/FIN if nothing's reading or writing.

Possible fix

It depends on what semantics Vapor wants. It might be as easy as returning action: .close when an error occurs whilst .writing. I don't think these errors are recoverable, so closing is probably the best idea.

If Vapor wants this to be recoverable instead, it would need to set callRead: true but immediately discard any further body data received after entering .error.

In general, callRead: false should probably not be used in terminal states like .error or .end. Instead, when receiving data in .error, we should discard it or just close the connection/stream.
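To illustrate the first option, here is a minimal sketch of the proposed change. The types below are simplified stand-ins, not Vapor's actual declarations (the real Result carries the error in .close and lives inside HTTPServerRequestDecoder); only the .writing branch differs from the current behaviour:

```swift
// Simplified model of the decoder's error handling (stand-in types, not Vapor's).
enum Action: Equatable {
    case nothing
    case close
}

struct DecodeResult {
    var action: Action
    var callRead: Bool
}

struct DecoderModel {
    enum State {
        case idle
        case writing
        case error(Error)
    }
    var state: State = .idle

    mutating func didError(_ error: Error) -> DecodeResult {
        switch self.state {
        case .idle:
            self.state = .error(error)
            return DecodeResult(action: .close, callRead: false)
        case .writing:
            self.state = .error(error)
            // Proposed change: previously this returned .nothing, leaving the
            // Channel (and its file descriptor) dangling if the peer vanished.
            // Closing here releases them deterministically.
            return DecodeResult(action: .close, callRead: false)
        case .error:
            return DecodeResult(action: .nothing, callRead: false)
        }
    }
}
```

With this change, an error during body streaming tears the connection down instead of parking the request in a state that nothing can ever advance.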

Reproduction

For this route

app.on(.POST, "error-later", ":delay", body: .stream) { request in
    let delayMS = request.parameters.get("delay", as: Int64.self) ?? 2_000
    let delay = TimeAmount.milliseconds(delayMS)
    print("received", request)
    var haveWritten = false
    return Response(status: .ok, version: request.version, body: .init(stream: { bodyWriter in
        request.body.drain { chunk in
            switch chunk {
            case .buffer(let buffer):
                if !haveWritten {
                    haveWritten = true
                    print("received first \(buffer.readableBytes) bytes of the request. Scheduling write in \(delay).")
                    return request.eventLoop.flatScheduleTask(in: delay) {
                        struct E: Error {}
                        return request.eventLoop.makeFailedFuture(E())
                    }.futureResult
                } else {
                print("subsequent \(buffer.readableBytes) bytes")
                    return request.eventLoop.makeSucceededFuture(())
                }
            case .end:
                print("received end", request)
                return bodyWriter.write(.end)
            case .error(let error):
                print("received error", error, request)
                return bodyWriter.write(.end)
            }
        }
    }))
}

run

curl -v -m 1 -d "$(cat /dev/zero | head -c 1000000 | tr '\0' Y)" http://localhost:8080/error-later/2000

maybe 10 times, and after a minute or so run (replacing VaporHello with your binary name)

lsof -Pn -p $(pgrep VaporHello) | grep TCP; lskq -p $(pgrep VaporHello) | grep SOCKET

You'll likely see something like

$ lsof -Pn -p $(pgrep VaporHello) | grep TCP; lskq -p $(pgrep VaporHello) | grep SOCKET
VaporHell 79873 johannes   14u    IPv4 0x3925c7c9375db4d1        0t0                 TCP *:8080 (LISTEN)
VaporHell 79873 johannes   15u    IPv4 0x3925c7c93a385eb1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:54526 (CLOSED)
VaporHell 79873 johannes   16u    IPv4 0x3925c7c9375dbfe1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:54528 (CLOSED)
VaporHell 79873 johannes   17u    IPv4 0x3925c7c93a1174d1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:54529 (CLOSED)
VaporHell 79873 johannes   18u    IPv4 0x3925c7c93a12deb1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55109 (CLOSED)
VaporHell 79873 johannes   19u    IPv4 0x3925c7c9376d74d1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55110 (CLOSED)
VaporHell 79873 johannes   20u    IPv4 0x3925c7c939f869c1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55111 (CLOSED)
VaporHell 79873 johannes   21u    IPv4 0x3925c7c937789eb1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55112 (CLOSED)
VaporHell 79873 johannes   22u    IPv4 0x3925c7c93760d601        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55114 (CLOSED)
VaporHell 79873 johannes   23u    IPv4 0x3925c7c9375da9c1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55117 (CLOSED)
VaporHell 79873 johannes   24u    IPv4 0x3925c7c939f87fe1        0t0                 TCP 127.0.0.1:8080->127.0.0.1:55119 (CLOSED)
VaporHello           79873 fd               6  k--                 14 READ      SOCKET   -       a--- ---- ----- ----- ---- ---  --                  0 

As we can see, we have 10 file descriptors for sockets that are CLOSED. In the real world they may also appear as ESTABLISHED because, if the remote peer just abandoned the connection, the kernel doesn't know it's gone either...
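For convenience, the repeated curl invocations above can be scripted. A sketch, assuming the Vapor app from the route above is running on localhost:8080:

```shell
#!/bin/sh
# Fire 10 of the reproduction requests in parallel; each client gives up
# after 1 second (-m 1) while the server errors asynchronously after 2s.
PAYLOAD="$(head -c 1000000 /dev/zero | tr '\0' Y)"
for i in $(seq 1 10); do
  curl -s -m 1 -d "$PAYLOAD" http://localhost:8080/error-later/2000 >/dev/null 2>&1 &
done
wait || true
# A minute later, inspect the leaked descriptors with the lsof/lskq
# commands shown above.
```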
