Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"socket hang up" / ECONNRESET on consecutive requests with Node.js 19 and Node.js 20 #1735

Closed
dhedey opened this issue Apr 19, 2023 · 10 comments · Fixed by #1736
Closed

"socket hang up" / ECONNRESET on consecutive requests with Node.js 19 and Node.js 20 #1735

dhedey opened this issue Apr 19, 2023 · 10 comments · Fixed by #1736

Comments

@dhedey
Copy link
Contributor

dhedey commented Apr 19, 2023

Overview

When using node-fetch on Node.js 19+ to make two consecutive requests to the same web server, we always get the error: FetchError: request to <URL> failed, reason: socket hang up

TLDR: This issue is caused by nodejs/node#47130 in nodejs, exacerbated by an issue with node-fetch blindly sending Connection: Close headers. This was fixed via #1736 in node-fetch v3.3.2 (note: 3.x is only ESM-enabled). This will hopefully be fixed in node-fetch v2 once #1765 is merged and released.

Details

This error only arises with node-fetch and Node.js 19+ (tested using 19.9.0 and 20.0.0), and the web server we were using was Axum. We didn't see this issue when using a node-based proxy to help with debugging - I imagine it depends slightly on the order of TCP packet delivery and latency and the internal state machines of the HTTP servers - but it seems like many HTTP servers would see this issue.

The error appears to only occur where the second request to the same server in the same round of the event loop as the first request was awaited, which gives the error:

FetchError: request to <URL> failed, reason: socket hang up

Explanation

Root cause

We believe the following explains what we're seeing, and is backed up by logs at the end of this post:

  • fetch sends a connection: close header in its first HTTP request...

    • The server responds with a http response, which fetch hands back to the caller.
    • The server actions the close connection request by following up the http response with a TCP packet with the FIN flag. This marks the end of the connection from the server's point of view. At this point, it does not expect to receive any further requests.
  • Still in the same event loop, the caller calls fetch again to make its second HTTP request to the same origin.

    • Node.js attempts to send the next request to the host using the same connection
    • But the TCP connection is closed at the server's end, so returns a conn reset at the TCP layer
    • Resulting in the socket hang up

And I believe this may be the cause:

  • The first response from the server is handed back to the caller likely in the poll phase of the event loop round).
  • In the mean time, the server has responded to close the TCP connection - but the event loop doesn't get round to handling it (not least because it's not yet in the close callbacks phase)
  • The caller then attempts to send a new request. But as far as the event loop is concerned, we're still at the poll stage, and haven't yet handled the socket close event...
  • So I would guess that Node.js internally has yet to close the connection (because it's waiting for the close callbacks stage?), so attempts to send the HTTP request on the same (as yet unclosed) connection.
  • The server receives this request but has already closed the connection - so responds with connection reset message... which results in fetch raising the socket hang error.

Basically it's a race condition between the server closing the connection and the client sending the second request... but it happens reproducibly 100% of the time, and is caused by buggy behaviour, possibly in the Node.js agent?

I would imagine that Node.js 19 changed some internal handling of socket closes which delayed the handling of the close when a connection close http request is sent, and introduced - but I'm not sure if it's Node.js or node-fetch at fault.

Actions

Workarounds

The following both appeared to work - but for what it's worth, I don't think either of these is a "fix" because it's no longer an easy isomorphic drop in for the browser fetch.

  • It appears that inserting the line await new Promise((resolve) => setTimeout(resolve, 0)) before every API call to try to ensure any connections are closed fixes the problem.
  • Using a custom http / https agent with keepalive, eg:
import http from 'node:http';
import https from 'node:https';

const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });
const agentSelector = function(_parsedURL: any) {
    if (_parsedURL.protocol == 'http:') {
        return httpAgent;
    } else {
        return httpsAgent;
    }
}

// ...

const response = await fetch(url, { agent: agentSelector });

Suggestions for fixes

TLDR: These lines are incorrect in node-fetch

There is arguablably issues in both node-fetch and Node.js itself.

However this is done, there are race conditions where a server can close a connection at the same time as you try to send another message on the connection - and this needs to be handled - I guess normally this is where applications implement retry logic (although it's an argument that libraries should have this - although maybe not node-fetch!)

BUT there is a perfect storm that makes this race condition happen every time (100% reproducible for me and my colleagues):

Previously, sending the Connection: close header explicitly didn't do anything wrong (even if it wasn't strictly needed). BUT sending a connection close down a keepalive connection isn't very nice behaviour - and trips up node's handling of the connection.

The following are three key fixes that could be made - all of which would fix the issue to various degrees:

Related to point the node-fetch point:

The node docs on http say this about the keepalive option on the agent:

keepAlive boolean - Keep sockets around even when there are no outstanding requests, so they can be used for future requests without having to reestablish a TCP connection. Not to be confused with the keep-alive value of the Connection header. The Connection: keep-alive header is always sent when using an agent except when the Connection header is explicitly specified or when the keepAlive and maxSockets options are respectively set to false and Infinity, in which case Connection: close will be used. Default: false.

This suggests to me that Node.js has decent default handling, and we should be letting Node.js handle this and not take matters into our own hands by sending a confusing Connection: close header. By sending the header, we could be interfering with Node.js's ability to keep the socket alive if (for example) the maxSockets are reached.

Also - the fact we only set Connection: close on the request when a user doesn't provide an agent feels a little weird - either it's worth doing when the agent is set to keepalive or not - regardless of whether we use the default agent or not.

-Finally - perhaps node-fetch could consider consider respecting the keepalive option of fetch itself and create the correct agent - with a cache - to prevent it from using the default agent - or not even use the default agent at all.-
EDIT: Sorry - I'm wrong. The keepalive option on fetch is an entirely separate concept, to do with keeping the connection alive after the page ends in the browser (eg a navigator.sendBeacon replacement) and not related to http keepalive - which makes sense - fetch is more high-level than that.

Logs

TCP Dump

From running tcpdump -A -ni lo tcp port 3333 to monitor the TCP messages between the server and the node-fetch client.

The key parts are (see reference):

  • A 47630->3333 http connection:close request
  • Server responds with the http response, followed by a Flags [F.] FIN packet (seq
  • Another request attempt on the same connection (same port and seq number continues)
  • Server response with the reset flag (Flags [R]) RST - reset connection packet.
19:28:09.351131 IP 127.0.0.1.47630 > 127.0.0.1.3333: Flags [P.], seq 1:225, ack 1, win 512, options [nop,nop,TS val 109329269 ecr 109329241], length 224

POST <PATH1> HTTP/1.1
Connection: close
Host: 127.0.0.1:3333

19:28:09.351166 IP 127.0.0.1.3333 > 127.0.0.1.47630: Flags [.], ack 225, win 510, options [nop,nop,TS val 109329269 ecr 109329269], length 0
19:28:09.352154 IP 127.0.0.1.3333 > 127.0.0.1.47630: Flags [P.], seq 1:2730, ack 225, win 512, options [nop,nop,TS val 109329270 ecr 109329269], length 2729
HTTP/1.1 200 OK
content-type: application/json
content-length: 2619
date: Wed, 19 Apr 2023 17:28:09 GMT

<<response-payload>>
19:28:09.352163 IP 127.0.0.1.47630 > 127.0.0.1.3333: Flags [.], ack 2730, win 497, options [nop,nop,TS val 109329270 ecr 109329270], length 0
19:28:09.352175 IP 127.0.0.1.3333 > 127.0.0.1.47630: Flags [F.], seq 2730, ack 225, win 512, options [nop,nop,TS val 109329270 ecr 109329270], length 0

// Second request

19:28:09.365378 IP 127.0.0.1.47630 > 127.0.0.1.3333: Flags [P.], seq 225:506, ack 2731, win 512, options [nop,nop,TS val 109329283 ecr 109329270], length 281
POST <PATH2> HTTP/1.1
Content-Type: application/json
Accept: */*
Content-Length: 24
User-Agent: node-fetch/1.0 (+https://github.com/bitinn/node-fetch)
Accept-Encoding: gzip,deflate
Connection: close
Host: 127.0.0.1:3333

<REQUEST_BODY_2>
19:28:09.365401 IP 127.0.0.1.3333 > 127.0.0.1.47630: Flags [R], seq 594996662, win 0, length 0

NODE_DEBUG=net logs

From running NODE_DEBUG=net yarn test

/// FIRST REQUEST

NET 54331: createConnection [
  [Object: null prototype] {
    protocol: 'http:',
    slashes: true,
    auth: null,
    host: '127.0.0.1',
    port: '3333',
    hostname: '127.0.0.1',
    hash: null,
    search: null,
    query: null,
    pathname: '<PATH>',
    path: null,
    href: 'http://127.0.0.1:3333/<URL>',
    method: 'POST',
    headers: [Object: null prototype] {
      Accept: [Array],
      'Content-Length': [Array],
      'User-Agent': [Array],
      'Accept-Encoding': [Array],
      Connection: [Array]
    },
    agent: undefined,
    keepAlive: true,
    scheduling: 'lifo',
    timeout: 5000,
    noDelay: true,
    servername: '',
    _agentKey: '127.0.0.1:3333:',
    encoding: null,
    keepAliveInitialDelay: 1000
  },
  [Function (anonymous)],
  [Symbol(normalizedArgs)]: true
]
NET 54331: pipe false null
NET 54331: connect: attempting to connect to 127.0.0.1:3333 (addressType: 4)
NET 54331: _read - n 16384 isConnecting? true hasHandle? true
NET 54331: _read wait for connection
NET 54331: afterConnect
NET 54331: _read - n 16384 isConnecting? false hasHandle? true
NET 54331: Socket._handle.readStart
NET 54331: _read - n 16384 isConnecting? false hasHandle? true

/// SECOND REQUEST

NET 54331: destroy
NET 54331: close
NET 54331: close handle
@dhedey dhedey added the bug label Apr 19, 2023
@dhedey dhedey changed the title "socket hang up" / ECONNRESET on consecutive requests with Node 19 and Node 20 "socket hang up" / ECONNRESET on consecutive requests with Node.js 19 and Node.js 20 Apr 19, 2023
dhedey added a commit to dhedey/node-fetch that referenced this issue Apr 21, 2023
Instead, we rely on the underlying http implementation in Node.js
to handle this, as per the documentation at
https://nodejs.org/api/http.html#new-agentoptions

This fixes node-fetch#1735 and likely replaces node-fetch#1473

The original change introducing this provided no clear motivation
for the override, and the implementation has since been changed to
disable this header when an agent is provided, so I think there
is sufficient evidence that removing this is the correct behaviour.
node-fetch@af21ae6
node-fetch@7f68577
@boneskull
Copy link

We're seeing this behavior in Axios v1.4.0 as well. I'm not sure it's specific to fetch.

The "agent" workaround fixes it; the "setTimeout" one didn't help.

@mbrevda

This comment was marked as outdated.

@dhedey
Copy link
Contributor Author

dhedey commented Jun 2, 2023

@mbrevda - indeed it should! I've updated it, thanks. Good spot.

@Satya-Sidhu
Copy link

Satya-Sidhu commented Jul 11, 2023

Hi
I am trying to install TLDR (alternative to man pages) on a Ubuntu 20.04 VM. When I run the command tldr --update, I am getting the following error:

✖ Updating...
Error: socket hang up
at connResetException (internal/errors.js:614:14)
at Socket.socketOnEnd (_http_client.js:491:23)
at Socket.emit (events.js:327:22)
at endReadableNT (_stream_readable.js:1224:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21)

Any kind of help will be appreciated.
Thank you.

@damjanned
Copy link

This happens on the native node fetch too. I'm able to document it if it's of any help. I'm about to try if keep-alive will fix it.

jimmywarting pushed a commit that referenced this issue Jul 25, 2023
Instead, we rely on the underlying http implementation in Node.js
to handle this, as per the documentation at
https://nodejs.org/api/http.html#new-agentoptions

This fixes #1735 and likely replaces #1473

The original change introducing this provided no clear motivation
for the override, and the implementation has since been changed to
disable this header when an agent is provided, so I think there
is sufficient evidence that removing this is the correct behaviour.
af21ae6
7f68577
@github-actions
Copy link

🎉 This issue has been resolved in version 3.3.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

@dhedey
Copy link
Contributor Author

dhedey commented Jul 27, 2023

For those commenting on this issue cropping up in Axios and other libraries, this is due to the ticket here: nodejs/node#47130 (comment) - and poor handling of connection closing states in node itself.

That is related to, but perhaps separate from the node-fetch issue, which can reliably trigger this poor handling of closing states - to make node-fetch sadly unusable in some situations.


In other news - this has been merged and released in 3.3.2, although will need backporting to 2.x as well (in #1765)

dhedey added a commit to dhedey/node-fetch that referenced this issue Jul 27, 2023
Instead, we rely on the underlying http implementation in Node.js
to handle this, as per the documentation at
https://nodejs.org/api/http.html#new-agentoptions

This fixes node-fetch#1735 and likely replaces node-fetch#1473

The original change introducing this provided no clear motivation
for the override, and the implementation has since been changed to
disable this header when an agent is provided, so I think there
is sufficient evidence that removing this is the correct behaviour.
node-fetch@af21ae6
node-fetch@7f68577

This commit is backported to the v2 branch from node-fetch#1736 against v3.
jimmywarting pushed a commit that referenced this issue Aug 18, 2023
Instead, we rely on the underlying http implementation in Node.js
to handle this, as per the documentation at
https://nodejs.org/api/http.html#new-agentoptions

This fixes #1735 and likely replaces #1473

The original change introducing this provided no clear motivation
for the override, and the implementation has since been changed to
disable this header when an agent is provided, so I think there
is sufficient evidence that removing this is the correct behaviour.
af21ae6
7f68577

This commit is backported to the v2 branch from #1736 against v3.
joamag added a commit to hivesolutions/yonius that referenced this issue Oct 25, 2023
Tries to solve the problem reported in node-fetch/node-fetch#1735.
This can be considered a persistent issue that is only minimized by this solution.
0x009922 added a commit to 0x009922/iroha-javascript that referenced this issue Nov 1, 2023
The bug caused error to appear in CI:

  node-fetch/node-fetch#1735

Signed-off-by: Dmitry Balashov <a.marcius26@gmail.com>
@Noobgam
Copy link

Noobgam commented Nov 2, 2023

Still doesn't work for me in the setup

OS: windows
Node: v20.9.0
node-fetch: 3.3.2

function delay(ms: number) {
    return new Promise( resolve => setTimeout(resolve, ms) );
}

await fetch("http://127.0.0.1:8765");
await delay(1000);
await fetch("http://127.0.0.1:8765");
await delay(1000);
await fetch("http://127.0.0.1:8765");

This code throws ERRCONNRESET in case the delays are commented out.

The solution for me was to just remove the dependency on node-fetch completely and use native fetch, seems to work like a charm.

Noobgam added a commit to Noobgam/noob-agent that referenced this issue Nov 2, 2023
ddeboer added a commit to netwerk-digitaal-erfgoed/dataset-register that referenced this issue Nov 20, 2023
xxuejie added a commit to nervosnetwork/ckb-js-toolkit that referenced this issue Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
@mbrevda @boneskull @dhedey @Noobgam @damjanned @Satya-Sidhu and others