
More robust listeners example and the handling of network errors #593

Closed · boromisp opened this issue Mar 6, 2019 · 14 comments

@boromisp commented Mar 6, 2019

Disclaimer: my networking knowledge is patchy; most of this came from about a week of testing and googling.

A few notes for the robust listeners example:

It could be helpful to make a note of TCP keepalives in the Robust Listeners example. TCP keepalives can also be relevant if you set a long idleTimeoutMillis for your pool.

Without setting the keepalive idle time to the value recommended by your infrastructure provider (e.g., if your database is in AWS, you have to set it to at most 200 seconds), a seemingly idle connection can be silently dropped.

I'm not sure whether this is an AWS-specific thing, a standard TCP peculiarity, or something about the Node.js defaults, but in this case both the server and the client think the connection is still alive (so there is no error event, no onLost callback, and the PostgreSQL backend keeps running).

Example solution:

const conn = await db.connect({ direct: true, onLost: handleLostConnection });
// enable TCP keepalive on the underlying socket, with a 200-second idle delay
conn.client.connection.stream.setKeepAlive(true, 200 * 1000);

The keepalive interval and probes should probably also be set explicitly, but I found no way to do that without depending on net-keepalive.
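
A rough sketch of what that could look like with the net-keepalive package (the 5-second interval and 3 probes are arbitrary illustrative values, and the underlying socket options only apply on platforms the package supports, e.g. Linux):

const NetKeepAlive = require('net-keepalive');

const conn = await db.connect({ direct: true, onLost: handleLostConnection });
const stream = conn.client.connection.stream;

stream.setKeepAlive(true, 200 * 1000);               // TCP_KEEPIDLE: first probe after 200 sec idle
NetKeepAlive.setKeepAliveInterval(stream, 5 * 1000); // TCP_KEEPINTVL: re-probe every 5 sec
NetKeepAlive.setKeepAliveProbes(stream, 3);          // TCP_KEEPCNT: drop after 3 failed probes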

While the section above deals with trying to keep the connection alive, it is not sufficient for detecting dead connections. The only reliable way I have found so far is to run manual heartbeat queries and call conn.done() after some application-specific timeout.
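
A rough sketch of such a heartbeat (startHeartbeat, the SELECT 1 probe, and the interval/timeout values are illustrative, not part of the library):

function startHeartbeat(conn, intervalMs, timeoutMs) {
  const interval = setInterval(() => {
    // if the heartbeat does not come back in time, assume the connection is dead
    const killTimer = setTimeout(() => {
      clearInterval(interval);
      conn.done(); // release the connection; reconnection is up to the application
    }, timeoutMs);

    conn.one('SELECT 1')
      .then(() => clearTimeout(killTimer))
      .catch(() => {
        clearTimeout(killTimer);
        clearInterval(interval); // a failed query will surface through onLost anyway
      });
  }, intervalMs);
  return interval;
}

// e.g. probe every 10 seconds, give up after 5 seconds of silence
startHeartbeat(conn, 10 * 1000, 5 * 1000);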

What seems to be missing (a node-postgres deficiency) is a connection timeout. The connectionTimeoutMillis parameter only applies to waiting for free capacity in the pg-pool; there is no way to abort connecting if it takes too long.
For the direct connection case this can be somewhat worked around, with the caveat that it will keep the process running for however long it takes the OS to decide the connection attempt was unsuccessful:

function connectWithTimeout(db, onLost, timeout) {
  return new Promise(function (resolve, reject) {
    let timedOut = false;

    function handleLostConnection(err, e) {
      if (!timedOut) onLost(err, e);
    }

    const timeoutTimer = setTimeout(function () {
      timedOut = true;
      reject(new Error('Connection timeout'));
    }, timeout);

    db.connect({ direct: true, onLost: handleLostConnection })
      .then(function (conn) {
        if (timedOut) {
          conn.done(); // the connection arrived too late; release it
        } else {
          clearTimeout(timeoutTimer);
          resolve(conn);
        }
      })
      .catch(function (error) {
        if (!timedOut) {
          clearTimeout(timeoutTimer);
          reject(error);
        }
      });
  });
}
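
Hypothetical usage of the wrapper above (handleLost is the application's onLost callback; the 10-second limit is arbitrary):

const conn = await connectWithTimeout(db, handleLost, 10 * 1000);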

Patching directConnect could help with this, since there we can destroy the socket on timeout, but I will try to put together a PR for node-postgres to address this.

@vitaly-t (Owner) commented Mar 6, 2019

AWS has been the culprit when it comes to connectivity. They are awful in their connection management, and require a lot of extra work.

@boromisp (Author) commented Mar 6, 2019

In their Redshift documentation they seem to imply that it is your local network, VPN provider, or other intermediaries who drop the connection. But because of the scale of their operation, I'm sure they aggressively prune idle connections. I just wish they would document it somewhere.

The brianc/node-postgres#1847 PR adds two new connection parameters to address these issues:

const db = pgp({
    keepAlive: true,
    keepAliveIdleMillis: 200 * 1000,
    connectTimeoutMillis: 10 * 1000,
})

The manual heartbeat would still be required of course, but I couldn't find a general solution for that.

@vitaly-t (Owner) commented Mar 6, 2019

And yet, all the problems related to dropped connections ever reported were around AWS. The library works fine with all other hosting providers.

@msageryd:

Thank you for writing this up. I have referenced this issue in the readme of my listener module; I hope that's OK with you.

https://github.com/msageryd/pg-promise-listener#keeping-connections-alive-on-aws

@vitaly-t (Owner):

@msageryd The other open issue #585 is also related.

@msageryd:

I solved this problem from another perspective. It was the simplest solution for me right now. Please chime in if you think this approach is flawed.

Documentation
Code

@vitaly-t (Owner) commented Mar 28, 2019

@msageryd That code creates a loose promise:

this.connection.none(...)

i.e. an unchained promise that has no catch handler.
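
For reference, a sketch of the fix is to chain a catch handler onto it (query stands in for the actual statement):

this.connection.none(query)
    .catch(error => {
        // log or otherwise handle the failure, so the rejection is not left unhandled
        console.error(error);
    });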

@msageryd:

Oh, thanks. Fixed now. (code)
What do you think of the overall concept? It's kind of a catch-all, and it might be a little more delayed than a real keepalive solution. But it should fix the worst case, i.e. having a non-working server without knowing about it.

@vitaly-t (Owner):

It would take some good testing to form an opinion. I hope there will be a proper fix at some point.

@vitaly-t (Owner) commented May 9, 2019

@boromisp @msageryd Guys, I do not see the point in keeping this issue open, because nothing is being developed or changed in this library to cater for that.

@boromisp If your PR ever gets merged, this can be revisited.

@vitaly-t (Owner):

Closing due to inactivity.

@vitaly-t (Owner) commented Nov 10, 2019

@boromisp Thanks for weighing in on this. I've just spent the whole weekend working on new releases, then by chance looked at that slonik project, only to find out he published more bullshit about pg-promise. Naturally, I was very pissed. And when I asked him to remove it, he was quick to lock my account.

@k3n commented Jun 1, 2022

@boromisp If your PR ever gets merged, this can be revisited.

I've followed the threads, but I'm confused and would appreciate some insight: what needs to be revisited here? A solution for the manual heartbeat to detect lost connections?

For what it's worth, @boromisp's PR was merged on 5/10, but this was closed on 5/15.

I want to use TCP keepalives, but I'm unsure whether the code in @boromisp's comment above is sufficient (and I'm not sure how to adequately test it locally).

Thank you.

@boromisp (Author) commented Jun 2, 2022

The keepAlive: true parameter enables TCP keepalives on the socket, and the keepAliveInitialDelayMillis: 200 * 1000 parameter overrides the socket's default TCP keepalive idle timeout. There are other keepalive options, but those cannot easily be overridden from Node.js.
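
As a sketch, with these parameter names it looks like the following at initialization (connection details omitted; the 200-second value is the AWS-oriented one discussed above):

const pgp = require('pg-promise')();

const db = pgp({
    // host, port, database, user, password, ... go here as usual
    keepAlive: true,                         // enable TCP keepalives on the socket
    keepAliveInitialDelayMillis: 200 * 1000  // idle time before the first probe
});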

For long-running queries against remote databases, TCP keepalives (either client- or server-side) are the only way to keep the connection alive.

Heartbeat queries are an alternative for listener connections, e.g. brianc/node-postgres#1611 (comment).
I found it easier to reason about heartbeat query timeouts than TCP keepalives in this context.
