Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the missing connect_timeout and keepalives_idle config parameters #1847

Merged
merged 2 commits into from May 10, 2019
Merged

Added the missing connect_timeout and keepalives_idle config parameters #1847

merged 2 commits into from May 10, 2019

Conversation

boromisp
Copy link
Contributor

@boromisp boromisp commented Mar 6, 2019

Setting the keepAliveIdleMillis parameter to a lower value (200 sec for an AWS server) can help to prevent silently dropped connections in case of inactivity (e.g., LISTEN / NOTIFY).

With the connectTimeoutMillis you can give up the connection attempt before the OS does it for you. This is useful if the server is too overloaded to accept the TCP connection or the firewall doesn't send TCP RST on closed ports (e.g., AWS Security Groups).

I have no idea, how to write tests for these. Maybe we could test, whether we can establish the connection with the parameters set.

Edit:
The connectTimeoutMillis might have to be renamed as it is too similar to the connectionTimeoutMillis parameter, which is the maximum time to wait for free pg-pool capacity. We could use the connect_timeout (as it is called in libpq), but then we might want to switch the units to seconds, to reflect the libpq behavior.

Copy link
Owner

@brianc brianc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate this!

For testing things like this in the past I've created a tcp server running on a random port & then connecting to it w/ very low timeout values in the tests & then ensuring the test file exits properly (indicating the connection/stream has been closed).

Wonder if something like this would work or you could even piggy-back more tests in that file? Is that a helpful pointer at all?

@@ -21,6 +21,8 @@ var Connection = function (config) {
config = config || {}
this.stream = config.stream || new net.Socket()
this._keepAlive = config.keepAlive
this._keepAliveIdleMillis = config.keepAliveIdleMillis
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we should call this keepAliveInitialDelayMillis to better indicate what it actually does? https://nodejs.org/api/net.html#net_socket_setkeepalive_enable_initialdelay

Copy link
Contributor Author

@boromisp boromisp Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find a way to delay accepting a new TCP connection in node.js. If the socket is not listening, then the OS will refuse the connection immediately. I have an other idea that I will have to test: creating the server in an other process and blocking the event loop with busy waiting.

Edit:
Tried it, did not work. I'm unsure if it is even possible to reliably delay accepting connections in node.js, I do not know its internals well enough.

Maybe mocking the socket with something like this could work, if that makes any sense.

@boromisp
Copy link
Contributor Author

boromisp commented Mar 8, 2019

The connectTimeoutMillis and connectionTimeoutMillis parameters could be merged (as in 'wait at most 5 seconds for an available connection'), I'm not sure if there is a use-case to for setting them to different values. The error message would still differentiate the cause of the timeout.

Since it would change the behavior of an existing parameter, this would either be a breaking change, or a bugfix, depending on whether you rely on the documentation or the current behavior.
As the documentation of pg-pool states:

connectionTimeoutMillis: 1000, // return an error after 1 second if connection could not be established

@boromisp
Copy link
Contributor Author

While trying to devise a test for the connection timeout, I've realized, the timeout is canceled too early, at the establishment of the TCP connection.
Instead, we should wait for the connect or error event of the Client instance.
This change incidentally makes the timeout testable the way @brianc linked earlier.

@boromisp
Copy link
Contributor Author

I've renamed the connectTimeoutMillis parameter to connectionTimeoutMillis. This parameter is now shared with pg-pool. I'm not sure whether this counts as a breaking change.

I've added the keepalives, keepalives_idle and connect_timeout parameters to the connection string for the native driver. While I made these changes, I've had to touch the ssl parameter handling too.

I've written tests for both parameters (and fixed a bug with the keep alive).

@boromisp
Copy link
Contributor Author

Rebased to master, and fixed a flaky connection timeout test (it failed with the native driver sometimes).

@BenBirt
Copy link

BenBirt commented Apr 17, 2019

Hey there, I just came across this PR. We (https://github.com/dataform-co/dataform) would love to use the connect_timeout parameter. Any chance this PR will be merged to master soon?

@vitaly-t
Copy link
Contributor

vitaly-t commented May 9, 2019

Is there any reason this PR hasn't been merged yet?

@brianc
Copy link
Owner

brianc commented May 9, 2019

No good one! I'll merge this tomorrow first thing & release a new minor version. Thanks for the ping. 😄

@brianc
Copy link
Owner

brianc commented May 10, 2019

Sorry for the delay - gonna merge this thing in now & document it in release notes & push a new minor version out.

@brianc brianc merged commit 0993e4b into brianc:master May 10, 2019
@brianc
Copy link
Owner

brianc commented May 10, 2019

published pg@7.11.0! thanks again 👏

con.stream.destroy(new Error('timeout expired'))
}, this._connectionTimeoutMillis)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that con.once('end', () => { also should have clearTimeout(connectionTimeoutHandle), use case:
connectionTimeoutMillis set to some value.
Application receive SIGTERM or any other shutdown signal, because client still not connected we can not call client.end(), so application can call client.connection.stream.destroy().
If destroy will be called with error, then error event will be raised, connectingErrorHandler will be executed and timeout connectionTimeoutHandle will be cleared. Everything OK.
If destroy will be called without arguments, clearTimeout(connectionTimeoutHandle) will never happened and applucation will be hang up while timeout will not be called.


Why we can not call client.end():
Current end code:

  if (this.activeQuery) {
    // if we have an active query we need to force a disconnect
    // on the socket - otherwise a hung query could block end forever
    this.connection.stream.destroy()
  } else {
    this.connection.end()
  }

Because activeQuery is undefined while client is not connected, connection.end() will be called, which itself implies opened connection, because write data to socket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct; the timer should be cleared in that event handler too.

To make sure I understand your use case, are you calling client.connection.stream.destroy() in your signal handler if the connection hasn't been established yet, or is there another way to trigger this behavior?
And passing in an error, e.g. client.connection.stream.destroy(new Error('SIGTERM')) is the workaround?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I do not call stream.destroy(), but I'd like modify application, because hang up is not cool.
stream.destroy(new Error(...)) is workaround, yes. But from other side, destroy called from client.end called without arguments... (which will not work actually in current case, just as example).

Copy link
Contributor Author

@boromisp boromisp Jun 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some testing just to confirm.

Calling client.end() before the connection is established seems to wait for the connection to succeed (in which case the timeout will be cleared). There will be no activeQuery before connecting successfully.

Calling client.connection.stream.destroy() (without an error) will reject the client.connect() promise with a Connection terminated unexpectedly error and wait for the timeout.

Calling client.connection.stream.destroy(new Error()) (with an error) will reject the client.connect() promise with the passed in error and not wait for the timeout.

The only way to make the application hang on this timer is to call client.connection.stream.destroy() from your own code. And the solution is to pass in an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants