
Add optional maximum connection lifetime parameter #2027

Closed
hempels opened this issue Dec 13, 2019 · 11 comments

Comments

@hempels
Copy link

hempels commented Dec 13, 2019

One of the issues we've run into with using Aurora Postgresql with node-postgres is that while the database is capable of auto-scaling read cluster instances, the existing connections do not react to those changes.

For example, if I have a pool of 60 connections against a single read cluster DB instance and then add a second read-cluster DB instance, the 60 pooled connections remain attached to the original instance. The second instance sits idle while the first is pegged.

My thinking is the simplest solution may be to apply an automatic connection reset strategy whereby no pool connection would live longer than X minutes. In a busy environment where the pool is maxed, you wouldn't want to close and re-open multiple connections at once, they would need to be handled one at a time. But it seems to me the performance impact of infrequently resetting connections would be low, and the advantage of benefiting from an auto-scaling DB would easily offset it.
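The proposed strategy could be sketched as a simple predicate checked when a connection is returned to the pool. All names below (`isExpired`, `maxLifetimeMs`) are illustrative, not node-postgres API:

```javascript
// Hypothetical sketch of the proposal: recycle a pooled connection
// once it exceeds a maximum lifetime. The check would run when the
// connection is released back to the pool, so at most one connection
// is closed and reopened at a time.
function isExpired(createdAtMs, nowMs, maxLifetimeMs) {
  // A value of 0 disables expiry (connections live forever),
  // preserving the pre-existing behavior.
  if (maxLifetimeMs === 0) return false;
  return nowMs - createdAtMs >= maxLifetimeMs;
}
```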

@julienvey
Copy link

To add more insight to this issue, this configuration option exists in most PG libraries for other languages

See https://golang.org/pkg/database/sql/#DB.SetConnMaxLifetime or https://www.npgsql.org/doc/connection-string-parameters.html#pooling (ConnectionLifetime)

We have the same issue as @hempels. We use auto-scaled read-only DBs on a Kubernetes cluster, and the connections to the different DBs are not evenly distributed.

@jodem
Copy link

jodem commented Feb 12, 2021

Hi all, I think this has been solved by the maxUses option. It's not based on time but on how many requests have been made with the connection: #2157.
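For reference, enabling use-count-based recycling looks roughly like this (`maxUses` is a pg-pool option per its docs; the pool size and the value 7500 here are purely illustrative):

```javascript
const { Pool } = require('pg'); // requires the pg package

// maxUses: the pool discards a connection after it has been
// checked out this many times, then opens a fresh one on demand.
const pool = new Pool({
  max: 60,       // illustrative pool size
  maxUses: 7500, // illustrative recycle threshold
});
```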

@hempels
Copy link
Author

hempels commented Feb 12, 2021

maxUses is one lever that can be dialed, but is inadequate by itself. Set too low and system performance will be severely degraded under load. Set too high and it will not trigger often enough under lighter loads to be of any use.

@jolynch
Copy link

jolynch commented Sep 20, 2021

This would also be very useful for any case where the client connects through proxies for failover/scaling (HAProxy, Envoy, service meshes, etc.), where you need the maximum connection lifetime to stay under some deadline so you can reload connections gracefully. We've been working around this by catching and handling the close errors, but that's not particularly great.

@hempels
Copy link
Author

hempels commented Oct 5, 2021

To add more insight to this issue, this configuration option exists in most PG libraries for other languages

The .net library's docs even say this in the description for its ConnectionLifetime parameter:

This is useful in clustered configurations to force load balancing between a running server and a server just brought online.

@ChrisWritable
Copy link
Contributor

The rebased update is here: #2698

@jolynch
Copy link

jolynch commented Feb 1, 2022

This is awesome, thank you @brianc and @ChrisWritable! Just curious, is there any reason not to set the default to something like 10-30 minutes? Having an unbounded connection lifetime by default is a tad tricky for server owners, since Postgres doesn't implement any kind of GOAWAY procedure, so server administrators can't do graceful maintenance without a time bound on connection lifetime.

For context, Java's Hikari chooses 30 minutes as the default, although something lower would probably be better (maybe 10 minutes). Since maxLifetime just controls when a connection stops being handed out, it should be safe to expire them? And since this implementation doesn't appear to do any kind of skewing, might we be vulnerable to mass expiration (all the connections expiring at once)?
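The skewing idea raised here could be sketched as follows (illustrative only, not pg-pool API; Hikari reportedly applies a small random variance to maxLifetime for the same reason):

```javascript
// Hypothetical jitter sketch: skew each connection's effective
// lifetime by a random fraction so pooled connections created at
// the same moment don't all expire at the same moment.
function jitteredLifetime(baseMs, jitterFraction, rand = Math.random) {
  // e.g. jitterFraction = 0.1 gives a lifetime in [0.9*base, 1.1*base]
  const offset = (rand() * 2 - 1) * jitterFraction * baseMs;
  return baseMs + offset;
}
```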

@ChrisWritable
Copy link
Contributor

The default connection lifetime was chosen only so it wouldn't change pre-existing behavior. Otherwise, I don't have a strong opinion.

@brianc
Copy link
Owner

brianc commented Feb 1, 2022

The default connection lifetime was chosen only so it wouldn't change pre-existing behavior. Otherwise, I don't have a strong opinion.

☝️ yah totes - backwards compatibility is one of the biggest design drivers. Personally I find breaking changes when upgrading modules frustrating and can only imagine what multiplying that by the deploy surface of this library causes in total anguish globally. 😛 Obviously I don't want to stop all progress, but I try to be very selective for breaking changes....except in the case of security problems (thankfully haven't had many) - those will be fixed with whatever breaks require as soon as possible (w/ proper semver and release announcements and all that).

Apologies for not getting this released already. I like to do releases AM CST so I can be around to quickly respond if unforeseen issues come up, and my past two mornings have been absolutely hectic. It's on my schedule for tomorrow.

@jolynch
Copy link

jolynch commented Feb 2, 2022

How would this change be a breaking change? The connection should expire while the user is not using it (if they have checked it out of the pool, it shouldn't expire while they're using it, but rather when they return it to the pool). The only thing I can think of, looking at the implementation, is that there isn't any skewing or concurrency limiting on how many connections can expire at once, and that might be a performance issue if your DB can't handle the reconnections, but no worse than at startup, right? Or are you just saying that you're not sure about the performance implications and don't want to change the default until some folks try it out in production?

Also, fwiw, at least in our scenario it is a security issue, because our server team has to rotate mTLS connections periodically (every hour) to ensure we re-authorize the client cert... but I don't think that's a compelling reason for the default to be changed.

@hempels
Copy link
Author

hempels commented Feb 11, 2022

How would this change be a breaking change?

Every time a connection is closed, a new one may have to be created for the next incoming request. There is a performance and resource penalty associated with opening connections. Changing the default behavior of the library could negatively impact the runtime behavior (performance) of every server using it, thus, I would consider it a breaking change.

Separately, there isn't a great default value for this option other than 0 (disabled). Determining the ideal value for it is a system tuning exercise and will likely be different for each environment. Case in point, Hikari's developers opted for 30 minutes and you would prefer something shorter. Better to let each user select their preferred value. Documentation would be a reasonable place to add a recommendation to enable it and an explanation of the pros/cons of short versus long values.

there isn't any skewing or concurrency limiting on how many can expire at once

This is a legitimate concern, not addressed by this patch. Whether or not it represents an issue would also be environment specific.
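Assuming the released option matches the PR discussed in this thread, enabling it might look like the following (the option name `maxLifetimeSeconds` is taken from pg-pool's documentation; the values are illustrative):

```javascript
const { Pool } = require('pg'); // requires the pg package

// maxLifetimeSeconds: connections older than this are recycled when
// idle in the pool. The default of 0 disables expiry, preserving the
// pre-existing behavior, per the backwards-compatibility discussion above.
const pool = new Pool({
  max: 60,                  // illustrative pool size
  maxLifetimeSeconds: 1800, // illustrative: 30 minutes, as in Hikari
});
```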

@hempels hempels closed this as completed Feb 11, 2022