
Collect server latencies and implement "prefer faster hosts" when creating connections #3178

Open
vlsi opened this issue Mar 24, 2024 · 11 comments


@vlsi
Member

vlsi commented Mar 24, 2024

Describe the issue

Currently, pgjdbc connects to hosts in sequential order; this becomes suboptimal when one of the hosts experiences issues.

For instance, connecting to host1 might take 20 seconds while connecting to host2 takes 0.2 seconds.
The driver could account for that and prefer connecting to host2.

Sample implementation:

  1. Collect an exponential moving average of the connect time and login time for each host
  2. When running in "load-balance" mode, use a weighted shuffle of the hosts so the faster ones are preferred (see the sketch after this list)
  3. Additionally, we could collect an exponential moving average for simple queries like "commit" and "isvalid", so we account better for "server health"
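
A minimal sketch of steps 1 and 2, assuming a hypothetical HostLatencyTracker class (not a pgjdbc API). The weighted shuffle uses Efraimidis-Spirakis keys, so a host's chance of sorting first is proportional to its weight (here, the inverse of its average connect time):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch, not pgjdbc code.
final class HostLatencyTracker {
    private static final double ALPHA = 0.2; // EMA smoothing factor (assumed value)
    private final Map<String, Double> emaConnectMillis = new ConcurrentHashMap<>();

    /** Fold a new connect-time observation into the per-host moving average. */
    void recordConnectTime(String host, double millis) {
        emaConnectMillis.merge(host, millis,
                (old, observed) -> old + ALPHA * (observed - old));
    }

    /** Weighted shuffle: each host gets a random key -ln(u)/w, then we sort ascending. */
    List<String> weightedShuffle(List<String> hosts) {
        Map<String, Double> sortKey = new HashMap<>();
        for (String host : hosts) {
            double ema = emaConnectMillis.getOrDefault(host, 1.0); // unknown hosts look fast
            double weight = 1.0 / ema; // faster hosts get larger weights
            double u = ThreadLocalRandom.current().nextDouble();
            sortKey.put(host, -Math.log(u) / weight);
        }
        List<String> shuffled = new ArrayList<>(hosts);
        shuffled.sort(Comparator.comparingDouble((String h) -> sortKey.get(h)));
        return shuffled;
    }
}
```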

Cons:

  • Since we can't "reconnect", creating more connections to the faster host might skew the workload toward it

Connection rebalance:

  • We can't migrate a connection to a different host transparently; however, we can "unconnect" by returning false from isValid
  • An alternative would be to disconnect after a certain amount of time, or after a certain amount of time spent executing queries.
    Unfortunately, "treat an idle connection as expired after some time" will not be fully transparent to the application, since the application might use custom_variable_classes for connection-level variables, which would be lost on disconnect. However, we should be fine to (optionally) disconnect when the connection is returned to the connection pool (a minimal wrapper sketch follows).
@davecramer
Member

A wrapper driver would be able to reconnect. I like the idea.

@rob-bygrave

Is this addressing an actual issue that has been observed to occur? Does this issue occur frequently? Is this expected to be put into a separate optional module/jar, or is it expected to be part of the core driver?

@vlsi
Member Author

vlsi commented Mar 25, 2024

The actual issue is that a single misbehaving replica might cause noticeable issues when creating connections.

The most pronounced case is primary switchover.

  • Imagine there are 6 replicas and the primary.
  • Imagine the connect timeout is 10 seconds (the current default). Remember that it does not include the TLS handshake + login sequence

The driver would attempt each replica in sequence until it detects the primary host, which might take 6*10 = 60 sec.
HikariCP sets the default login timeout to 30 seconds (it passes loginTimeout=30 to pgjdbc), so the connection attempt fails no matter how many times we retry (see the example below).
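
For concreteness, a sketch of such a configuration (the host names are made up; targetServerType, connectTimeout, and loginTimeout are real pgjdbc connection parameters):

```java
// Worst case: 6 replicas * connectTimeout of 10 s each = 60 s before the
// primary is reached, while HikariCP's loginTimeout=30 aborts the whole
// attempt after 30 s.
String url = "jdbc:postgresql://replica1,replica2,replica3,replica4,replica5,replica6,primary/appdb"
        + "?targetServerType=primary&connectTimeout=10&loginTimeout=30";
```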


Currently the driver caches "ok, we know this was the primary" information, which lets us connect to the primary faster. However, we do not cache "ok, we know this host times out, let's try the other hosts to see if they are any better". So I thought we could lower the priority of servers that are known to have issues (e.g. connect timeouts), as sketched below.
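
A tiny sketch of that negative caching, with a hypothetical BadHostCache class (pgjdbc's actual host-status bookkeeping is separate): remember recent connect failures and sort those hosts to the end of the candidate list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not pgjdbc code.
final class BadHostCache {
    private static final long DECAY_MILLIS = 60_000; // forget failures after 1 min (assumed)
    private final Map<String, Long> lastFailure = new ConcurrentHashMap<>();

    void recordFailure(String host) {
        lastFailure.put(host, System.currentTimeMillis());
    }

    /** Hosts with a recent failure sort last; ties keep the configured order. */
    List<String> order(List<String> hosts) {
        long now = System.currentTimeMillis();
        List<String> ordered = new ArrayList<>(hosts);
        ordered.sort(Comparator.comparingInt(
                (String host) -> now - lastFailure.getOrDefault(host, 0L) < DECAY_MILLIS ? 1 : 0));
        return ordered;
    }
}
```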

@davecramer
Member

Is there no topology information? Most systems provide some topology somewhere?
FWIW, I still think this is better in a wrapper. This https://github.com/awslabs/aws-advanced-jdbc-wrapper is obviously targeted at RDS Aurora, but it has a plugin for Patroni (where topology info is available).

@vlsi
Member Author

vlsi commented Mar 25, 2024

The issue is that the current implementation does not handle this case well, and it does not say "please do not use it, use something else instead". So users end up with half-workable systems :-/

@rbygrave
Contributor

Imagine there are 6 replicas and the primary.

I'm not using that scenario where the driver itself performs master failover, looping to find the next master. Master failover is either via DNS or it's AWS RDS. Hmmm.

@davecramer
Member

One of the topics I actually want to bring up in Vancouver at the PostgreSQL dev conference is finding a way to put topology data in the server. Presumably it would be a hook of some kind to accommodate various technologies: Patroni being one, RDS has a few, and I presume Azure and Google have theirs.

@rbygrave
Contributor

Random thoughts ...

Imagine there are 6 replicas and the primary.

An option might be to start an ExecutorService with 6 threads and test each replica in parallel? (Something along the lines of testing the likely candidates in parallel; a rough sketch follows.)
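
A rough sketch of that, using plain JDBC and ExecutorService.invokeAny (a hypothetical helper, not pgjdbc internals):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelProbe {
    /** Races connection attempts against all candidate URLs, keeps the first winner. */
    static Connection firstToConnect(List<String> jdbcUrls, String user, String pass)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(jdbcUrls.size());
        try {
            List<Callable<Connection>> attempts = jdbcUrls.stream()
                    .map(url -> (Callable<Connection>)
                            () -> DriverManager.getConnection(url, user, pass))
                    .toList();
            // invokeAny returns the first successful result and cancels the rest.
            // Caveat: an attempt that also succeeds before cancellation leaks its
            // connection; a real implementation would track and close those.
            return pool.invokeAny(attempts);
        } finally {
            pool.shutdownNow();
        }
    }
}
```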

topology data in the server

Yeah, it looks like in the Aurora case, in order to speed up master failover, it can connect to a replica instance, read the updated topology to find the new master defined there, and then connect to the new master. It would be really nice if there could be some standardisation of the topology (common pg views that drivers can read, etc.).

@davecramer
Member

topology data in the server

Yeah, it looks like in the Aurora case, in order to speed up master failover, it can connect to a replica instance, read the updated topology to find the new master defined there, and then connect to the new master. It would be really nice if there could be some standardisation of the topology (common pg views that drivers can read, etc.).

Well, not every situation is the same, so it will be hard to standardize. The AWS wrapper driver uses plugins to deal with this.

@vlsi
Member Author

vlsi commented Mar 25, 2024

An option might be to start an ExecutorService with 6 threads and test each replica in parallel? (Something along the lines of testing the likely candidates in parallel.)

Fair point.

Imagine the app starting with a connection pool of 50 connections.
50 connections * 6 threads == 300 concurrent requests :)


A side note: the default mode is loadBalanceHosts=false.
It might be hard to verify, but my guess is that the majority of users who configure several hostnames actually want loadBalanceHosts=true.

@sehrope
Member

sehrope commented Mar 25, 2024

One of the topics I actually want to bring up in Vancouver at the PostgreSQL dev conference is finding a way to put topology data in the server.

This is a really good topic, as there's only so much you can do without some server changes. And a couple of small changes would go a long way.

Imagine the app starting with a connection pool of 50 connections.
50 connections * 6 threads == 300 concurrent requests :)

This is why I like stuff like this being decoupled from the driver itself. Determining which node to connect to could be done once by a higher-level pool, which would then reuse that information for the entire set of N connections in the pool (see the sketch after this comment).

Having it done by every JDBC connection on its own produces both the thundering-herd situation you're describing and potential inconsistency in how each connection makes its determination.

This is a pretty complicated topic, and I think we should focus on providing the building blocks to build solutions on top of.
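
A minimal sketch of that pool-level approach, assuming a hypothetical CachedHostChooser helper (none of these names are pgjdbc or pool APIs): resolve the preferred host once, cache it with a TTL, and let every pooled connection reuse the same decision.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch, not a pgjdbc or HikariCP API.
final class CachedHostChooser {
    private record Entry(String host, long expiresAtNanos) {}

    private final Supplier<String> resolver; // e.g. probe hosts or read topology
    private final long ttlNanos;
    private final AtomicReference<Entry> cache = new AtomicReference<>();

    CachedHostChooser(Supplier<String> resolver, long ttlMillis) {
        this.resolver = resolver;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    /** Returns the cached choice, re-resolving at most once per TTL window. */
    String currentHost() {
        Entry e = cache.get();
        long now = System.nanoTime();
        if (e == null || now >= e.expiresAtNanos) {
            e = new Entry(resolver.get(), now + ttlNanos);
            cache.set(e); // a racy refresh is fine: any fresh value is acceptable
        }
        return e.host();
    }
}
```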
