Resolving DNS when connecting pool connections can lead to connection imbalances #1575

Open
mpenick opened this issue Aug 17, 2021 · 7 comments · May be fixed by #1576

Comments

@mpenick
Contributor

mpenick commented Aug 17, 2021

What version of Cassandra are you using?

Reproducible with any version. Tested with 3.11.10.

What version of Gocql are you using?

bc256bb

What version of Go are you using?

go1.16.6 linux/amd64

What did you do?

Create a Cluster object using a DNS name with multiple A records:

	cluster := gocql.NewCluster("somednsname.org") // Uses multiple A-records

	cluster.NumConns = 100 // Using this number to show the issue, it can happen with the default of 2

	session, err := gocql.NewSession(*cluster)
	if err != nil {
		log.Fatalf("unable to connect session: %v", err)
	}

What did you expect to see?

100 connections per host for the 3 hosts in the cluster and 1 extra for the control connection.

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
100 170.34.196.104:9042
101 178.91.75.34:9042 <-- one extra for the control connection
100 94.203.139.34:9042

What did you see instead?

Imbalanced number of connections.

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
100 170.34.196.104:9042
1 178.91.75.34:9042
200 94.203.139.34:9042
$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
2 170.34.196.104:9042
102 178.91.75.34:9042
197 94.203.139.34:9042

The problem

The host is created with the original DNS entry as the struct member hostname:

hosts = append(hosts, &HostInfo{hostname: host, connectAddress: ip, port: port})

which causes it to be re-resolved when making the connection (gocql/conn.go, line 247 at bc256bb):

addr := host.HostnameAndPort()

using the hostname returned by HostInfo.HostnameAndPort():

func (h *HostInfo) HostnameAndPort() string {

The problem is that pools are mapped using ConnectAddress(), the IP address from the original DNS resolution:

pool, ok := p.hostConnPools[ip]

but when the hostname is re-resolved in dialer.DialContext() it can yield a different address, because A records don't always come back in the same order. This causes pools to contain connections to multiple addresses instead of one, and results in an imbalance.
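The mechanism can be illustrated with a small self-contained simulation (the `records` type is invented here to stand in for a resolver whose A records rotate between queries; no gocql code is involved):

```go
package main

import "fmt"

// records stands in for a DNS answer whose A records rotate between
// queries, as real resolvers commonly do.
type records struct {
	ips []string
	n   int
}

// resolve returns the A records rotated by one position per call.
func (r *records) resolve() []string {
	out := append(r.ips[r.n:], r.ips[:r.n]...)
	r.n = (r.n + 1) % len(r.ips)
	return out
}

func main() {
	dns := &records{ips: []string{"127.0.0.1", "127.0.0.2", "127.0.0.3"}}

	// Initial resolve: one pool per resolved IP, mirroring how pools
	// are keyed by ConnectAddress().
	poolKeys := dns.resolve()

	// Dialing: the hostname is re-resolved per dial and the first A
	// record wins, so a pool can hold connections to an address that
	// differs from its key.
	for _, key := range poolKeys {
		target := dns.resolve()[0]
		fmt.Printf("pool keyed %s dials %s\n", key, target)
	}
}
```

Every pool ends up dialing an address other than its own key, which is the mismatch behind the skewed lsof counts above.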

@mpenick
Contributor Author

mpenick commented Aug 17, 2021

To reproduce: set up a local nameserver (bind9 in my case) with the local IPs as the A record entries:

$ cat /etc/bind/db.example.com 
;
; BIND data file for local loopback interface
;
$TTL	30
@	IN	SOA	localhost. root.localhost. (
			      2		; Serial
			     30		; Refresh
			     30		; Retry
			2419200		; Expire
			     60 )	; Negative Cache TTL
;
@	IN	NS	localhost.
@	IN	A	127.0.0.1
@	IN	A	127.0.0.2
@	IN	A	127.0.0.3

nslookup looks like this (notice the A-records changing order):

$ nslookup example.com
Server:		192.168.1.130
Address:	192.168.1.130#53

Name:	example.com
Address: 127.0.0.1
Name:	example.com
Address: 127.0.0.3
Name:	example.com
Address: 127.0.0.2

$ nslookup example.com
Server:		192.168.1.130
Address:	192.168.1.130#53

Name:	example.com
Address: 127.0.0.3
Name:	example.com
Address: 127.0.0.1
Name:	example.com
Address: 127.0.0.2

Then create a new cluster/session using example.com:

	cluster := gocql.NewCluster("example.com")

	cluster.NumConns = 100

	session, err := gocql.NewSession(*cluster)
	if err != nil {
		log.Fatalf("unable to connect session: %v", err)
	}

Note the imbalanced connection counts:

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
      1 127.0.0.1:9042
    101 127.0.0.2:9042
    199 127.0.0.3:9042

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
    100 127.0.0.1:9042
    198 127.0.0.2:9042
      3 127.0.0.3:9042

mpenick added a commit to mpenick/gocql that referenced this issue Aug 17, 2021
When a hostname is used for contact points, it's resolved initially as
part of the initialization process, then `HostInfo` objects are created
with the original `hostname` and the resolved `connectAddress`.
`hostname` is then used to dial the pool connections, which causes another
DNS resolution that can yield a different IP than the original
`connectAddress`, because A records can change order on each resolution.
This results in a connection pool for a given IP address containing
connections to multiple different IP addresses.

This patch removes the second resolve when dialing by setting the
`hostname` member to the resolved IP in the initialization step.

Resolves gocql#1575
mpenick linked a pull request Aug 17, 2021 that will close this issue
@martin-sucha
Contributor

We currently have this in the docs (gocql/cluster.go, lines 166 to 172 at bc256bb):

// The supplied hosts are used to initially connect to the cluster then the rest of
// the ring will be automatically discovered. It is recommended to use the value set in
// the Cassandra config for broadcast_address or listen_address, an IP address not
// a domain name. This is because events from Cassandra will use the configured IP
// address, which is used to index connected hosts. If the domain name specified
// resolves to more than 1 IP address then the driver may connect multiple times to
// the same host, and will not mark the node being down or up from events.

If you only want to resolve the IP addresses when creating the cluster, you can simply resolve the DNS name to IP addresses yourself and pass the list of IPs to ClusterConfig.Hosts. That's how we use it currently.
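That approach can be sketched as follows (the lookup function is injected so the example runs without network access; in production you would pass net.LookupHost, and feed the result to gocql.NewCluster):

```go
package main

import "fmt"

// resolveContactPoints expands a DNS name into its address records so
// the IPs can be passed straight to gocql.NewCluster(ips...). The
// lookup function is injected (net.LookupHost in production), which
// keeps the example testable offline.
func resolveContactPoints(lookup func(string) ([]string, error), name string) ([]string, error) {
	addrs, err := lookup(name)
	if err != nil {
		return nil, fmt.Errorf("resolving %s: %w", name, err)
	}
	return addrs, nil
}

func main() {
	// A fake lookup stands in for net.LookupHost here.
	fake := func(string) ([]string, error) {
		return []string{"127.0.0.1", "127.0.0.2", "127.0.0.3"}, nil
	}
	ips, err := resolveContactPoints(fake, "example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // [127.0.0.1 127.0.0.2 127.0.0.3]
	// cluster := gocql.NewCluster(ips...)
}
```

Note that this resolves once, at startup; it does not help if the records change later.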

What is the desired behavior in case the DNS record changes?

@mpenick
Contributor Author

mpenick commented Aug 19, 2021

Thanks for the pointer in the docs. I wasn't aware of that. Sorry.

What is the desired behavior in case the DNS record changes?

I'm trying to work this out myself. :) Pools are keyed on the resolved IP (host.ConnectAddress().String()), but hostname is left unresolved when it is set. So you could end up with a pool containing connections to an address other than host.ConnectAddress(). Which is a bit odd, because hostname is an indirection meant to let the underlying IP address(es) change. Maybe pools should be keyed on the hostname and/or host.ConnectAddress().String() instead? Or should it use the originally connected address instead of re-resolving?

I'm trying to wrap my head around a case where the driver would want unresolved hosts. Maybe a total cluster outage in an environment (like k8s) where all the hosts' IPs have changed (but this would only make sense for re-establishing the control connection, not for pool connections), or some address translator scenario?

@justinfx

I'm looking into a similar situation on Kubernetes, where you get a headless DNS name that can return A records for 3 nodes. The problem I was trying to figure out is how to roll the nodes in a cluster and ensure the client has an updated host pool. I wasn't sure whether the client driver regularly re-resolves the hosts.
At first I tried a single headless DNS name, which it didn't like in terms of a peer list.
Then I switched to 3 individual DNS names that each resolve to a node. This fixed the peer-list errors, but the client still ended up losing all hosts after a rolling update of the cluster (pod IPs change).
Then I switched to pre-resolving the DNS names to 3 IP addresses before creating the cluster config. This had the same problem after a rolling restart.

So I am wondering, is there any case where the client driver will resolve the host names again? Is there some kind of eventing that is not happening on my end when the new node pods start and the client doesn't see them? Or should I be using sticky IP addresses for the pods so they remain fixed after being rolled?

@martin-sucha
Contributor

Okay, I've re-read the code and the original post.

So we do resolve DNS names to IP addresses when establishing the initial control connection (during session initialization) and we build a pool out of that. The imbalance in the pool is because we re-resolve the hostname when dialing. That should be fixable by dialing the IP address instead of the hostname (for TCP connection).

Dialing the IP address instead of the hostname might break some dialers that expect a hostname (like in #1579, which might resolve through a proxy; though it seems such a dialer would not work anyway, as we try to resolve hostnames to IP addresses first). We need to update the docs to reflect the current behaviour.

As for the rolling restart in Kubernetes, gocql receives events from the cluster about added/removed nodes. I think we should see some events from the cluster about the new IP address of the host (but I'm not sure about that). Currently we keep nodes in the pool by IP address. If we switch the dialer to the IP address, that would not help with the k8s rolling restart case, as we'd not re-resolve the hostname. @justinfx would you mind opening a separate issue with a log of events (compile with the gocql_debug tag) that we get from the cluster during a rolling restart? It will be interesting to see what events we receive in that case.

I think we need a new dialer interface (that would get HostInfo pointer instead of a simple address), a place where to re-discover initial hosts (when we lose all connections), a user-specified function to discover the hosts to connect (called during session init and when we detect we lost all connections) and a way to construct HostInfo outside of gocql package. That would help with #1579 and #1487. Being able to construct HostInfo would help with testing host selection policies as well.

@justinfx

justinfx commented Oct 4, 2021

Thanks for looking into that, @martin-sucha. I will try to post a new issue with the debug output. From my tests so far, when I roll a cluster I do see events come in to the client. But the factor here is how fast you roll the cluster. If I roll the nodes one by one as soon as each one passes its health check, it seems to be too fast for the client, which ends up in a state where it thinks the entire pool is down. But if I manually roll the cluster slowly, I see the events come in for the new nodes, and eventually the old down node stops logging. Unfortunately I don't think a cluster is always going to go down in that nicely controlled fashion.

@dkropachev

This issue is fixed by 7a6cf00.

The same reproduction flow now ends up with a balanced connection count:

lsof -n -i :9042 | grep main | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
    101 172.31.0.26:9042
    100 172.31.0.38:9042
    100 172.31.0.78:9042

Tested on github.com/gocql/gocql v1.6.0
