Resolving DNS when connecting pool connections can lead to connection imbalances #1575

Open
mpenick opened this issue Aug 17, 2021 · 7 comments · May be fixed by #1576

Comments

@mpenick
Contributor

mpenick commented Aug 17, 2021

What version of Cassandra are you using?

Reproducible with any version. Tested with 3.11.10.

What version of Gocql are you using?

bc256bb

What version of Go are you using?

go1.16.6 linux/amd64

What did you do?

Create a Cluster object using a DNS name with multiple A records:

	cluster := gocql.NewCluster("somednsname.org") // Uses multiple A-records

	cluster.NumConns = 100 // Using this number to show the issue, it can happen with the default of 2

	session, err := gocql.NewSession(*cluster)
	if err != nil {
		log.Fatalf("unable to connect session: %v", err)
	}

What did you expect to see?

100 connections per host for the 3 hosts in the cluster and 1 extra for the control connection.

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
100 170.34.196.104:9042
101 178.91.75.34:9042 <-- one extra for the control connection
100 94.203.139.34:9042

What did you see instead?

Imbalanced number of connections.

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
100 170.34.196.104:9042
1 178.91.75.34:9042
200 94.203.139.34:9042
$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
2 170.34.196.104:9042
102 178.91.75.34:9042
197 94.203.139.34:9042

The problem

The host is created with the original DNS entry as the struct member hostname:

hosts = append(hosts, &HostInfo{hostname: host, connectAddress: ip, port: port})

which causes it to be re-resolved when making the connection (gocql/conn.go, line 247 at bc256bb):

addr := host.HostnameAndPort()

using the hostname returned by HostInfo.HostnameAndPort():

func (h *HostInfo) HostnameAndPort() string {

The problem is that pools are mapped using ConnectAddress(), the IP address from the original DNS resolution:

pool, ok := p.hostConnPools[ip]

but when the hostname is re-resolved in dialer.DialContext() it can yield a different address, because A records don't always come back in the same order. This causes pools to contain connections to multiple addresses instead of one, and results in an imbalance.
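The mechanism can be illustrated with a small self-contained simulation (the `records` type is invented here to stand in for a resolver whose A records rotate between queries; no gocql code is involved):

```go
package main

import "fmt"

// records stands in for a DNS answer whose A records rotate between
// queries, as real resolvers commonly do.
type records struct {
	ips []string
	n   int
}

// resolve returns the A records rotated by one position per call.
func (r *records) resolve() []string {
	out := append(r.ips[r.n:], r.ips[:r.n]...)
	r.n = (r.n + 1) % len(r.ips)
	return out
}

func main() {
	dns := &records{ips: []string{"127.0.0.1", "127.0.0.2", "127.0.0.3"}}

	// Initial resolve: one pool per resolved IP, mirroring how pools
	// are keyed by ConnectAddress().
	poolKeys := dns.resolve()

	// Dialing: the hostname is re-resolved per dial and the first A
	// record wins, so a pool can hold connections to an address that
	// differs from its key.
	for _, key := range poolKeys {
		target := dns.resolve()[0]
		fmt.Printf("pool keyed %s dials %s\n", key, target)
	}
}
```

Every pool ends up dialing an address other than its own key, which is the mismatch behind the skewed lsof counts above.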

@mpenick
Contributor Author

mpenick commented Aug 17, 2021

To reproduce: set up a local nameserver (bind9 in my case) with the local IPs as the A record entries:

$ cat /etc/bind/db.example.com 
;
; BIND data file for local loopback interface
;
$TTL	30
@	IN	SOA	localhost. root.localhost. (
			      2		; Serial
			     30		; Refresh
			     30		; Retry
			2419200		; Expire
			     60 )	; Negative Cache TTL
;
@	IN	NS	localhost.
@	IN	A	127.0.0.1
@	IN	A	127.0.0.2
@	IN	A	127.0.0.3

nslookup looks like this (notice the A-records changing order):

$ nslookup example.com
Server:		192.168.1.130
Address:	192.168.1.130#53

Name:	example.com
Address: 127.0.0.1
Name:	example.com
Address: 127.0.0.3
Name:	example.com
Address: 127.0.0.2

$ nslookup example.com
Server:		192.168.1.130
Address:	192.168.1.130#53

Name:	example.com
Address: 127.0.0.3
Name:	example.com
Address: 127.0.0.1
Name:	example.com
Address: 127.0.0.2

Then create a new cluster/session using example.com:

	cluster := gocql.NewCluster("example.com")

	cluster.NumConns = 100

	session, err := gocql.NewSession(*cluster)
	if err != nil {
		log.Fatalf("unable to connect session: %v", err)
	}

Note the imbalanced connection counts:

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
      1 127.0.0.1:9042
    101 127.0.0.2:9042
    199 127.0.0.3:9042

$ sudo lsof -n -i TCP:9042 | grep gocql | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
    100 127.0.0.1:9042
    198 127.0.0.2:9042
      3 127.0.0.3:9042

mpenick added a commit to mpenick/gocql that referenced this issue Aug 17, 2021
When a hostname is used for contact points, it's resolved initially as
part of the initialization process, then `HostInfo` objects are created
with the original `hostname` and the resolved `connectAddress`.
`hostname` is then used to dial the pool connections, which causes another
DNS resolution that can yield a different IP than the original
`connectAddress`, because A records can change order on each resolution.
This results in a connection pool for a given IP address containing
connections to multiple different IP addresses.

This patch removes the second resolve when dialing by setting the
`hostname` member to the resolved IP in the initialization step.

Resolves gocql#1575
mpenick linked a pull request Aug 17, 2021 that will close this issue
@martin-sucha
Contributor

We currently have this in the docs (gocql/cluster.go, lines 166 to 172 at bc256bb):

// The supplied hosts are used to initially connect to the cluster then the rest of
// the ring will be automatically discovered. It is recommended to use the value set in
// the Cassandra config for broadcast_address or listen_address, an IP address not
// a domain name. This is because events from Cassandra will use the configured IP
// address, which is used to index connected hosts. If the domain name specified
// resolves to more than 1 IP address then the driver may connect multiple times to
// the same host, and will not mark the node being down or up from events.

If you only want to resolve the IP addresses when creating the cluster, you can simply resolve the DNS name to IP addresses yourself and pass the list of IPs to ClusterConfig.Hosts. That's how we use it currently.
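That approach can be sketched as follows (the lookup function is injected so the example runs without network access; in production you would pass net.LookupHost, and feed the result to gocql.NewCluster):

```go
package main

import "fmt"

// resolveContactPoints expands a DNS name into its address records so
// the IPs can be passed straight to gocql.NewCluster(ips...). The
// lookup function is injected (net.LookupHost in production), which
// keeps the example testable offline.
func resolveContactPoints(lookup func(string) ([]string, error), name string) ([]string, error) {
	addrs, err := lookup(name)
	if err != nil {
		return nil, fmt.Errorf("resolving %s: %w", name, err)
	}
	return addrs, nil
}

func main() {
	// A fake lookup stands in for net.LookupHost here.
	fake := func(string) ([]string, error) {
		return []string{"127.0.0.1", "127.0.0.2", "127.0.0.3"}, nil
	}
	ips, err := resolveContactPoints(fake, "example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // [127.0.0.1 127.0.0.2 127.0.0.3]
	// cluster := gocql.NewCluster(ips...)
}
```

Note that this resolves once, at startup; it does not help if the records change later.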

What is the desired behavior in case the DNS record changes?

@mpenick
Contributor Author

mpenick commented Aug 19, 2021

Thanks for the pointer in the docs. I wasn't aware of that. Sorry.

What is the desired behavior in case the DNS record changes?

I'm trying to work this out myself. :) Pools are keyed on the resolved IP (host.ConnectAddress().String()), but hostname is left unresolved when it is set. So you could end up with a pool containing connections to an address other than host.ConnectAddress(). Which is a bit odd, because hostname is an indirection meant to let the underlying IP address(es) change. Maybe pools should be keyed on the hostname and/or host.ConnectAddress().String() instead? Or should it use the originally connected address instead of re-resolving?

I'm trying to wrap my head around a case where the driver would want unresolved hosts. Maybe a total cluster outage in an environment (like k8s) where all the hosts' IPs have changed (but this would only make sense for re-establishing the control connection, not for pool connections), or some address translator scenario?

@justinfx

I'm looking into a similar situation on Kubernetes, where you get a headless DNS name that can return A records for 3 nodes. The problem I was trying to figure out is how to roll the nodes in a cluster and ensure the client has an updated host pool. I wasn't sure whether the client driver regularly re-resolves the hosts.
At first I tried a single headless DNS name, which it didn't like in terms of a peer list.
Then I switched to 3 individual DNS names that each resolve to a node. This fixed the peer-list errors, but the client still ended up losing all hosts after a rolling update of the cluster (pod IPs change).
Then I switched to pre-resolving the DNS names to 3 IP addresses before creating the cluster config. This had the same problem after a rolling restart.

So I am wondering, is there any case where the client driver will resolve the host names again? Is there some kind of eventing that is not happening on my end when the new node pods start and the client doesn't see them? Or should I be using sticky IP addresses for the pods so they remain fixed after being rolled?

@martin-sucha
Contributor

Okay, I've re-read the code and the original post.

So we do resolve DNS names to IP addresses when establishing the initial control connection (during session initialization) and we build a pool out of that. The imbalance in the pool is because we re-resolve the hostname when dialing. That should be fixable by dialing the IP address instead of the hostname (for TCP connection).

Dialing the IP address instead of the hostname might break some dialers that expect a hostname (like in #1579, which might resolve through a proxy; though it seems such a dialer would not work anyway, as we try to resolve hostnames to IP addresses first). We need to update the docs to reflect the current behaviour.

As for the rolling restart in Kubernetes, gocql receives events from the cluster about added/removed nodes. I think we should see some events from the cluster about the new IP address of the host (but I'm not sure about that). Currently we keep nodes in the pool by IP address. If we switch the dialer to the IP address, that would not help with the k8s rolling restart case, as we'd not re-resolve the hostname. @justinfx would you mind opening a separate issue with a log of events (compile with the gocql_debug tag) that we get from the cluster during a rolling restart? It will be interesting to see what events we receive in that case.

I think we need a new dialer interface (that would get HostInfo pointer instead of a simple address), a place where to re-discover initial hosts (when we lose all connections), a user-specified function to discover the hosts to connect (called during session init and when we detect we lost all connections) and a way to construct HostInfo outside of gocql package. That would help with #1579 and #1487. Being able to construct HostInfo would help with testing host selection policies as well.

@justinfx

justinfx commented Oct 4, 2021

Thanks for looking into that, @martin-sucha. I will try to post a new issue with the debug output. From my tests so far, when I roll a cluster I do see events come in to the client. But the factor here is how fast you roll the cluster. If I roll the nodes one by one as soon as each one passes its health check, it seems to be too fast for the client, which ends up in a state where it thinks the entire pool is down. But if I manually roll the cluster slowly, I see the events come in for the new nodes, and eventually the old down node stops logging. Unfortunately I don't think a cluster is always going to go down in that nicely controlled fashion.

@dkropachev

This issue is fixed by 7a6cf00.

The same reproduction flow now ends up with a balanced connection count:

lsof -n -i :9042 | grep main | awk '{ print $9}' | cut -d">" -f2 | sort | uniq -c
    101 172.31.0.26:9042
    100 172.31.0.38:9042
    100 172.31.0.78:9042

Tested on github.com/gocql/gocql v1.6.0
