Peer rtt unreasonably large #17837

Open
4 tasks done
freedge opened this issue Apr 22, 2024 · 0 comments
Labels
area/observability area/raft priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. type/bug

freedge commented Apr 22, 2024

Bug report criteria

What happened?

This is a reopening of #11100.

The ROUND_TRIPPER_RAFT_MESSAGE probing opens a new connection for each probe, so the measured time includes connection setup and is not really the RTT.
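
To illustrate why a probe on a fresh connection is not an RTT measurement, here is a minimal Go sketch using only the standard library. It is not etcd's prober code: the helpers `probe` and `reuseFlags` and the local `httptest` server are made up for this example. It uses `net/http/httptrace` to report whether each request ran on a reused connection, with keep-alives disabled (a new TCP connection per probe) and enabled (one connection for all probes).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// probe issues one GET and reports the elapsed time and whether the
// request ran on a reused (already established) connection.
func probe(client *http.Client, url string) (time.Duration, bool, error) {
	var reused bool
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		return 0, false, err
	}
	// Drain and close the body so the transport can keep the connection idle.
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
	return time.Since(start), reused, nil
}

// reuseFlags runs n probes against a throwaway local server and returns,
// for each probe, whether an existing connection was reused.
func reuseFlags(disableKeepAlives bool, n int) ([]bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	defer client.CloseIdleConnections()

	flags := make([]bool, 0, n)
	for i := 0; i < n; i++ {
		elapsed, reused, err := probe(client, srv.URL)
		if err != nil {
			return nil, err
		}
		fmt.Printf("keepalive=%v probe=%d reused=%v elapsed=%v\n", !disableKeepAlives, i, reused, elapsed)
		flags = append(flags, reused)
	}
	return flags, nil
}

func main() {
	for _, disable := range []bool{true, false} {
		if _, err := reuseFlags(disable, 3); err != nil {
			fmt.Println("probe failed:", err)
		}
	}
}
```

With keep-alives disabled, every probe pays for a TCP handshake (and a TLS handshake when peer TLS is on) in addition to the request and response, so the elapsed time is an upper bound on the RTT rather than the RTT itself.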

What did you expect to happen?

The ROUND_TRIPPER_RAFT_MESSAGE probing should happen on an existing connection, and the etcd_network_peer_round_trip_time_seconds metric should reflect the actual RTT.

As per https://etcd.io/docs/v3.5/op-guide/performance/:

> The RTT within a datacenter may be as long as several hundred microseconds.

This is not what etcd_network_peer_round_trip_time_seconds reports.
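
One way to compare the metric against that order of magnitude is to take the mean of the histogram, which for any Prometheus histogram is its `_sum` series divided by its `_count` series. The Go sketch below does this for text-format `/metrics` output. The helper `meanByLabels` is hypothetical, the sample input (label name and numbers included) is invented for illustration and is not output from a real cluster, and the parser assumes lines carry no trailing timestamp.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const metric = "etcd_network_peer_round_trip_time_seconds"

// sample is made-up input in Prometheus text format, NOT real measurements.
const sample = `
# TYPE etcd_network_peer_round_trip_time_seconds histogram
etcd_network_peer_round_trip_time_seconds_sum{To="peer-a"} 0.5
etcd_network_peer_round_trip_time_seconds_count{To="peer-a"} 100
`

// meanByLabels returns sum/count for each label set of the named histogram.
// Simplification: the value is taken to be the last space-separated field,
// so lines with an optional trailing timestamp are not supported.
func meanByLabels(exposition, name string) (map[string]float64, error) {
	sums := map[string]float64{}
	counts := map[string]float64{}

	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks, HELP and TYPE lines
		}
		idx := strings.LastIndex(line, " ")
		if idx < 0 {
			continue
		}
		series, raw := line[:idx], line[idx+1:]

		var target map[string]float64
		var labels string
		switch {
		case strings.HasPrefix(series, name+"_sum"):
			target, labels = sums, strings.TrimPrefix(series, name+"_sum")
		case strings.HasPrefix(series, name+"_count"):
			target, labels = counts, strings.TrimPrefix(series, name+"_count")
		default:
			continue // buckets and unrelated metrics are ignored
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		target[labels] = v
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}

	means := map[string]float64{}
	for labels, c := range counts {
		if c > 0 {
			means[labels] = sums[labels] / c
		}
	}
	return means, nil
}

func main() {
	means, err := meanByLabels(sample, metric)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for labels, m := range means {
		fmt.Printf("%s mean=%.6fs\n", labels, m)
	}
}
```

A mean well above a few hundred microseconds on a local or same-datacenter cluster would be consistent with the probe timing more than a single round trip.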

How can we reproduce it (as minimally and precisely as possible)?

Run a cluster, then capture the SYN packets on the peer port to see new connections being opened:

tcpdump port 2380 and 'tcp[tcpflags] & tcp-syn == tcp-syn'

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 2674f94c
Go Version: go1.22.1 (Red Hat 1.22.1-1.el9)
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.6.0-alpha.0
API version: 3.6

Etcd configuration (command line flags or environment variables)

Locally:

```
etcd --name infra0 --initial-advertise-peer-urls http://127.0.0.10:2380 \
  --listen-peer-urls http://127.0.0.10:2380 \
  --listen-client-urls http://127.0.0.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://127.0.0.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://127.0.0.10:2380,infra1=http://127.0.0.11:2380,infra2=http://127.0.0.12:2380 \
  --initial-cluster-state new \
  --log-level debug --log-outputs stdout
```

Also reproduced on OpenShift 4.14.

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

@jmhbnz jmhbnz added area/raft priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/observability labels May 9, 2024