-
-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keepalived_healthcheckers[62205]: A thread timer expired 5.167032 seconds ago #2271
Comments
I notice that CPUs 25, 26 and 27 are running at 100% and also the hardware interrupt rate is not insignificant. Could the processes running on CPUs 25, 26 and 27 be using some other (i.e. non memory/CPU) system resources that stop keepalived being scheduled? Is /tmp local, and are the keepalived logs being written to local filestore? We have experienced similar delays in the past when keepalived is reading/writing files that are on network attached devices. The thread timer expired times are all 5.16... seconds, which is extremely strange. Can you capture the network traffic for the TCP_CHECKs? It might give some indication as to why they are running a 7 second intervals, although the thread timer expired messages is likely to be the reason why the checkers are running at 7 or 8 second intervals. With the information provided above I cannot see any other reason for why the kernel is not scheduling keepalived to run as needed. |
Thank you for your reply. In fact, we have tried setting CPU affinity and binding Keepalived to an idle CPU core. However, we still encountered thread timer expired logs. We also do not have network storage. The attachment is the TCP check packet that I captured. |
hi @pqarmitage ,I enable the epoll_debug flag. And I noticed that all "A thread timer" are generated when calling epoll_wait on fd 1. Apr 3 09:45:41 localhost Keepalived_healthcheckers[73631]: Setting timer_fd 0.000000001 |
@qd19zzx Are you still experiencing this problem? The only way I can think of to debug this is to run with Run keepalived with the following options: |
Describe the issue
We're using keepalived to perform health checks on the realserver. We've noticed that the interval between TCP health checks is inconsistent with the deloy_loop setting. Although deloy_loop is set to 3 seconds, the actual interval is 7 seconds. Additionally, we've seen a large number of 'Keepalived_healthcheckers[62205]: A thread timer expired 5.167316 seconds ago' in our logs, I think the interval inconsistent maybe related to this. And the comment in the code suggests that it may be related to high load, but we have sufficient CPU and memory. What other factors could cause A thread timer expired and lead to inaccurate interval times?
top - 15:23:10 up 2 days, 1:44, 3 users, load average: 5.00, 4.92, 4.56
Tasks: 809 total, 1 running, 808 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu8 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 1.3 us, 2.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 99.7 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu27 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
[root@localhost l4lb-agent]# free -h
total used free shared buff/cache available
Mem: 472Gi 237Gi 230Gi 244Mi 4.3Gi 231Gi
Swap: 4.0Gi 0.0Ki 4.0Gi
Keepalived version
Output of
keepalived -v
Keepalived v2.2.7 (03/21,2023), git commit +
Copyright(C) 2001-2023 Alexandre Cassen, acassen@gmail.com
Built with kernel headers for Linux 4.18.0
Running on Linux 4.18.0-240.el8.x86_64 #1 SMP Mon Jan 4 11:55:39 UTC 2021
Distro: CentOS Linux 8
configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exusr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-vrrp PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXXsr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection LDFLAGS=-Wl,-z,relro
Config options: LVS OLD_CHKSUM_COMPAT INIT=systemd SYSTEMD_NOTIFY
System options: VSYSLOG MEMFD_CREATE LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CFRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_C
Distro (please complete the following information):
Configuration file:
A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses
global_defs {
lvs_notify_fifo /tmp/lvs_rs_update
checker_process_name lvs_checker
}
virtual_server 177.17.7.3 4040 {
delay_loop 6
persistence_timeout 50
protocol tcp
real_server 177.17.6.5 4045 {
weight 100
inhibit_on_failure
TCP_CHECK {
connect_timeout 10
retry 3
delay_before_retry 3
delay_loop 3
connect_port 4041
log_all_failures on
}
}
real_server 177.17.6.4 4044 {
weight 100
inhibit_on_failure
TCP_CHECK {
connect_timeout 10
retry 3
delay_before_retry 3
delay_loop 3
connect_port 4041
log_all_failures on
}
}
real_server 177.17.6.6 4046 {
weight 100
inhibit_on_failure
TCP_CHECK {
connect_timeout 10
retry 3
delay_before_retry 3
delay_loop 3
connect_port 4041
log_all_failures on
}
}
}
System Log entries
Full keepalived system log entries from when keepalived started
Apr 02 15:21:11 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.162731 seconds ago
Apr 02 15:21:18 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.165175 seconds ago
Apr 02 15:21:25 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.166563 seconds ago
Apr 02 15:21:32 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.163723 seconds ago
Apr 02 15:21:39 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.165628 seconds ago
Apr 02 15:21:47 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.167099 seconds ago
Apr 02 15:21:54 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.167093 seconds ago
Apr 02 15:22:01 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.165756 seconds ago
Apr 02 15:22:08 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.167222 seconds ago
Apr 02 15:22:15 localhost.localdomain Keepalived_healthcheckers[62205]: A thread timer expired 5.166807 seconds ago
The text was updated successfully, but these errors were encountered: