Replace - Communications error with new daemon #219

Open
derrickhayashi opened this issue Mar 23, 2022 · 1 comment

Comments

@derrickhayashi

Hello, I'm running gdnsd version 3.2.1, and occasionally a reload fails while replacing the old daemon with the new one.
The new daemon seems to come up, but then gets killed about a second later with 'Communications error with new daemon'.

Mar 23 05:32:38 <hostname> systemd[1]: Reloading gdnsd.
Mar 23 05:32:38 <hostname> gdnsdctl[19152]: REPLACE[gdnsdctl]: Sending replace command to old daemon version 3.2.1 running at PID 622
Mar 23 05:32:38 <hostname> gdnsd[622]: REPLACE[old daemon]: Accepted replace command, spawned replacement daemon at PID 19154
Mar 23 05:32:38 <hostname> gdnsd[19154]: gdnsd version 3.2.1 @ pid 19154
Mar 23 05:32:38 <hostname> gdnsd[19154]: DNS listener threads (2 UDP + 2 TCP) configured for 127.0.0.1:5300
Mar 23 05:32:38 <hostname> gdnsd[19154]: DNS listener threads (2 UDP + 2 TCP) configured for [::1]:5300
Mar 23 05:32:38 <hostname> gdnsd[19154]: REPLACE[new daemon]: Connected to old daemon version 3.2.1 at PID 622 for takeover

[about 522 lines of plugin_geoip: map '<name>' runtime db updated.]
[about 481 lines of state of '<name>' initialized to <up/down>]

Mar 23 05:32:38 <hostname> gdnsd[622]: REPLACE[old daemon]: Communications error with new daemon at pid 19154, killing it with SIGKILL
Mar 23 05:32:39 <hostname> gdnsd[622]: REPLACE[old daemon]: New daemon at PID 19154 died, resuming normal operations
Mar 23 05:32:39 <hostname> gdnsdctl[19152]: REPLACE[gdnsdctl]: Replace command to old daemon failed
Mar 23 05:32:39 <hostname> systemd[1]: gdnsd.service: Control process exited, code=exited status=42
Mar 23 05:32:39 <hostname> systemd[1]: Reload failed for gdnsd.

This is the service configuration in systemd.

[Service]
Type = notify
NotifyAccess = all
Restart = always
RemainAfterExit = yes
ExecStart = /usr/local/sbin/gdnsd -l start
ExecStop = /usr/local/bin/gdnsdctl -l stop
ExecReload = /usr/local/bin/gdnsdctl -l replace
UMask = 0022
User = gdnsd
CapabilityBoundingSet = CAP_NET_BIND_SERVICE
AmbientCapabilities = CAP_NET_BIND_SERVICE
RuntimeDirectory = gdnsd
RuntimeDirectoryMode = 0700
OOMScoreAdjust = -900
Nice = -11
LimitMEMLOCK = infinity
NoNewPrivileges = yes
SecureBits = noroot noroot-locked no-setuid-fixup no-setuid-fixup-locked
MountFlags = slave
DevicePolicy = closed
PrivateDevices = true
PrivateTmp = true
ProtectHome = true

Looking at the CPU load average around this time, it mostly hovers around 1 with occasional spikes to 2. This is on an 8-core/16-thread (HT) CPU, so I don't suspect load to be the problem. We were averaging only around 10 queries/sec.

We have 700+ records in the zone file as well as 700+ services. Service monitoring is all done through the extfile plugin in direct mode. Could the number of records/services be an issue?

Where do you think the problem lies, and how can I troubleshoot this further? Meanwhile, I'm going to run it in debug mode to see whether I find something useful the next time it happens.

@blblack
Member

blblack commented Jan 3, 2024

Not sure about this, but either the new daemon died (invalid new config/zones?), or there was some kind of timeout perhaps. What were the last few lines from the new daemon?
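
For anyone hitting the same failure, here is a minimal sketch of one way to pull the last few lines of the new daemon out of a saved log (for example, the output of `journalctl -u gdnsd` written to a file). It assumes the syslog-style line format shown in the report above. The names `last_lines_for_pid` and `SAMPLE` are hypothetical helpers for illustration and are not part of gdnsd.

```python
import re

# Matches syslog-style lines such as:
#   "Mar 23 05:32:38 host gdnsd[19154]: message"
# Groups: timestamp, host, program, pid, message.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]+)\s(\S+)\s(\w+)\[(\d+)\]:\s(.*)$")


def last_lines_for_pid(lines, pid, program="gdnsd", count=5):
    """Return the last `count` log lines emitted by program[pid]."""
    matched = []
    for line in lines:
        stripped = line.strip()
        m = LINE_RE.match(stripped)
        if m and m.group(3) == program and int(m.group(4)) == pid:
            matched.append(stripped)
    return matched[-count:]


# A few lines copied from the report; the bulk plugin_geoip and
# state-initialization lines are omitted here, as they are in the report.
SAMPLE = [
    "Mar 23 05:32:38 <hostname> gdnsd[622]: REPLACE[old daemon]: Accepted replace command, spawned replacement daemon at PID 19154",
    "Mar 23 05:32:38 <hostname> gdnsd[19154]: gdnsd version 3.2.1 @ pid 19154",
    "Mar 23 05:32:38 <hostname> gdnsd[19154]: REPLACE[new daemon]: Connected to old daemon version 3.2.1 at PID 622 for takeover",
    "Mar 23 05:32:38 <hostname> gdnsd[622]: REPLACE[old daemon]: Communications error with new daemon at pid 19154, killing it with SIGKILL",
]

if __name__ == "__main__":
    # 19154 is the PID of the new daemon in the report.
    for entry in last_lines_for_pid(SAMPLE, 19154, count=2):
        print(entry)
```

Filtering by PID separates the new daemon's output from the old daemon's REPLACE messages, which makes it easier to see what the new daemon logged last before the old one killed it.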
