Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnifiOS UDM SE NextDNS CLI Unreliable #906

Open
zetas opened this issue Jan 25, 2024 · 20 comments
Open

UnifiOS UDM SE NextDNS CLI Unreliable #906

zetas opened this issue Jan 25, 2024 · 20 comments
Labels
🐞 bug Something isn't working

Comments

@zetas
Copy link

zetas commented Jan 25, 2024

Context

  • CLI Version: 1.41.0
  • Platform: UnifiOS/UDM SE

I've been a nextdns pro user for 1.5 years and ever since day 1 I've had issues with the CLI. I'm finally at my wits end here. I've been on UnifiOS and a UDM Pro or UDM SE the whole time. The install will go through without issue, the UI will show as working and everything is gravy. Except at some point in the next few days or weeks or months it will just silently stop working and my DNS will leak to the ISP. I'll mosey on over just to check the status on the UI only to find that I have not been sending traffic to nextdns for days. This is incredibly frustrating.
image

I am aware that installing a new version of unifiOS will stop and/or remove nextdns. I have automatic updates turned off. I do them manually and then immediately SSH in and re-install nextdns. I have tried enabling debug mode, pouring over the logs, there does not appear to be any reason for the failure.

So this brings us to now. I finally decided to setup a cron job on the UDM to run nextdns status and pipe the results to a backend service that will report success if the output is "running" and failure if it reports anything else. I have this set to run hourly and since then I've had these failures
image
I installed this cronjob early this morning and even in the few runs that it's had, it's already failed a few times. This is consistent with my spot checking throughout my time with nextdns. Sometimes I'll run "nextdns status" manually and it'll say "not running" then I run it again and it'll say "running". This is opposite of confidence. For a privacy product, this is a bit concerning.

Some sanity checks:

  1. I do not have DNS shield turned on
  2. I do not have content filtering turned on
  3. I do not have ad blocking turned on

Here's a screenshot of the security settings just for confirmation:
image
I have no custom firewall rules, only 1 network, no vlans. It's almost entirely a vanilla setup.

Here is the nextdns config:

debug false
profile AAAAAA
cache-max-age 0s
report-client-info true
detect-captive-portals false
bogus-priv true
cache-size 10MB
timeout 5s
auto-activate false
listen localhost:53
control /var/run/nextdns.sock
log-queries false
discovery-dns
mdns all
max-ttl 5s
hardened-privacy false
use-hosts true
max-inflight-requests 256
setup-router true

Is there anything I can do to enhance the reliability of this system? I love nextdns and want to keep using it but if it regularly drops and leaks requests it becomes much less useful and certainly not worth paying for :(

@zetas zetas added the 🐞 bug Something isn't working label Jan 25, 2024
@zetas
Copy link
Author

zetas commented Jan 25, 2024

Just in case you ask, here's the log output:

Jan 20 07:00:37 Midgard systemd[1]: Condition check resulted in NextDNS DNS53 to DoH proxy. being skipped.
Jan 24 20:33:52 Midgard systemd[1]: Condition check resulted in NextDNS DNS53 to DoH proxy. being skipped.
Jan 24 20:34:10 Midgard systemd[1]: Condition check resulted in NextDNS DNS53 to DoH proxy. being skipped.
Jan 24 20:34:42 Midgard systemd[1]: Started NextDNS DNS53 to DoH proxy..
Jan 24 20:34:42 Midgard nextdns[1953198]: Starting NextDNS 1.41.0/linux on localhost:5342
Jan 24 20:34:42 Midgard nextdns[1953198]: Starting mDNS discovery
Jan 24 20:34:42 Midgard nextdns[1953198]: Listening on TCP/[::1]:5342
Jan 24 20:34:42 Midgard nextdns[1953198]: Listening on UDP/127.0.0.1:5342
Jan 24 20:34:42 Midgard nextdns[1953198]: Listening on TCP/127.0.0.1:5342
Jan 24 20:34:42 Midgard nextdns[1953198]: Listening on UDP/[::1]:5342
Jan 24 20:34:47 Midgard nextdns[1953198]: Setting up ubios router
Jan 24 20:34:47 Midgard nextdns[1953198]: Connected 45.90.28.0:443 (con=15ms tls=52ms, TCP, TLS13)
Jan 24 20:34:47 Midgard nextdns[1953198]: Connected 5.161.43.197:443 (con=17ms tls=35ms, TCP, TLS13)
Jan 24 20:34:47 Midgard nextdns[1953198]: Switching endpoint: https://dns.nextdns.io#213.227.173.235,5.161.43.197,2a00:11c0:39:353::3,2a01:4ff:f0:ac5::1
Jan 24 20:35:15 Midgard nextdns[1953198]: Received signal: terminated
Jan 24 20:35:15 Midgard systemd[1]: Stopping NextDNS DNS53 to DoH proxy....
Jan 24 20:35:15 Midgard nextdns[1953198]: Stopping NextDNS 1.41.0/linux
Jan 24 20:35:15 Midgard nextdns[1953198]: Restore ubios router settings
Jan 24 20:35:15 Midgard nextdns[1953198]: NextDNS 1.41.0/linux stopped
Jan 24 20:35:15 Midgard systemd[1]: nextdns.service: Succeeded.
Jan 24 20:35:15 Midgard systemd[1]: Stopped NextDNS DNS53 to DoH proxy..
Jan 24 20:35:18 Midgard systemd[1]: Started NextDNS DNS53 to DoH proxy..
Jan 24 20:35:18 Midgard nextdns[1954782]: Starting NextDNS 1.41.0/linux on localhost:5342
Jan 24 20:35:18 Midgard nextdns[1954782]: Listening on TCP/[::1]:5342
Jan 24 20:35:18 Midgard nextdns[1954782]: Starting mDNS discovery
Jan 24 20:35:18 Midgard nextdns[1954782]: Listening on UDP/127.0.0.1:5342
Jan 24 20:35:18 Midgard nextdns[1954782]: Listening on UDP/[::1]:5342
Jan 24 20:35:18 Midgard nextdns[1954782]: Listening on TCP/127.0.0.1:5342
Jan 24 20:35:23 Midgard nextdns[1954782]: Setting up ubios router
Jan 24 20:35:25 Midgard nextdns[1954782]: Connected 45.90.28.0:443 (con=15ms tls=39ms, TCP, TLS13)
Jan 24 20:35:25 Midgard nextdns[1954782]: Connected 5.161.43.197:443 (con=17ms tls=35ms, TCP, TLS13)
Jan 24 20:35:25 Midgard nextdns[1954782]: Switching endpoint: https://dns.nextdns.io#213.227.173.235,5.161.43.197,2a00:11c0:39:353::3,2a01:4ff:f0:ac5::1
Jan 24 20:35:53 Midgard nextdns[1954782]: Control client connected: &{{0x4000290f00}}
Jan 24 20:35:53 Midgard nextdns[1954782]: Control client sent event: &{{0x4000290f00}}: discovered(<nil>)
Jan 24 20:35:53 Midgard nextdns[1954782]: Control client disconnected: &{{0x4000290f00}}
Jan 25 12:03:57 Midgard nextdns[1954782]: Control client connected: &{{0x4000116880}}
Jan 25 12:03:57 Midgard nextdns[1954782]: Control client sent event: &{{0x4000116880}}: discovered(<nil>)
Jan 25 12:03:57 Midgard nextdns[1954782]: Control client disconnected: &{{0x4000116880}}
Jan 25 12:03:59 Midgard nextdns[1954782]: Control client connected: &{{0x400007a900}}
Jan 25 12:03:59 Midgard nextdns[1954782]: Control client sent event: &{{0x400007a900}}: discovered(<nil>)
Jan 25 12:03:59 Midgard nextdns[1954782]: Control client disconnected: &{{0x400007a900}}
Jan 25 13:10:48 Midgard nextdns[1954782]: Control client connected: &{{0x4000c21700}}
Jan 25 13:10:48 Midgard nextdns[1954782]: Control client sent event: &{{0x4000c21700}}: trace(<nil>)
Jan 25 13:10:48 Midgard nextdns[1954782]: Control client disconnected: &{{0x4000c21700}}
Jan 25 13:12:57 Midgard nextdns[1954782]: Received signal: terminated
Jan 25 13:12:57 Midgard nextdns[1954782]: Stopping NextDNS 1.41.0/linux
Jan 25 13:12:57 Midgard systemd[1]: Stopping NextDNS DNS53 to DoH proxy....
Jan 25 13:12:57 Midgard nextdns[1954782]: Restore ubios router settings
Jan 25 13:12:57 Midgard nextdns[1954782]: NextDNS 1.41.0/linux stopped
Jan 25 13:12:57 Midgard nextdns[1954782]: Query 192.168.1.53 UDP A mpulse.sgsocket.com. none (qry=70/res=64) 43ms HTTP/2.0: write udp 127.0.0.1:5342->127.0.0.1:49529: use of closed network connection
Jan 25 13:12:57 Midgard systemd[1]: nextdns.service: Succeeded.
Jan 25 13:12:57 Midgard systemd[1]: Stopped NextDNS DNS53 to DoH proxy..
Jan 25 13:12:57 Midgard systemd[1]: nextdns.service: Consumed 6min 14.966s CPU time.
Jan 25 13:12:57 Midgard systemd[1]: Started NextDNS DNS53 to DoH proxy..
Jan 25 13:12:57 Midgard nextdns[2968463]: Starting NextDNS 1.41.0/linux on localhost:5342
Jan 25 13:12:57 Midgard nextdns[2968463]: Starting mDNS discovery
Jan 25 13:12:57 Midgard nextdns[2968463]: Listening on TCP/[::1]:5342
Jan 25 13:12:57 Midgard nextdns[2968463]: Listening on TCP/127.0.0.1:5342
Jan 25 13:12:57 Midgard nextdns[2968463]: Listening on UDP/127.0.0.1:5342
Jan 25 13:12:57 Midgard nextdns[2968463]: Listening on UDP/[::1]:5342
Jan 25 13:13:02 Midgard nextdns[2968463]: Setting up ubios router
Jan 25 13:13:03 Midgard nextdns[2968463]: Connected 45.90.28.0:443 (con=15ms tls=44ms, TCP, TLS13)
Jan 25 13:13:03 Midgard nextdns[2968463]: Connected 5.161.43.197:443 (con=17ms tls=35ms, TCP, TLS13)
Jan 25 13:13:03 Midgard nextdns[2968463]: Switching endpoint: https://dns.nextdns.io#213.227.173.235,5.161.43.197,2a00:11c0:39:353::3,2a01:4ff:f0:ac5::1
Jan 25 17:13:07 Midgard nextdns[2968463]: Connected 45.90.28.0:443 (con=17ms tls=36ms, TCP, TLS13)
Jan 25 17:13:07 Midgard nextdns[2968463]: Switching endpoint: https://dns1.nextdns.io#45.90.28.0,2a07:a8c0::
Jan 25 19:13:07 Midgard nextdns[2968463]: Connected 5.161.43.197:443 (con=17ms tls=34ms, TCP, TLS13)
Jan 25 19:13:07 Midgard nextdns[2968463]: Switching endpoint: https://dns.nextdns.io#213.227.173.235,5.161.43.197,2a00:11c0:39:353::3,2a01:4ff:f0:ac5::1

None of this is me starting or stopping the daemon btw

@rs
Copy link
Contributor

rs commented Jan 26, 2024

Something seems to be sending the deamon a TERM signal. Any clue what that could be? Are you running any other custom software on the router?

@zetas
Copy link
Author

zetas commented Jan 26, 2024

I have no idea. I am not running anything else on the router.

@zetas
Copy link
Author

zetas commented Jan 26, 2024

Will this work just as well as the custom CLI?
image

@rs
Copy link
Contributor

rs commented Jan 27, 2024

No it won’t. You won’t gave any blocking nor profile and none of cli features.

@zetas
Copy link
Author

zetas commented Jan 27, 2024

That is what I figured, bummer.

I am also a software dev, though not as skilled in golang. If there's anything I can do to help debug this issue I'll do it. I'm considering doing a full wipe on the UDM back to factory this weekend just to see if that helps. It hasn't been wiped in probably 5 years.

@rs
Copy link
Contributor

rs commented Jan 27, 2024

Do you seen anything in the system logs around the nextdns log line setting it is receiving a term signal?

@zetas
Copy link
Author

zetas commented Jan 27, 2024

Not that I can tell but I do see tons of these entries all over the /var/log/daemon.log:

2024-01-27T10:40:59-05:00 Midgard nextdns[2968463]: Query 192.168.1.179 UDP A msh.amazon.com. none (qry=65/res=32) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T11:55:44-05:00 Midgard nextdns[2968463]: Query 192.168.1.100 UDP PTR 8.8.8.8.in-addr.arpa. none (qry=71/res=38) 5000ms : doh resolve: context deadline exceeded
2024-01-27T11:55:45-05:00 Midgard nextdns[2968463]: Query 192.168.1.100 UDP PTR 8.8.8.8.in-addr.arpa. none (qry=71/res=38) 5000ms : doh resolve: context deadline exceeded
2024-01-27T12:12:09-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP 65 gspe79-ssl.ls.apple.com. none (qry=74/res=41) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:09-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP A gateway.icloud.com. none (qry=69/res=36) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:10-05:00 Midgard nextdns[2968463]: Query 192.168.1.161 UDP 65 doh.dns.apple.com. none (qry=68/res=35) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:10-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP 65 gspe79-ssl.ls.apple.com. none (qry=74/res=41) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:10-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP A gateway.icloud.com. none (qry=69/res=36) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:11-05:00 Midgard nextdns[2968463]: Query 192.168.1.161 UDP 65 doh.dns.apple.com. none (qry=68/res=35) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:11-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP 65 api2.amplitude.com. none (qry=69/res=36) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:11-05:00 Midgard nextdns[2968463]: Query 192.168.1.161 UDP A aw.sc-cdn.net. none (qry=64/res=31) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:11-05:00 Midgard nextdns[2968463]: Query 192.168.1.100 UDP PTR 8.8.8.8.in-addr.arpa. none (qry=71/res=38) 5000ms : doh resolve: context deadline exceeded
2024-01-27T12:12:11-05:00 Midgard nextdns[2968463]: Query 192.168.1.163 UDP A artworks.thetvdb.com. none (qry=71/res=38) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:12-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP 65 api2.amplitude.com. none (qry=69/res=36) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:12-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP 65 gspe79-ssl.ls.apple.com. none (qry=74/res=41) cache fallback HTTP/2.0: doh resolve: context deadline exceeded
2024-01-27T12:12:12-05:00 Midgard nextdns[2968463]: Query 192.168.1.48 UDP A gateway.icloud.com. none (qry=69/res=36) cache fallback HTTP/2.0: doh resolve: context deadline exceeded

This is after I turned debug mode on btw in the config

@zetas
Copy link
Author

zetas commented Feb 3, 2024

Last weekend I completely wiped the UDM SE back to factory and re-set it up with generic settings, just enough to get everything working. I setup nextdns again and I'm seeing the same behavior. If there's no solution to this I can just swap over to the DNS Shield and forgo lan reporting, profiles, and blocking. I mainly use the service for DoH :/

@Amazo00
Copy link

Amazo00 commented Feb 6, 2024

I updated my UDM SE TO 3.2.12 and saw my NextDNS stop working also. No matter how I tried I couldn’t get it to come back via the command line.

my ISP uses PPPoE, so I was running the UDM SE in double NAT via a DMZ. When I switched back to having the UDM SE do the PPPoE session it still didn’t work.

My ISP has dual stack IPv6 implemented, so I enabled that at the WAN settings and run the IPv6 profile as indicated on the setup page at my.nextdns for each VLAN as a DHCP handout and it’s been a working work-around, mostly.

@supernovae
Copy link

same errors as @zetas - I restored to prior config, re-installed and it re-broke again so i re-re-restored and didn't apply nextdns config but instead switched to the built in DoH and ad blocking.

I'd still rather have nextdns, but something is broken. I too run dual stack.

I'd be fine running dns shield too if i could just associate my ID

@zetas
Copy link
Author

zetas commented Feb 10, 2024

@supernovae i think you can if you setup Nextdns with your public ip. Then it will associate requests from it with your profile.

@eriknh
Copy link

eriknh commented Mar 10, 2024

Hi there! Just wanted to report that I am seeing NextDNS CLI edition absolutely hammer my routers CPU to the point where the router stops responding to other traffic.

My sincere apologies for not having logs because I had to uninstall the NextDNS CLI to get back online. I am seeing the same "DoH Resolve: Context deadline exceeded" in the configuration output text box in LuCI.

I had this happen several times over the last week or so.

OpenWrt 23.05.2 r23630-842932a63d / LuCI openwrt-23.05 branch git-24.048.61449-c66fc92
Ubiquiti EdgeRouter X
NextDNS CLI 1.42.0-1

I used the default config settings supplied with the NextDNS CLI.

EDIT/UPDATE: I think this race condition occurred potentially because of an ISP issue combined with my own config. I was forcing all DNS calls on the LAN segment to hit the router interface no matter the internal origin; additionally, the IPv6 service from my ISP became unstable/non-responsive which lead to a race condition on ipv6 arpa reverse lookups. My ISP uses a 6RD tunnel for IPv6 which I am finding is not stable during high traffic hours of the day/weekend.

@BudgieBird
Copy link

Seeing the same issues re: CPU and had t uninstall NextDNS to get back online after several bumps offline.
UDM SE: 3.2.12
NextDNS CLI 1.42.0-1

All CPU spike times coincide with drops, which my PDU restarted. I've just uninstalled the CLI and set up a crontab curl to auto-update my IP. No device names in the logs, sadly, but it's better than nothing.

CleanShot 2024-04-17 at 19 52 41@2x

@rs
Copy link
Contributor

rs commented Apr 28, 2024

If you turn on query logging, do you see anything particular in the nextdns log when this happens?

@bondskin
Copy link

Re CPU load, this seems to be also my issue (#925 (comment)).
As @rs mentioned also to me, I'll enable the query logging

@wmorrel
Copy link

wmorrel commented May 18, 2024

I’m having the same issues. It works intermittently, but the status will occasionally drop out, several times a minute, and it lets ads through when the DNS falls back to CloudFlare.

@rs
Copy link
Contributor

rs commented May 19, 2024

@wmorrel anything interesting in the logs?

@wmorrel
Copy link

wmorrel commented May 19, 2024

@rs Happy to check. What sort of log settings do I need to adjust and where do I check the output (/var/log/daemon.log ?)?

@rs
Copy link
Contributor

rs commented May 20, 2024

You can check the output of nextdns log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants