DNS name resolution lookup misses can block the router timer by several seconds. #1451

Open
kgiusti opened this issue Mar 22, 2024 · 1 comment
Labels
blocked Cannot resolve due to external factor (see comments)

Comments

Contributor

kgiusti commented Mar 22, 2024

Attempting to establish a TCP connection can hang the router timer for several seconds if the DNS resolution fails.

Reproducer: run the system_tests_handle_failover test, then grep router A's log (A.log) for the "process_tick" log message. These messages should occur once per second. Check the timestamp on each message: the log events do not occur at one-second intervals as expected.

Example:

$ ctest -V -R system_tests_handle_failover
...
$ grep "process_tick"  ./tests/system_test.dir/system_tests_handle_failover/FailoverTest/setUpClass/A.log
...
2024-03-22 10:39:50.171629 -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 10:39:51.172545 -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 10:39:52.172506 -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 **10:40:04.417526** -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 **10:40:12.169492** -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 **10:40:16.169809** -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
2024-03-22 10:40:17.170732 -0400 ROUTER_CORE (debug) Core action 'process_tick' (/home/kgiusti/work/skupper/skupper-router/src/router_core/router_core_thread.c:253)
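
The gaps in output like the above can also be found mechanically rather than by eye. The following is a minimal sketch, not part of skupper-router: the function name `tick_gaps` and the 1.5 second `threshold` default are hypothetical choices made for illustration. It assumes each log line starts with a date and a time in the format shown above, and it tolerates the `**` emphasis markers around some timestamps.

```python
from datetime import datetime


def tick_gaps(lines, threshold=1.5):
    """Return (previous, current, gap_seconds) for consecutive
    'process_tick' log entries that are more than `threshold`
    seconds apart. `threshold` is an illustrative default."""
    stamps = []
    for line in lines:
        if "process_tick" not in line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # The first two fields are the date and time; drop any '*'
        # emphasis markers that were added around the timestamp.
        text = f"{fields[0]} {fields[1]}".replace("*", "")
        try:
            stamps.append(datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f"))
        except ValueError:
            # Skip lines whose leading fields are not a timestamp.
            continue

    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > threshold:
            gaps.append((prev, cur, delta))
    return gaps
```

For example, passing an open file object for A.log to `tick_gaps` returns one tuple per stalled interval. The timezone offset field is ignored, which is fine as long as all entries come from the same log.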

This could delay other timer events long enough to make the router unstable.

kgiusti self-assigned this Mar 22, 2024
Contributor Author

kgiusti commented Apr 11, 2024

The following proton patch appears to fix this issue:
https://issues.apache.org/jira/browse/PROTON-2812

I'm going to mark this as blocked on the above JIRA.

kgiusti added the blocked Cannot resolve due to external factor (see comments) label Apr 11, 2024