Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout in Http1AdaptorEdge2EdgeTest.test_3000_N_client_pipeline_cancel #1236

Open
kgiusti opened this issue Sep 26, 2023 · 2 comments · Fixed by #1322
Open

Timeout in Http1AdaptorEdge2EdgeTest.test_3000_N_client_pipeline_cancel #1236

kgiusti opened this issue Sep 26, 2023 · 2 comments · Fixed by #1322
Assignees

Comments

@kgiusti
Copy link
Contributor

kgiusti commented Sep 26, 2023

Occasionally this test fails with a timeout:

2023-09-25T21:07:42.4354592Z 59: ::Http1AdaptorEdge2EdgeTest::test_3000_N_client_pipeline_cancel �[31mFAILED�[0m
2023-09-25T21:12:42.5557726Z 59: ::Http1AdaptorEdge2EdgeTest::test_3001_N_client_pipeline_recover �[31mFAILED�[0m
2023-09-25T21:17:42.9151326Z 59: ::Http1AdaptorEdge2EdgeTest::test_4000_client_half_close �[31mFAILED�[0m
2023-09-25T21:17:43.7211518Z 59: ::Http1AdaptorEdge2EdgeTest::test_4001_server_half_close �[31mFAILED�[0m
2023-09-25T21:20:40.6235926Z 37/37 Test #59: system_tests_http1_adaptor ............***Timeout 1200.10 sec

The test sends a batch of http requests to a server that does not send a reply. In a failed trace I see all these request successfully arrive at the server. The test then waits until the egress router count of undelivered/unsettled deliveries reflect the number of requests sent.

See here

I think the timeout is caused by the test getting stuck here.

I'll need to add better diagnostics to that counter query so we can see why the counters are not as expected when the test fails.

@kgiusti kgiusti self-assigned this Sep 26, 2023
@jiridanek
Copy link
Contributor

59: ::Http1AdaptorEdge2EdgeTest::test_2002_curl_post PASSED
59: ::Http1AdaptorEdge2EdgeTest::test_3000_N_client_pipeline_cancel FAILED
59: ::Http1AdaptorEdge2EdgeTest::test_3001_N_client_pipeline_recover FAILED
59: ::Http1AdaptorEdge2EdgeTest::test_4000_client_half_close FAILED
59: ::Http1AdaptorEdge2EdgeTest::test_4001_server_half_close FAILED
37/37 Test #59: system_tests_http1_adaptor ............***Timeout 1200.08 sec

https://github.com/skupperproject/skupper-router/actions/runs/7364836840/job/20045574645?pr=1361#step:34:6105

@jiridanek jiridanek linked a pull request Jan 4, 2024 that will close this issue
kgiusti added a commit to kgiusti/skupper-router that referenced this issue Feb 20, 2024
…issue

Need more insight into the test failure in order to fix this issue.
kgiusti added a commit that referenced this issue Feb 20, 2024
Need more insight into the test failure in order to fix this issue.
@kgiusti
Copy link
Contributor Author

kgiusti commented Apr 30, 2024

Update: failed again. Timeout:

2024-04-29T20:58:40.6224306Z 59: ::Http1AdaptorEdge2EdgeTest::test_3000_N_client_pipeline_cancel 
2024-04-29 20:58:40.621185 server waiting to accept connection from router
2024-04-29T20:58:40.7807593Z 59: 2024-04-29 20:58:40.780124 server connected, bringing up clients
2024-04-29T20:58:40.7827639Z 59: 2024-04-29 20:58:40.780488 Client 0 connected
2024-04-29T20:58:40.7828744Z 59: 2024-04-29 20:58:40.780603 Client 0 sent request
2024-04-29T20:58:40.7829558Z 59: 2024-04-29 20:58:40.780849 Client 1 connected
2024-04-29T20:58:40.7830436Z 59: 2024-04-29 20:58:40.780963 Client 1 sent request
2024-04-29T20:58:40.7831703Z 59: 2024-04-29 20:58:40.781207 Client 2 connected
2024-04-29T20:58:40.7832516Z 59: 2024-04-29 20:58:40.781321 Client 2 sent request
2024-04-29T20:58:40.7833338Z 59: 2024-04-29 20:58:40.781572 Client 3 connected
2024-04-29T20:58:40.7834161Z 59: 2024-04-29 20:58:40.781686 Client 3 sent request
2024-04-29T20:58:40.7834950Z 59: 2024-04-29 20:58:40.781930 Client 4 connected
2024-04-29T20:58:40.7835844Z 59: 2024-04-29 20:58:40.782018 Kill client 4 waiting for counts
2024-04-29T20:58:41.6082022Z 59: 2024-04-29 20:58:41.607332 _server_get_outstanding returning 5
2024-04-29T20:58:42.6400033Z 59: 2024-04-29 20:58:42.639347 _server_get_outstanding returning 5
2024-04-29T20:58:43.9042007Z 59: 2024-04-29 20:58:43.903323 _server_get_outstanding returning 5

The expected value of _server_get_outstanding should be 4 (requests from clients 0 thru 3). Loops forever since 4 != 5.

Seems like very occasionally the link may be counting a delivery as both unsettled and undelivered, leading to a count +1 greater than the actual outstanding deliveries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants