New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stabilize HttpMetricsHandlerTests#testServerConnectionsRecorder #2370
Conversation
… by BaseHttpTest.disposeServer method.
@violetagg , please hold on before reviewing, I may have to cleanup something before. thanks. |
@violetagg, the cleanup I had in mind does not work for the moment, so can you please review ? |
…verCloseHandler in order to ensure the server has closed the client socket
…lMetricsHandler if H2 is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please base this PR on top of 1.0.x
branch
}, | ||
Function.identity()) | ||
.doOnConnection(c -> ServerCloseHandler.register(c, serverCloseLatch, isHttp11)) | ||
.metrics(true, () -> ServerRecorder.INSTANCE, Function.identity()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this to be always a new Supplier?
Closing this PR, which will be superseded by another 1.0.x upcoming PR. |
This is superseded by #2391 |
Sometimes, the HttpMetricsHandlerTests.testServerConnectionsRecorder() test fails with the following exception:
Such exception can happen if the checkServerConnectionsRecorder method throws some exceptions.
So, the PR first catches any exceptions thrown from checkServerConnectionsRecorder, and then display it from the testServerConnectionsRecorder method.
So, we can now observe this root cause exception:
So, what is failing is this check from checkServerConnectionsRecorder method:
So, when checkServerConnectionsRecorder() is called, the number of connections on the server is surprisingly 0 instead of 1 !
I think it's a concurrency issue which may happen if CI is slow. Here is the reproducer scenario I can imagine for the moment:
we are testing an HTTP2 test, and the test succeeds, so when tearDown is called, the client connection provider is disposed. But because the system is slow, the server does not immediately see the close, it will see it very shortly, in few nanoseconds
so, the next test is now executing, and then the ServerRecorder.recordServerConnectionOpened() is called: onServerConnectionsAmount is set to 1, but right after, recordServerConnectionClosed is called because the previous server test is still closing -> in this case, onServerConnectionsAmount is decremented and is wrongly set to 0
The PR is an attempt to fix this issue: it first dispose the client connection provider, then it waits for the recordServerConnectionClosed to be called, then it asserts the test expectations. This patch also simplifies a bit the test because the "done" CountDownLatch has been removed, and a new "ServerCloseHandler" is used to ensure the server has seen its client socket closed.
Now, using the patch, I restarted the CI many times and could not reproduce the problem anymore.