Deadlock in BasicPullResponseHandler from neo4j-java-driver when used in a reactive way while cancelling unfinished Reactor subscription #1230
Comments
To work around this problem, one can instruct the reactive stream to cancel on a different thread. But that needs to be done after the particular thread executes
note that this wouldn't work as the
but note that this workaround works only thanks to the fact that other implementations of the
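For illustration, the idea of cancelling on a different thread can be sketched with plain JDK types. This is only a minimal sketch and not the snippet from the comment above: OffThreadCancelSubscription and cancelExecutor are hypothetical names, and it uses java.util.concurrent.Flow rather than Reactor (Reactor has a cancelOn(Scheduler) operator that does a similar hand-off).

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Flow;

// Hypothetical wrapper: forwards request() directly, but hands cancel()
// to another thread so the caller never blocks on the upstream's monitor.
final class OffThreadCancelSubscription implements Flow.Subscription {
    private final Flow.Subscription upstream;
    private final Executor cancelExecutor; // assumed to run tasks on a different thread

    OffThreadCancelSubscription(Flow.Subscription upstream, Executor cancelExecutor) {
        this.upstream = upstream;
        this.cancelExecutor = cancelExecutor;
    }

    @Override
    public void request(long n) {
        upstream.request(n);
    }

    @Override
    public void cancel() {
        // Only schedules the cancellation; the calling thread returns right away
        // and can release whatever monitor it currently holds.
        cancelExecutor.execute(upstream::cancel);
    }
}
```

With this hand-off, the thread that calls cancel() while holding the subscriber's monitor no longer waits for the upstream's monitor, so the two locks are not held at once on the cancel path.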
Thank you for the detailed explanation of the problem. We will see if we can improve it. In the meantime, have you had a chance to try the following update? reactor/reactor-core#3053
@injectives, yep, Simon's improvement prevents the deadlock I reported.
Hi, I'm sorry for the late response.
We detected a deadlock in our production setup between 2 threads, each acquiring the same two locks but in the opposite order.
See the 2 callstacks from a thread dump we collected:
Here you can see that:
Neo4jDriverIO-2-3 is emitting data, and while executing the onNext chain it first locks the BasicPullResponseHandler monitor in the method onRecord() and later tries to lock the MonoCollectListSubscriber monitor in the method onNext().
default-nioEventLoopGroup-1-2, which is running on an external event (in this case Netty publishing a channel close, which in turn makes Micronaut cancel the reactive chain), locks the MonoCollectListSubscriber monitor while running cancel() and then tries to lock the BasicPullResponseHandler monitor in its cancel() method.
As a result, the threads end up in a deadlock.
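The lock ordering described above can be shown without the driver or Reactor. The following is a minimal sketch, assuming two plain objects stand in for the BasicPullResponseHandler and MonoCollectListSubscriber monitors (the class, method, and thread names are hypothetical). A latch forces the bad interleaving, and the JVM's ThreadMXBean is used to confirm the monitor deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderDeadlockDemo {

    // Starts two daemon threads that take two monitors in opposite order.
    // Returns true once the JVM reports a monitor deadlock.
    static boolean provokeDeadlock() {
        final Object handlerMonitor = new Object();    // stands in for BasicPullResponseHandler
        final Object subscriberMonitor = new Object(); // stands in for MonoCollectListSubscriber
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread emitter = new Thread(() -> {
            synchronized (handlerMonitor) {              // like onRecord()
                awaitOther(bothHoldFirstLock);
                synchronized (subscriberMonitor) { }     // like onNext()
            }
        }, "emitter");

        Thread canceller = new Thread(() -> {
            synchronized (subscriberMonitor) {           // like the subscriber's cancel()
                awaitOther(bothHoldFirstLock);
                synchronized (handlerMonitor) { }        // like the handler's cancel()
            }
        }, "canceller");

        // Daemon threads, so the stuck threads do not keep the JVM alive.
        emitter.setDaemon(true);
        canceller.setDaemon(true);
        emitter.start();
        canceller.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = threads.findMonitorDeadlockedThreads();
                if (ids != null && ids.length >= 2) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Signals that this thread holds its first monitor, then waits for the other thread.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + provokeDeadlock());
    }
}
```

The latch plays the role of unlucky timing: each thread is guaranteed to hold its first monitor before either one asks for the second.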
Steps to reproduce
1. Put a breakpoint into reactor.core.publisher.MonoCollectList.MonoCollectListSubscriber#onNext(java.lang.Object) and set it to suspend only the current thread.
2. Run the test in debug mode.
3. When you hit that breakpoint, wait at least 6 seconds (until the main method tries to cancel the subscription via the block(Duration.ofSeconds(5))).
4. Resume.
5. The test should never complete because the main thread and the Neo4jDriver thread got into a deadlock.
6. Collect the thread dump.
This reproducer is just a demonstration of how one can reproduce the problem completely synthetically. By using the breakpoint, you force the threads to interleave in exactly the wrong order, which ends in a deadlock.
We hit this deadlock while running standard business logic in production on AWS. (No breakpoints needed :) )
Expected behavior
The neo4j BasicPullResponseHandler should not get into a deadlock with Reactor primitives.
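One general way to avoid this kind of lock-order cycle is to not call into downstream code while holding the handler's own monitor. The sketch below illustrates that technique only. It is not the driver's actual code or fix, and OpenCallHandler and its members are hypothetical names.

```java
import java.util.function.Consumer;

// Hypothetical handler that keeps its monitor only for its own state
// and invokes the downstream callback after releasing it.
final class OpenCallHandler {
    private final Object lock = new Object();
    private final Consumer<String> downstream;
    private boolean cancelled; // guarded by lock

    OpenCallHandler(Consumer<String> downstream) {
        this.downstream = downstream;
    }

    void onRecord(String record) {
        synchronized (lock) {
            if (cancelled) {
                return; // drop records once cancelled
            }
        }
        // The monitor is released here, so a downstream that takes its own
        // lock cannot form a cycle with this handler's lock.
        downstream.accept(record);
    }

    void cancel() {
        synchronized (lock) {
            cancelled = true;
        }
    }
}
```

The trade-off is that a record can still be delivered if cancel() races with onRecord() between the check and the callback, so the downstream has to tolerate a late signal.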