
Occasional leak of Thread/Channel instances and threads blocking indefinitely #1194

Open
a701440 opened this issue Dec 13, 2023 · 6 comments

@a701440

a701440 commented Dec 13, 2023

Describe the bug

We noticed that occasionally we have threads stuck forever with a stack that points to the RabbitMQ channel class.
The stack is below. There are several notes about this:

  1. The RPC timeout is not applied to the enqueueAsyncRpc part of the processing, only to the IO part.
    This makes it possible for the thread to wait forever for _activeRpc to clear, which in this case never happens,
    i.e. the code below can loop forever, disregarding the rpc timeout.

         while(this._activeRpc != null) {
             try {
                 this._channelLockCondition.await();
             } catch (InterruptedException var7) {
                 var2 = true;
             }
         }
    

     I guess the same rpc timeout note applies to any "lock" calls such as
     this._channelLock.lock()
     A thread can also get stuck there forever. (A timeout-bounded sketch is shown after this list.)

  2. The code was recently refactored to use locks instead of synchronized sections.
    It's not clear how that should work in a multi-threaded environment (Java memory model) with members like _activeRpc:
    they are not declared volatile/atomic, so how would a memory barrier be enforced for multi-threaded access to the
    member?

  3. Also, the semantics of ChannelN.asyncCompletableRpc are asynchronous, so it should never block,
    but in fact it can and does block.
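
As mentioned in note 1, here is a minimal sketch of what a timeout-bounded enqueue path could look like. This is an illustration only, not the library's actual code; the class, field, and method names below are made up:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical sketch: bound both the lock acquisition and the wait for the
    // previous RPC by the channel RPC timeout instead of waiting indefinitely.
    class TimeoutAwareRpcQueue {
        private final ReentrantLock channelLock = new ReentrantLock();
        private final Condition channelLockCondition = channelLock.newCondition();
        private Object activeRpc; // guarded by channelLock

        void enqueueRpc(Object rpc, long rpcTimeoutMs) throws InterruptedException, TimeoutException {
            long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(rpcTimeoutMs);
            // Bound the lock acquisition itself, not only the IO that follows it.
            if (!channelLock.tryLock(rpcTimeoutMs, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException("could not acquire the channel lock within the RPC timeout");
            }
            try {
                while (activeRpc != null) {
                    long remaining = deadlineNanos - System.nanoTime();
                    if (remaining <= 0L) {
                        throw new TimeoutException("previous RPC did not complete within the RPC timeout");
                    }
                    // awaitNanos wakes up at the latest when 'remaining' elapses.
                    channelLockCondition.awaitNanos(remaining);
                }
                activeRpc = rpc; // the slot is free, enqueue the new RPC
            } finally {
                channelLock.unlock();
            }
        }
    }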

========================== Threads get stuck in this state indefinitely ================

java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
    - parking to wait for <0x00000007a5c699d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.9/LockSupport.java:341)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.9/AbstractQueuedSynchronizer.java:506)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.9/ForkJoinPool.java:3465)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.9/ForkJoinPool.java:3436)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.9/AbstractQueuedSynchronizer.java:1623)
    at com.rabbitmq.client.impl.AMQChannel.doEnqueueRpc
    at com.rabbitmq.client.impl.AMQChannel.enqueueAsyncRpc
    at com.rabbitmq.client.impl.AMQChannel.quiescingAsyncRpc
    at com.rabbitmq.client.impl.AMQChannel.asyncRpc
    at com.rabbitmq.client.impl.AMQChannel.privateAsyncRpc
    at com.rabbitmq.client.impl.AMQChannel.exnWrappingAsyncRpc
    at com.rabbitmq.client.impl.ChannelN.asyncCompletableRpc
    at reactor.rabbitmq.Sender.lambda$declareExchange$16
    ...

Reproduction steps

It's not clear what is triggering the condition.

Expected behavior

Any invocation should respect the rpc timeout.

Additional context

The code is using version 5.20.0 of the Java client.
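
For reference, the client RPC timeout discussed above is the one set via ConnectionFactory#setChannelRpcTimeout; a minimal setup along these lines was in use (the URI and values here are illustrative):

    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class RpcTimeoutSetup {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setUri("amqp://guest:guest@localhost:5672"); // illustrative broker URI
            // Timeout, in milliseconds, for RPC-style channel operations.
            factory.setChannelRpcTimeout(30_000);
            try (Connection connection = factory.newConnection()) {
                // channels created from this connection use the configured RPC timeout;
                // the point of this issue is that the enqueue/lock path does not honor it
            }
        }
    }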

a701440 added the bug label Dec 13, 2023
@a701440
Author

a701440 commented Dec 13, 2023

Actually, ReentrantLock provides memory barrier semantics, so #2 should be OK then.
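
For what it's worth, a short illustration of that point (hypothetical names, not the client's code): a plain, non-volatile field is visible across threads as long as every read and write of it happens while holding the same ReentrantLock, because each unlock() happens-before the next lock().

    import java.util.concurrent.locks.ReentrantLock;

    class LockGuardedSlot {
        private final ReentrantLock lock = new ReentrantLock();
        private Object value; // guarded by 'lock'; no volatile needed

        void set(Object v) {
            lock.lock();
            try {
                value = v; // the subsequent unlock() publishes this write
            } finally {
                lock.unlock();
            }
        }

        Object get() {
            lock.lock();
            try {
                return value; // sees the latest write made under the same lock
            } finally {
                lock.unlock();
            }
        }
    }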

@michaelklishin
Member

michaelklishin commented Dec 13, 2023

@a701440 the answer is very straightforward: channels are not meant to be shared between threads. Never for publishing, virtually never for synchronous operations that modify the topology (declaring queues, for example). Having multiple consumers on a shared channel can work fine if your consumers coordinate.

Now, _activeRpc and friends are not used on the publishing path, so I wouldn't object to making access to them safer. But in general, it sounds like you may be doing something that is explicitly prohibited in RabbitMQ clients, by design.
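
For illustration, the intended usage pattern (a sketch, not project code) is one channel per thread, created from a shared, thread-safe connection:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.nio.charset.StandardCharsets;

    public class ChannelPerThread {
        public static void main(String[] args) throws Exception {
            Connection connection = new ConnectionFactory().newConnection(); // connections are thread-safe
            Runnable work = () -> {
                // each thread opens and uses its own channel
                try (Channel channel = connection.createChannel()) {
                    channel.queueDeclare("work-queue", true, false, false, null);
                    channel.basicPublish("", "work-queue", null, "hello".getBytes(StandardCharsets.UTF_8));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            connection.close();
        }
    }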

@a701440
Author

a701440 commented Dec 15, 2023

If Channel(s) in the Java client cannot be used concurrently, this should really be defined much better in the API documentation. In that case the issue moves to the Reactor RabbitMQ library and its Sender class. It has a resource-management channel Mono, but does not seem to close the resource-management channel in any way, so the implementation "implies" that the same channel should be re-used for all resource-management operations. I assume that the Sender/Receiver classes there are actually thread-safe and meant to be shared.
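
If the shared resource-management channel turns out to be the culprit, one possible workaround sketch (assuming the SenderOptions resourceManagementChannelMono hook mentioned above accepts an explicitly supplied channel; names and values here are illustrative) is to provide and control that channel yourself:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import reactor.core.publisher.Mono;
    import reactor.rabbitmq.ExchangeSpecification;
    import reactor.rabbitmq.RabbitFlux;
    import reactor.rabbitmq.Sender;
    import reactor.rabbitmq.SenderOptions;

    public class ExplicitResourceManagementChannel {
        public static void main(String[] args) {
            Mono<Connection> connectionMono =
                    Mono.fromCallable(() -> new ConnectionFactory().newConnection()).cache();
            // Supply the channel used for declarations explicitly instead of relying on the default one.
            Mono<Channel> resourceManagementChannelMono = connectionMono
                    .map(connection -> {
                        try {
                            return connection.createChannel();
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    })
                    .cache();

            SenderOptions senderOptions = new SenderOptions()
                    .connectionMono(connectionMono)
                    .resourceManagementChannelMono(resourceManagementChannelMono);

            Sender sender = RabbitFlux.createSender(senderOptions);
            sender.declareExchange(ExchangeSpecification.exchange("demo.exchange")).block();
            sender.close();
        }
    }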

@michaelklishin
Member

@a701440 where specifically do you suggest that we put an extra warning? On the Channel#basicPublish method? You know all too well that users usually do not read documentation before diving head-first into writing code.

@michaelklishin
Member

@a701440 I am also not against making the (internal) RPC part safer, since it should not affect the hot code paths of publishing and consumption.

Would you like to contribute a version that is safer, e.g. using the 2nd option listed above, a ReentrantLock?

@a701440
Author

a701440 commented Dec 18, 2023

I'll have to find out what the company policy is regarding contributing code to open source projects. I think I need to get explicit permission. Generally, even if the Channel is not thread-safe, it would be nice to prevent the indefinite thread lock-up with the stack above by using the provided rpc timeout value.
