Deadlock in multicast_observer #555

Open
mxgrey opened this issue Aug 24, 2021 · 0 comments · May be fixed by #556


mxgrey commented Aug 24, 2021

I've run into a deadlock that I haven't been able to reduce to a minimal example. It appears to be a very rare race condition, and the only way I've found to reproduce it reliably is to repeatedly run a large set of convoluted unit tests (written for an application I'm working on) until one of the runs triggers it. I often have to leave the tests running on repeat for 1-2 hours (potentially hundreds of reruns) before the deadlock occurs. I still don't know exactly what conditions need to align to cause it, but luckily I do know what the stack trace looks like when it happens (ordered from the bottom of the stack to the top):

  1. multicast_observer::add
  2. subscriber::add
  3. composite_subscription::add
  4. composite_subscription_inner::add
  5. composite_subscription_state::add
  6. subscription::unsubscribe
  7. subscription_state::unsubscribe
  8. static_subscription::unsubscribe
  9. multicast_observer::add::<lambda>

The deadlock happens because this mutex gets locked twice in this one thread (as shown in the stack trace above): [i] and [ii].

In most cases this won't happen because this whole branch is protected by the condition that the observer is subscribed, so we can usually rely on this condition to prevent frame [5] in the stack trace from being run.

The race condition appears to be that, somewhere between frame [1] and frame [5], another thread changes the observer's state from subscribed to unsubscribed. As I mentioned at the start, I haven't found a way to reproduce this minimally, but assuming another thread can change the observer to unsubscribed, it should be clear from the stack trace that what I've described is a deadlock hazard.
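To make the hazard concrete, here is a minimal single-threaded sketch of the pattern, not the RxCpp code itself. `toy_subscription`, `toy_state`, `add_would_self_deadlock`, the `held` flag and the `unsubscribe_in_window` parameter are all hypothetical names invented for illustration. The flag stands in for the second lock so that the sketch can report the re-entry instead of actually hanging.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical, simplified subscription. As frames 5-8 of the trace suggest,
// adding a teardown callback to an already-unsubscribed subscription runs the
// callback immediately on the calling thread. Storing the callback for later
// is omitted from this sketch.
struct toy_subscription {
    bool subscribed = true;
    void add(const std::function<void()>& on_unsubscribe) {
        if (!subscribed) on_unsubscribe();
    }
};

struct toy_state {
    std::mutex lock;
    bool held = false;  // bookkeeping for this sketch only: true while `lock` is held
};

// Returns true if the teardown callback ran while the caller still held the
// lock, which is the point where a second lock on a std::mutex would never return.
inline bool add_would_self_deadlock(toy_state& state, toy_subscription& s,
                                    bool unsubscribe_in_window) {
    bool reentered = false;
    std::unique_lock<std::mutex> guard(state.lock);  // frame 1 takes the lock
    state.held = true;
    if (s.subscribed) {
        // Stands in for another thread unsubscribing after the check above.
        if (unsubscribe_in_window) s.subscribed = false;
        s.add([&] {
            // Frame 9 would lock state.lock here; the sketch only records it.
            if (state.held) reentered = true;
        });
    }
    state.held = false;
    return reentered;
}
```

With `unsubscribe_in_window` set to `true` the function reports the re-entry; with `false` the callback is never invoked under the lock, which matches the "usually protected" case.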

This race condition was happening for me on release v4.1.0, which I understand is a few years behind master, but the problematic code path seems to still exist, as the lines I linked above are from the latest master.

A very easy way to fix this problem is to change this std::mutex to a std::recursive_mutex (and of course change the template parameter on the locking mechanisms that use it). I'm happy to provide a PR to fix this, but I don't know how to make a regression test to prove the fix.
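As a rough illustration of why the recursive mutex helps, the sketch below takes the same lock twice on one thread. It is a toy model under stated assumptions, not the actual patch: `toy_subscription`, `toy_state`, `register_observer`, `generation` and `unsubscribe_in_window` are hypothetical names, and the "other thread" is simulated inline.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical, simplified subscription: add() on an already-unsubscribed
// instance runs the callback immediately on the calling thread.
struct toy_subscription {
    bool subscribed = true;
    void add(const std::function<void()>& on_unsubscribe) {
        if (!subscribed) on_unsubscribe();
    }
};

struct toy_state {
    std::recursive_mutex lock;  // the proposed change: std::mutex -> std::recursive_mutex
    int generation = 0;         // hypothetical counter guarded by `lock`
};

// Registers a subscription and returns the generation counter afterwards.
inline int register_observer(toy_state& state, toy_subscription& s,
                             bool unsubscribe_in_window) {
    // First lock. Note the template parameter changes along with the mutex type.
    std::unique_lock<std::recursive_mutex> guard(state.lock);
    if (s.subscribed) {
        // Stands in for another thread unsubscribing after the check above.
        if (unsubscribe_in_window) s.subscribed = false;
        s.add([&state] {
            // Second lock on the same thread: permitted by a recursive mutex.
            std::unique_lock<std::recursive_mutex> inner(state.lock);
            ++state.generation;
        });
        ++state.generation;
    }
    return state.generation;
}
```

When the unsubscribe lands in the window, the callback and the caller each bump the counter and the call returns instead of hanging. A recursive mutex only removes the self-deadlock; whether running the teardown callback in the middle of registration is otherwise harmless is a separate question that this sketch does not address.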

@mxgrey mxgrey linked a pull request Aug 25, 2021 that will close this issue
kirkshoop added a commit to kirkshoop/RxCpp that referenced this issue Aug 26, 2021