Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing, core: fix deadlock in register_callsite #2020

Closed
wants to merge 6 commits into from

Commits on Mar 24, 2022

  1. tracing: test reproducing register_callsite deadlock

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>
    hawkw committed Mar 24, 2022
    Configuration menu
    Copy the full SHA
    890acc0 View commit details
    Browse the repository at this point in the history
  2. core: remove mutex poisoning

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>
    hawkw committed Mar 24, 2022
    Configuration menu
    Copy the full SHA
    37c664a View commit details
    Browse the repository at this point in the history
  3. core: add callsite::try_register

    ## Motivation
    
    Currently, the `callsite::register` function will unconditionally
    attempt to lock the global callsite registry. This is unfortunate,
    because it can result in a deadlock if a subscriber's
    `register_callsite` method calls into code that itself contains
    `tracing` instrumentation with callsites that haven't yet been
    registered.
    
    ## Solution
    
    This commit adds a new `callsite::try_register` function that *attempts*
    to register a callsite, returning `true` if it was successfully
    registered. If the current thread is registering a callsite (or, on
    `no_std`, if a callsite is being registered at all), it will return
    `false`, indicating that the registration must be retried.
    
    Signed-off-by: Eliza Weisman <eliza@buoyant.io>
    hawkw committed Mar 24, 2022
    Configuration menu
    Copy the full SHA
    8addd04 View commit details
    Browse the repository at this point in the history
  4. tracing: fix deadlock when registering callsites

    ## Motivation
    
    Currently, there is a potential deadlock in `register_callsite` that can
    occur if registering a callsite causes _another_ callsite to be hit for
    the first time. If this occurs, the thread will try to lock the callsite
    registry while it is already holding the lock, resulting in a deadlock.
    
    This is unfortunate, since it can occur if a subscriber's
    `register_callsite` calls any function that contains a tracing event, if
    that event has not previously been registered. This has occurred in e.g.
    `tokio-console` due to using a tokio channel in the subscriber.
    
    ## Solution
    
    This commit fixes the deadlock by using the `callsite::try_register`
    function added to `tracing-core`, which uses a thread-local to bail if
    the *current* thread is already registering a callsite. In order to do
    this, we replace the `std::sync::Once` in `MacroCallsite` with a
    hand-written atomic once-like loop. This is because, while we only want
    to register the callsite once, we may have to *try* to register it more
    than one time before `try_register` returns true. Now, we set a flag
    only if the callsite has been successfully registered.
    
    Signed-off-by: Eliza Weisman <eliza@buoyant.io>
    hawkw committed Mar 24, 2022
    Configuration menu
    Copy the full SHA
    42ab66c View commit details
    Browse the repository at this point in the history

Commits on Mar 25, 2022

  1. Configuration menu
    Copy the full SHA
    5be3591 View commit details
    Browse the repository at this point in the history

Commits on Mar 28, 2022

  1. Configuration menu
    Copy the full SHA
    c6ffc6e View commit details
    Browse the repository at this point in the history