Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add a limited form of the linked-list callsite registry for v0.1.x #2083

Merged
merged 14 commits into from Apr 22, 2022

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Apr 19, 2022

Motivation

Currently on v0.1.x, the global callsite registry is implemented as a
Mutex<Vec<&'static dyn Callsite>>. This has a few downsides:

  • Every time a callsite is registered, the Mutex must be locked. This
    can cause a deadlock when a register_callsite implementation calls
    into code that also registers a callsite. See tracing, core: fix deadlock in register_callsite #2020 for details.

  • Registering callsites is relatively slow and forces the program to
    synchronize on the callsite registry (and not on individual
    callsites). This means that if two threads are both registering
    totally different callsites, both threads must wait for the lock.
    Although this overhead is amortized over the entire rest of the
    program, it does have an impact in short-running applications where
    any given callsite may only be hit once or twice.

  • Memory allocation may occur while the registry is locked. This makes
    the use of tracing inside of memory allocators difficult or
    impossible.

On the master branch (v0.2.x), PR #988 rewrote the callsite registry
to use an intrusive atomic singly-linked list of Callsites. This
removed the need for locking the callsite registry, and made it possible
to register callsites without ever allocating memory. Because the
callsite registry on v0.2 will never allocate, this also makes it
possible to compile tracing-core for no_std platforms without
requiring liballoc. Unfortunately, however, we cannot use an identical
implementation on v0.1.x, because using the intrusive linked-list
registry for all callsites requires a breaking change to the
Callsite trait (in order to add a way to get the callsite's
linked-list node), which we cannot make on v0.1.x.

Solution

This branch adds a linked-list callsite registry for v0.1.x in a way
that allows us to benefit from some of the advantages of this approach
in a majority of cases. The trick is introducing a new DefaultCallsite
type in tracing-core that implements the Callsite trait. This type
can contain an intrusive list node, and when a callsite is a
DefaultCallsite
, we can register it without having to push it to the
Mutex<Vec<...>>. The locked vec still exists, for Callsites that
are not DefaultCallsites, so we can't remove the liballoc
dependency, but in most cases, we can avoid the mutex and allocation.

Naturally, tracing has been updated to use the DefaultCallsite type
from tracing-core, so the Vec will only be used in the following
cases:

  • User code has a custom implementation of the Callsite trait, which
    is not terribly common.
  • An older version of the tracing crate is in use.

This fixes the deadlock described in #2020 when DefaultCallsites are
used. Additionally, it should reduce the performance impact and memory
usage of the callsite registry.

Furthermore, I've changed the subscriber registry so that a
RwLock<Vec<dyn Dispatch>> is only used when there actually are
multiple subscribers in use. When there's only a global default
subscriber, we can once again avoid locking for the registry of
subscribers. When the std feature is disabled, thread-local scoped
dispatchers are unavailable, so the single global subscriber will always
be used on no_std.

We can make additional changes, such as the ones from #2020, to also
resolve potential deadlocks when non-default callsites are in use, but
since this branch rewrites a lot of the callsite registry code, that
should probably be done in a follow-up.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This way, `callsite::register` also does the right thing

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added kind/feature New feature or request crate/core Related to the `tracing-core` crate kind/perf Performance improvements labels Apr 19, 2022
@hawkw hawkw requested a review from a team as a code owner April 19, 2022 21:24
@hawkw hawkw self-assigned this Apr 19, 2022
@hawkw
Copy link
Member Author

hawkw commented Apr 19, 2022

@LucioFranco I tagged you on this since you wrote the original linked-list registry PR #988 ages ago, hope that was okay!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw force-pushed the eliza/linked-list-registry branch from 4a30546 to 18cc711 Compare April 20, 2022 17:07
Copy link
Member

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM only have some questions but nothing blocking! This seems like a nice work around!

debug_assert_eq!(
_state,
Self::REGISTERING,
"weird callsite registration state"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol do we want a better message here?

Comment on lines +234 to +238
let interest = match () {
_ if interest.is_never() => Self::INTEREST_NEVER,
_ if interest.is_always() => Self::INTEREST_ALWAYS,
_ => Self::INTEREST_SOMETIMES,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait what is this, does this just replace an if/else if/else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's a couple lines shorter...this was in code that was already there in tracing and was moved to this file. i could change it to an if if you think that's clearer...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah it just took me a min to figure out what was going on lol. I don't have any strong opinions.

let this_interest = dispatch.register_callsite(meta);
interest = match interest.take() {
None => Some(this_interest),
Some(that_interest) => Some(that_interest.and(this_interest)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"that_interest.and(this_interest)" this is poetic

@hawkw
Copy link
Member Author

hawkw commented Apr 21, 2022

@davidbarsky were you planning to look at this one as well?

@davidbarsky
Copy link
Member

@hawkw not really, but i'll give it a cursory review and probably stamp it.

@hawkw hawkw enabled auto-merge (squash) April 22, 2022 16:10
@hawkw hawkw merged commit 45a5df1 into v0.1.x Apr 22, 2022
@hawkw hawkw deleted the eliza/linked-list-registry branch April 22, 2022 16:11
hawkw added a commit that referenced this pull request Jun 7, 2022
# 0.1.27 (June 7, 2022)

This release of `tracing-core` introduces a new `DefaultCallsite` type,
which can be used by instrumentation crates rather than implementing
their own callsite types. Using `DefaultCallsite` may offer reduced
overhead from callsite registration.

### Added

- `DefaultCallsite`, a pre-written `Callsite` implementation for use in
  instrumentation crates ([#2083])
- `ValueSet::len` and `Record::len` methods returning the number of
  fields in a `ValueSet` or `Record` ([#2152])

### Changed

- Replaced `lazy_static` dependency with `once_cell` ([#2147])

### Documented

- Added documentation to the `callsite` module ([#2088], [#2149])

Thanks to new contributors @jamesmunns and @james7132 for contributing
to this release!

[#2083]: #2083
[#2152]: #2152
[#2147]: #2147
[#2088]: #2088
[#2149]: #2149
hawkw added a commit that referenced this pull request Jun 7, 2022
# 0.1.27 (June 7, 2022)

This release of `tracing-core` introduces a new `DefaultCallsite` type,
which can be used by instrumentation crates rather than implementing
their own callsite types. Using `DefaultCallsite` may offer reduced
overhead from callsite registration.

### Added

- `DefaultCallsite`, a pre-written `Callsite` implementation for use in
  instrumentation crates ([#2083])
- `ValueSet::len` and `Record::len` methods returning the number of
  fields in a `ValueSet` or `Record` ([#2152])

### Changed

- Replaced `lazy_static` dependency with `once_cell` ([#2147])

### Documented

- Added documentation to the `callsite` module ([#2088], [#2149])

Thanks to new contributors @jamesmunns and @james7132 for contributing
to this release!

[#2083]: #2083
[#2152]: #2152
[#2147]: #2147
[#2088]: #2088
[#2149]: #2149
hawkw added a commit that referenced this pull request Jun 7, 2022
# 0.1.35 (June 7, 2022)

This release reduces the overhead of callsite registration by using new
`tracing-core` APIs.

### Added

- Use `DefaultCallsite` to reduce callsite registration overhead
  ([#2083])

### Changed

- `tracing-core`: updated to [0.1.27][core-0.1.27]

[core-0.1.27]: https://github.com/tokio-rs/tracing/releases/tag/tracing-core-0.1.27
[#2088]: #2083
hawkw added a commit that referenced this pull request Jun 8, 2022
# 0.1.35 (June 8, 2022)

This release reduces the overhead of callsite registration by using new
`tracing-core` APIs.

### Added

- Use `DefaultCallsite` to reduce callsite registration overhead
  ([#2083])

### Changed

- `tracing-core`: updated to [0.1.27][core-0.1.27]

[core-0.1.27]: https://github.com/tokio-rs/tracing/releases/tag/tracing-core-0.1.27
[#2088]: #2083
bkchr added a commit to bkchr/tracing that referenced this pull request Jun 17, 2022
When the interest cache is configured as part of initing the `LogTracer` before
the global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
tokio-rs#2083 the callsite registration
changed to use `dispatcher::get_default`. This call internally sets the
thread local `default` dispatcher to `Dispatch::none` when there isn't
yet any dispatcher set. This `default` dispatcher isn't overwritten when
setting the global dispatcher. The solution is to not store the
`Dispatch::none` as `default` dispatcher.
bkchr added a commit to bkchr/tracing that referenced this pull request Jun 17, 2022
When the interest cache is configured as part of initing the `LogTracer` before
the global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
tokio-rs#2083 the callsite registration
changed to use `dispatcher::get_default`. This call internally sets the
thread local `default` dispatcher to `Dispatch::none` when there isn't
yet any dispatcher set. This `default` dispatcher isn't overwritten when
setting the global dispatcher. The solution is to not store the
`Dispatch::none` as `default` dispatcher.
hawkw pushed a commit that referenced this pull request Jun 22, 2022
When the interest cache is configured as part of initing the `LogTracer` before
the global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
#2083 the callsite registration
changed to use `dispatcher::get_default`. This call internally sets the
thread local `default` dispatcher to `Dispatch::none` when there isn't
yet any dispatcher set. This `default` dispatcher isn't overwritten when
setting the global dispatcher. The solution is to not store the
`Dispatch::none` as `default` dispatcher.
bkchr added a commit to bkchr/tracing that referenced this pull request Jun 23, 2022
When the interest cache is configured as part of initing the `LogTracer` before
the global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
tokio-rs#2083 the callsite registration
changed to use `dispatcher::get_default`. This call internally sets the
thread local `default` dispatcher to `Dispatch::none` when there isn't
yet any dispatcher set. This `default` dispatcher isn't overwritten when
setting the global dispatcher. The solution is to not store the
`Dispatch::none` as `default` dispatcher.
bkchr added a commit to bkchr/tracing that referenced this pull request Aug 29, 2022
When the interest cache is configured as part of initing the `LogTracer` before
the global dispatcher is set, the main thread fails to log. The problem
is that the interests cache registers a dummy callsite. With
tokio-rs#2083 the callsite registration
changed to use `dispatcher::get_default`. This call internally sets the
thread local `default` dispatcher to `Dispatch::none` when there isn't
yet any dispatcher set. This `default` dispatcher isn't overwritten when
setting the global dispatcher. The solution is to not store the
`Dispatch::none` as `default` dispatcher.
hawkw added a commit that referenced this pull request Jun 24, 2023
## Motivation

A deadlock exists when a collector's `register_callsite` method calls
into code that also contains tracing instrumentation and triggers a
second `register_callsite` call for the same callsite. This is because
the current implementation of the `MacroCallsite` type holds a
`core::sync::Once` which it uses to ensure that it is only added to the
callsite registry a single time. This deadlock was fixed in v0.1.x in PR
#2083, but the issue still exists on v0.2.x.

## Solution

This branch forward-ports the solution from #2083. Rather than using a
`core::sync::Once`, we now track the callsite's registration state
directly in `MacroCallsite`. If a callsite has started registering, but
has not yet completed, subsequent `register` calls will just immediately
receive an `Interest::sometimes` until the registration has completed,
rather than waiting to attempt their own registration.

I've also forward-ported the tests for this that were added in #2083.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/core Related to the `tracing-core` crate kind/feature New feature or request kind/perf Performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants