Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: fix recursive register_callsite deadlock #2634

Merged
merged 5 commits into from Jun 24, 2023

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Jun 24, 2023

Motivation

A deadlock exists when a collector's register_callsite method calls
into code that also contains tracing instrumentation and triggers a
second register_callsite call for the same callsite. This is because
the current implementation of the MacroCallsite type holds a
core::sync::Once which it uses to ensure that it is only added to the
callsite registry a single time. This deadlock was fixed in v0.1.x in PR
#2083, but the issue still exists on v0.2.x.

Solution

This branch forward-ports the solution from #2083. Rather than using a
core::sync::Once, we now track the callsite's registration state
directly in MacroCallsite. If a callsite has started registering, but
has not yet completed, subsequent register calls will just immediately
receive an Interest::sometimes until the registration has completed,
rather than waiting to attempt their own registration.

I've also forward-ported the tests for this that were added in #2083.

@hawkw hawkw requested review from davidbarsky and a team as code owners June 24, 2023 17:02
@hawkw hawkw changed the title tracing: fix deadlock on recursive register_callsite calls tracing: fix recursive register_callsite deadlock Jun 24, 2023
@hawkw hawkw merged commit c6e8a8f into master Jun 24, 2023
56 checks passed
@hawkw hawkw deleted the eliza/callsite-deadlock-0.2 branch June 24, 2023 17:13
hawkw added a commit to tosc-rs/mnemos that referenced this pull request Jun 24, 2023
Currently, the `tracing` collector in the Allwinner D1 platform impl is
hardcoded to the DEBUG level. This is unfortunate, as it means that we
will always do the work of collecting and formatting DEBUG traces, even
when no one is listening on the UART tracing port. It would be more
efficient if we only formatted traces when a `crowtty` instance has
subscribed to the UART tracing port. Additionally, because trace
metadata is sent when the callsite is initially registered, a `crowtty`
instance that attaches after the board has started running may miss some
trace metadata. Finally, there is no way to change the trace level on
the fly, as it's hard-coded in the kernel.

This branch changes the `mnemos-trace-proto` wire protocol to include a
host to target message to select a `tracing` level. Now, the board can
start up with all tracing disabled, and `crowtty` can send the desired
tracing level once it connects. This way, we don't collect or format
traces at all when `crowtty` isn't connected. In order to ensure
`crowtty` waits to send the tracing level message only once the board's
tracing code has been initialized, we add a periodic "heartbeat" message
sent by the board on the tracing port while tracing is idle. This is
used by the host which is listening for traces to detect the presence of
the tracing collector on the debug target. The same message is used to
ack a successful request to select the tracing level.

When the tracing level is selected, the target rebuilds `tracing`'s
callsite cache, which means all metadata for callsites which have
previously been hit will be sent to the host. This ensures that
`crowtty` always has the metadata for all tracing spans and events that
are enabled, even if the target encountered them before `crowtty`
connected.

Making this work required fixing an upstream `tracing` bug,
tokio-rs/tracing#2634, where calling into code
that includes `tracing` diagnostics from inside a
`Collect::register_callsite` method would cause a deadlock.
Additionally, I had to add code to the collector for tracking whether
it's inside its own `send` method, in order to temporarily disable `bbq`
tracing. This is because the collector's use of `bbq` creates an
infinite loop when the TRACE level is enabled.

Check this out:

![image](https://github.com/tosc-rs/mnemos/assets/2796466/9a527c27-a5c4-422f-b2b3-bd0a46ba4794)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant