Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jswrenn
Copy link

@jswrenn jswrenn commented Jul 21, 2022

The task detail view of console now shows the tree of extant consequences stemming from that task; i.e., which spans are open beneath the task. The leaves of the task, in a sense, reflect what the task is busy doing at that moment.

I'm opening this as a draft PR because several issues must be resolved before merging; most importantly:

  • the spawn-a-new-thread-to-get-the-global-subscriber technique is horrendous
    • I don't know how to solve this more elegantly.
  • tracing-causality should not be a git dependency
  • it needs to be rebased on main

@@ -78,6 +78,8 @@ message Span {
repeated Field fields = 3;
// Timestamp for the span.
google.protobuf.Timestamp at = 4;
// An Id that uniquely identifies this span's parent (if any).
optional SpanId parent_id = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword


message CloseDirect {
Span span = 1;
optional Span direct_cause = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's unfortunate, because here it's semantically significant whether a direct_cause is provided or not — some Spans don't have parents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s fine, it will be wrapped in Optional in the generated code, so only if the field is included explicitly will it show up as Some(…)


message CloseCyclic {
Span span = 1;
optional Span direct_cause = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword

causality: vec![],
};
updates_and_watchers.retain(|(updates, watch)| {
let updates = crate::consequences::updates_into_proto(updates);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: you don't need crate:: here

Comment on lines +401 to +405
let details = proto::tasks::TaskDetails {
causality: updates,
..details.clone()
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make details mutable so we can just update it and avoid the clone? or flip the order of the code so that we have the updates variable available before constructing details

use tracing_causality as causality;
use tracing_subscriber::Registry;

fn into_proto_span(span: causality::Span) -> proto::consequences::Span {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you import all the proto stuff but alias them to prefix proto, like protoSpan, it could look a bit cleaner

Comment on lines 327 to 337
// XXX(jswrenn): A new thread is spawned solely for the sake of
// getting access to the global subscriber.
let (trace, updates) = std::thread::spawn(move || {
// XXX(jswrenn): Need to gracefully handle the case where the
// task span is already closed.
consequences::trace(&span, buffer)
.expect("the subscriber isn't a Registry")
.expect("id is already closed")
})
.join()
.unwrap();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to find a better alternative to this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found an alternative to this! It's just blocked on tokio-rs/tracing#2247.

Comment on lines +9 to +14
Extant extant = 1;
OpenDirect open_direct = 2;
NewIndirect new_indirect = 3;
CloseDirect close_direct = 4;
CloseIndirect close_indirect = 5;
CloseCyclic close_cyclic = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add comments to these explaining what they mean, please? the terminology here is not immediately obvious to me, especially because it's not part of the terminology used in the tracing documentation; it would be good to explain what we mean when we say "direct" and "indirect", for example.

@@ -49,6 +50,9 @@ message TaskDetails {

// HdrHistogram.rs `Histogram` serialized to binary in the V2 format
optional bytes poll_times_histogram = 3;

// What the task is currently busy doing.
repeated consequences.Causality causality = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't love calling this field causality --- that suggests to me that a task's causality is that which caused the task to be spawned. i might call this something like spans or trace or something, since it's specifically a list of the span trees that the task is in, i believe?

@@ -0,0 +1,44 @@
use tokio::task;

#[tokio::main]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have a doc comment somewhere in here explaining what this example demonstrates.


let subscriber = tracing_subscriber::registry()
.with(console_subscriber::spawn())
.with(tracing_subscriber::filter::filter_fn(|_| true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could alternatively be LevelFilter::TRACE to enable everything explicitly; either is fine by me though.

Comment on lines +234 to +235
let task_ref = self.tasks_state.task(task_id).unwrap();
let task_ref = task_ref.upgrade().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think if we get an update for a task whose ID we've never seen before, we probably shouldn't panic --- it means the server almost certainly did something wrong, but it would probably be better for the console CLI to print some kind of error and keep going, skipping that update, instead of crashing...

Comment on lines +249 to +256
if let None = fresh.causality {
fresh.causality = Some(tracing_causality::Trace::with_root(
tracing_causality::Span {
id: tracing::Id::from_u64(span_id),
metadata: meta_id,
},
));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit, take it or leave it: this is essentially a reimplementation of Option::get_or_insert_with:

Suggested change
if let None = fresh.causality {
fresh.causality = Some(tracing_causality::Trace::with_root(
tracing_causality::Span {
id: tracing::Id::from_u64(span_id),
metadata: meta_id,
},
));
}
fresh.causality.get_or_insert_with(|| tracing_causality::Trace::with_root(
tracing_causality::Span {
id: tracing::Id::from_u64(span_id),
metadata: meta_id,
},
));

Comment on lines +260 to +262
if let Some(trace) = fresh.causality {
fresh.causality = trace.apply(update);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: this could be:

Suggested change
if let Some(trace) = fresh.causality {
fresh.causality = trace.apply(update);
}
fresh.causality = fresh.causality.and_then(|trace| trace.apply(update));

impl Display for Location {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if let Some(file) = &self.file {
write!(f, "{file}")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the use of this syntax increases the tokio-console crate's minimum Rust dependency, so we'll either have to update the MSRV in Cargo.toml and on CI, or use the old syntax.


/// Render the consequences as an ascii tree.
fn make_stack_widget(&self, _styles: &view::Styles, state: &State) -> Text<'static> {
// XXX(jswrenn): also implement an ASCII-only rendering mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's fine to just not display the stack in ASCII-only mode, if it's difficult to implement an ASCII-only rendering for this. we already don't display the poll duration histogram in ASCII-only mode...

@@ -497,7 +571,7 @@ impl Attribute {
}
}

// === impl Ids ===
// === impl Ids ===a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Suggested change
// === impl Ids ===a
// === impl Ids ===

@@ -341,3 +377,53 @@ fn dur(styles: &view::Styles, dur: std::time::Duration) -> Span<'static> {
// have to be given a string in order to do layout stuff?
styles.time_units(format!("{:.prec$?}", dur, prec = DUR_PRECISION))
}

fn display<W: fmt::Write>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we maybe give this a more descriptive name? this is specifically for displaying stack elements, right? so maybe something that indicates that?

alternatively, we could make it an inner function of the make_stack_widget function, i suppose.

display(&mut buf, state, causality, &root, true, "").unwrap();
} else {
buf.push_str(
"An unexpected error prevents us from showing consequences for this task.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under what circumstances would this be the case?

Comment on lines +27 to +33
move |(consequence, cause)| tracing_causality::Update::OpenDirect {
cause: cause,
consequence: span_from_pb(consequence),
},
);
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(
move |(consequence, cause)| tracing_causality::Update::NewIndirect {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is moved into these closures? AFAICT the move keyword is unneeded:

Suggested change
move |(consequence, cause)| tracing_causality::Update::OpenDirect {
cause: cause,
consequence: span_from_pb(consequence),
},
);
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(
move |(consequence, cause)| tracing_causality::Update::NewIndirect {
|(consequence, cause)| tracing_causality::Update::OpenDirect {
cause: cause,
consequence: span_from_pb(consequence),
},
);
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(
|(consequence, cause)| tracing_causality::Update::NewIndirect {

i'm assuming this was because cause was captured into the closures at one point before the zip(repeat(cause)) thing?

Comment on lines 327 to 337
// XXX(jswrenn): A new thread is spawned solely for the sake of
// getting access to the global subscriber.
let (trace, updates) = std::thread::spawn(move || {
// XXX(jswrenn): Need to gracefully handle the case where the
// task span is already closed.
consequences::trace(&span, buffer)
.expect("the subscriber isn't a Registry")
.expect("id is already closed")
})
.join()
.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite unfortunate, i'd like to figure out a better solution for this. at the very least, we could probably do this a single time when the aggregator task is spawned, and clone the Dispatch, so that we don't have to do this every subsequent time we want to get details for a task...

@@ -34,6 +34,7 @@ tonic = { version = "0.7", features = ["transport"] }
futures = "0.3"
tui = { version = "0.16.0", default-features = false, features = ["crossterm"] }
tracing = "0.1"
tracing-causality = { path = "../../tracing-causality" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be changed to a Git dependency rather than a path dependency, at least? i'd really like to be able to actually try out this branch without having to check that out.

…for tasks

The task detail view of console now shows the tree of extant
consequences stemming from that task; i.e., which spans are
open beneath the task. The leaves of the task, in a sense,
reflect what the task is busy doing at that moment.
Comment on lines +503 to +512
fn on_register_dispatch(&self, dispatch: &Dispatch) {
let tracer = consequences::Tracer::from_dispatch::<S>(dispatch);
if let Err(_) = self.shared.tracer.set(tracer) {
// XXX(jswrenn): This layer has seemingly been supplied *multiple*
// subscribers. Is that possible? I guess you can call
// `Layer::with_subscriber` multiple times; what happens if you
// install the resulting `Layered` as a subscriber? Is `Arc`ing a
// `Layered<ConsoleLayer, S>` also a recipe for weirdness?
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The on_register_dispatch method defined by tokio-rs/tracing#2269 is implemented here, to construct a Tracer (i.e., a type that defines a trace method that consumes a span ID and produces a trace of its consequences), which is then made available to the Aggregator thread (the part of console_subscriber that actually handles the rpcs for reporting traces).

Comment on lines +506 to +510
// XXX(jswrenn): This layer has seemingly been supplied *multiple*
// subscribers. Is that possible? I guess you can call
// `Layer::with_subscriber` multiple times; what happens if you
// install the resulting `Layered` as a subscriber? Is `Arc`ing a
// `Layered<ConsoleLayer, S>` also a recipe for weirdness?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a Layer implementation for Arc<L: Layer<_>> is the only way this could be called twice. Fortunately, there is no such impl in the current release version

@hawkw
Copy link
Member

hawkw commented Oct 7, 2022

@jswrenn i believe the requisite upstream changes in tracing have all been released now; any interest in picking this back up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants