console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

jswrenn · 2022-07-21T21:52:57Z

The task detail view of console now shows the tree of extant consequences stemming from that task; i.e., which spans are open beneath the task. The leaves of the task, in a sense, reflect what the task is busy doing at that moment.

I'm opening this as a draft PR because several issues must be resolved before merging; most importantly:

the spawn-a-new-thread-to-get-the-global-subscriber technique is horrendous
- I don't know how to solve this more elegantly.
tracing-causality should not be a git dependency
- its tests passing are blocked on tracing-core: implement PartialEq, Eq for Metadata, FieldSet tracing#2229 getting backported+released to tracing 0.1.x, or me writing an equality shim
it needs to be rebased on main

bryangarza · 2022-07-25T16:08:03Z

console-api/proto/common.proto

@@ -78,6 +78,8 @@ message Span {
    repeated Field fields = 3;
    // Timestamp for the span.
    google.protobuf.Timestamp at = 4;
+    // An Id that uniquely identifies this span's parent (if any).
+    optional SpanId parent_id = 5;


in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword

bryangarza · 2022-07-25T16:08:28Z

console-api/proto/consequences.proto

+
+message CloseDirect {
+    Span span = 1;
+    optional Span direct_cause = 2;


in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword

Huh, that's unfortunate, because here it's semantically significant whether a direct_cause is provided or not — some Spans don't have parents.

I think that’s fine, it will be wrapped in Optional in the generated code, so only if the field is included explicitly will it show up as Some(…)

bryangarza · 2022-07-25T16:08:36Z

console-api/proto/consequences.proto

+
+message CloseCyclic {
+    Span span = 1;
+    optional Span direct_cause = 2;


in proto3, every field (that is a Message) is optional by default, so you don't need the optional keyword

bryangarza · 2022-07-25T16:30:07Z

console-subscriber/src/aggregator/mod.rs

+                        causality: vec![],
+                    };
+                    updates_and_watchers.retain(|(updates, watch)| {
+                        let updates = crate::consequences::updates_into_proto(updates);


tiny nit: you don't need crate:: here

bryangarza · 2022-07-25T16:31:38Z

console-subscriber/src/aggregator/mod.rs

+                        let details = proto::tasks::TaskDetails {
+                            causality: updates,
+                            ..details.clone()
+                        };


maybe make details mutable so we can just update it and avoid the clone? or flip the order of the code so that we have the updates variable available before constructing details

bryangarza · 2022-07-25T16:33:15Z

console-subscriber/src/consequences.rs

+use tracing_causality as causality;
+use tracing_subscriber::Registry;
+
+fn into_proto_span(span: causality::Span) -> proto::consequences::Span {


if you import all the proto stuff but alias them to prefix proto, like protoSpan, it could look a bit cleaner

jswrenn · 2022-07-26T16:33:51Z

console-subscriber/src/aggregator/mod.rs

+            // XXX(jswrenn): A new thread is spawned solely for the sake of
+            // getting access to the global subscriber.
+            let (trace, updates) = std::thread::spawn(move || {
+                // XXX(jswrenn): Need to gracefully handle the case where the
+                // task span is already closed.
+                consequences::trace(&span, buffer)
+                    .expect("the subscriber isn't a Registry")
+                    .expect("id is already closed")
+            })
+            .join()
+            .unwrap();


I'd like to find a better alternative to this.

I think I found an alternative to this! It's just blocked on tokio-rs/tracing#2247.

hawkw · 2022-08-02T17:02:02Z

console-api/proto/consequences.proto

+        Extant extant = 1;
+        OpenDirect open_direct = 2;
+        NewIndirect new_indirect = 3;
+        CloseDirect close_direct = 4;
+        CloseIndirect close_indirect = 5;
+        CloseCyclic close_cyclic = 6;


let's add comments to these explaining what they mean, please? the terminology here is not immediately obvious to me, especially because it's not part of the terminology used in the tracing documentation; it would be good to explain what we mean when we say "direct" and "indirect", for example.

hawkw · 2022-08-02T17:04:07Z

console-api/proto/tasks.proto

@@ -49,6 +50,9 @@ message TaskDetails {

    // HdrHistogram.rs `Histogram` serialized to binary in the V2 format
    optional bytes poll_times_histogram = 3;
+
+    // What the task is currently busy doing.
+    repeated consequences.Causality causality = 4;


i don't love calling this field causality --- that suggests to me that a task's causality is that which caused the task to be spawned. i might call this something like spans or trace or something, since it's specifically a list of the span trees that the task is in, i believe?

hawkw · 2022-08-02T17:04:36Z

console-subscriber/examples/deep.rs

@@ -0,0 +1,44 @@
+use tokio::task;
+
+#[tokio::main]


would be nice to have a doc comment somewhere in here explaining what this example demonstrates.

hawkw · 2022-08-02T17:05:11Z

console-subscriber/examples/deep.rs

+
+    let subscriber = tracing_subscriber::registry()
+        .with(console_subscriber::spawn())
+        .with(tracing_subscriber::filter::filter_fn(|_| true));


this could alternatively be LevelFilter::TRACE to enable everything explicitly; either is fine by me though.

hawkw · 2022-08-02T17:08:53Z

tokio-console/src/state/mod.rs

+            let task_ref = self.tasks_state.task(task_id).unwrap();
+            let task_ref = task_ref.upgrade().unwrap();


i think if we get an update for a task whose ID we've never seen before, we probably shouldn't panic --- it means the server almost certainly did something wrong, but it would probably be better for the console CLI to print some kind of error and keep going, skipping that update, instead of crashing...

hawkw · 2022-08-02T17:11:55Z

tokio-console/src/state/mod.rs

+            if let None = fresh.causality {
+                fresh.causality = Some(tracing_causality::Trace::with_root(
+                    tracing_causality::Span {
+                        id: tracing::Id::from_u64(span_id),
+                        metadata: meta_id,
+                    },
+                ));
+            }


style nit, take it or leave it: this is essentially a reimplementation of Option::get_or_insert_with:

Suggested change

if let None = fresh.causality {

fresh.causality = Some(tracing_causality::Trace::with_root(

tracing_causality::Span {

id: tracing::Id::from_u64(span_id),

metadata: meta_id,

},

));

}

fresh.causality.get_or_insert_with(|| tracing_causality::Trace::with_root(

tracing_causality::Span {

id: tracing::Id::from_u64(span_id),

metadata: meta_id,

},

));

hawkw · 2022-08-02T17:14:08Z

tokio-console/src/state/mod.rs

+                    if let Some(trace) = fresh.causality {
+                        fresh.causality = trace.apply(update);
+                    }


nit, take it or leave it: this could be:

Suggested change

if let Some(trace) = fresh.causality {

fresh.causality = trace.apply(update);

}

fresh.causality = fresh.causality.and_then(|trace| trace.apply(update));

hawkw · 2022-08-02T17:15:06Z

tokio-console/src/state/mod.rs

+impl Display for Location {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        if let Some(file) = &self.file {
+            write!(f, "{file}")?;


the use of this syntax increases the tokio-console crate's minimum Rust dependency, so we'll either have to update the MSRV in Cargo.toml and on CI, or use the old syntax.

hawkw · 2022-08-02T17:16:23Z

tokio-console/src/view/task.rs

+
+    /// Render the consequences as an ascii tree.
+    fn make_stack_widget(&self, _styles: &view::Styles, state: &State) -> Text<'static> {
+        // XXX(jswrenn): also implement an ASCII-only rendering mode


i think it's fine to just not display the stack in ASCII-only mode, if it's difficult to implement an ASCII-only rendering for this. we already don't display the poll duration histogram in ASCII-only mode...

hawkw · 2022-08-02T18:19:29Z

tokio-console/src/state/mod.rs

@@ -497,7 +571,7 @@ impl Attribute {
    }
 }

-// === impl Ids ===
+// === impl Ids ===a


?

Suggested change

// === impl Ids ===a

// === impl Ids ===

hawkw · 2022-08-02T18:21:07Z

tokio-console/src/view/task.rs

@@ -341,3 +377,53 @@ fn dur(styles: &view::Styles, dur: std::time::Duration) -> Span<'static> {
    // have to be given a string in order to do layout stuff?
    styles.time_units(format!("{:.prec$?}", dur, prec = DUR_PRECISION))
 }
+
+fn display<W: fmt::Write>(


can we maybe give this a more descriptive name? this is specifically for displaying stack elements, right? so maybe something that indicates that?

alternatively, we could make it an inner function of the make_stack_widget function, i suppose.

hawkw · 2022-08-02T18:21:49Z

tokio-console/src/view/task.rs

+            display(&mut buf, state, causality, &root, true, "").unwrap();
+        } else {
+            buf.push_str(
+                "An unexpected error prevents us from showing consequences for this task.",


under what circumstances would this be the case?

hawkw · 2022-08-02T18:23:21Z

tokio-console/src/state/trace.rs

+                    move |(consequence, cause)| tracing_causality::Update::OpenDirect {
+                        cause: cause,
+                        consequence: span_from_pb(consequence),
+                    },
+                );
+            let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(
+                move |(consequence, cause)| tracing_causality::Update::NewIndirect {


what is moved into these closures? AFAICT the move keyword is unneeded:

Suggested change

move |(consequence, cause)| tracing_causality::Update::OpenDirect {

cause: cause,

consequence: span_from_pb(consequence),

},

);

let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(

move |(consequence, cause)| tracing_causality::Update::NewIndirect {

|(consequence, cause)| tracing_causality::Update::OpenDirect {

cause: cause,

consequence: span_from_pb(consequence),

},

);

let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map(

|(consequence, cause)| tracing_causality::Update::NewIndirect {

i'm assuming this was because cause was captured into the closures at one point before the zip(repeat(cause)) thing?

hawkw · 2022-08-02T18:24:43Z

console-subscriber/src/aggregator/mod.rs

+            // XXX(jswrenn): A new thread is spawned solely for the sake of
+            // getting access to the global subscriber.
+            let (trace, updates) = std::thread::spawn(move || {
+                // XXX(jswrenn): Need to gracefully handle the case where the
+                // task span is already closed.
+                consequences::trace(&span, buffer)
+                    .expect("the subscriber isn't a Registry")
+                    .expect("id is already closed")
+            })
+            .join()
+            .unwrap();


this is quite unfortunate, i'd like to figure out a better solution for this. at the very least, we could probably do this a single time when the aggregator task is spawned, and clone the Dispatch, so that we don't have to do this every subsequent time we want to get details for a task...

hawkw · 2022-08-02T18:26:58Z

tokio-console/Cargo.toml

@@ -34,6 +34,7 @@ tonic = { version = "0.7", features = ["transport"] }
 futures = "0.3"
 tui = { version = "0.16.0", default-features = false, features = ["crossterm"] }
 tracing = "0.1"
+tracing-causality = { path = "../../tracing-causality" }


can this be changed to a Git dependency rather than a path dependency, at least? i'd really like to be able to actually try out this branch without having to check that out.

…for tasks The task detail view of console now shows the tree of extant consequences stemming from that task; i.e., which spans are open beneath the task. The leaves of the task, in a sense, reflect what the task is busy doing at that moment.

jswrenn · 2022-08-17T18:33:13Z

console-subscriber/src/lib.rs

+    fn on_register_dispatch(&self, dispatch: &Dispatch) {
+        let tracer = consequences::Tracer::from_dispatch::<S>(dispatch);
+        if let Err(_) = self.shared.tracer.set(tracer) {
+            // XXX(jswrenn): This layer has seemingly been supplied *multiple*
+            // subscribers. Is that possible? I guess you can call
+            // `Layer::with_subscriber` multiple times; what happens if you
+            // install the resulting `Layered` as a subscriber? Is `Arc`ing a
+            // `Layered<ConsoleLayer, S>` also a recipe for weirdness?
+        }
+    }


The on_register_dispatch method defined by tokio-rs/tracing#2269 is implemented here, to construct a Tracer (i.e., a type that defines a trace method that consumes a span ID and produces a trace of its consequences), which is then made available to the Aggregator thread (the part of console_subscriber that actually handles the rpcs for reporting traces).

hawkw · 2022-08-17T18:39:11Z

console-subscriber/src/lib.rs

+            // XXX(jswrenn): This layer has seemingly been supplied *multiple*
+            // subscribers. Is that possible? I guess you can call
+            // `Layer::with_subscriber` multiple times; what happens if you
+            // install the resulting `Layered` as a subscriber? Is `Arc`ing a
+            // `Layered<ConsoleLayer, S>` also a recipe for weirdness?


I think a Layer implementation for Arc<L: Layer<_>> is the only way this could be called twice. Fortunately, there is no such impl in the current release version

hawkw · 2022-10-07T20:46:58Z

@jswrenn i believe the requisite upstream changes in tracing have all been released now; any interest in picking this back up?

bryangarza reviewed Jul 25, 2022

View reviewed changes

jswrenn commented Jul 26, 2022

View reviewed changes

jswrenn mentioned this pull request Jul 26, 2022

tracing-subscriber: impl LookupSpan for Box<LS> and Arc<LS> tokio-rs/tracing#2247

Merged

hawkw reviewed Aug 2, 2022

View reviewed changes

jswrenn force-pushed the causality branch from 7137c44 to 6a4597b Compare August 17, 2022 17:28

jswrenn commented Aug 17, 2022

View reviewed changes

hawkw reviewed Aug 17, 2022

View reviewed changes

DieracDelta mentioned this pull request Sep 13, 2022

Investigate console branch with stack tracing abilities EspressoSystems/HotShot#507

Closed

hawkw force-pushed the main branch from 638c1c3 to c8c4a85 Compare September 29, 2023 16:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

jswrenn commented Jul 21, 2022 •

edited

bryangarza Jul 25, 2022

bryangarza Jul 25, 2022

jswrenn Jul 29, 2022

bryangarza Jul 30, 2022

bryangarza Jul 25, 2022

bryangarza Jul 25, 2022

bryangarza Jul 25, 2022

bryangarza Jul 25, 2022

jswrenn Jul 26, 2022

jswrenn Jul 26, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

hawkw Aug 2, 2022

jswrenn Aug 17, 2022

hawkw Aug 17, 2022

hawkw commented Oct 7, 2022

		let task_ref = self.tasks_state.task(task_id).unwrap();
		let task_ref = task_ref.upgrade().unwrap();

console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

Are you sure you want to change the base?

console-api,console-subscriber,tokio-console: implement stack traces for tasks #363

Conversation

jswrenn commented Jul 21, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hawkw commented Oct 7, 2022

jswrenn commented Jul 21, 2022 •

edited