runtime: expand on runtime metrics #4373

carllerche · 2022-01-03T22:08:04Z

Work in progress

This PR expands on the previously implemented runtime metrics, adding more detail around runtime no-op ticks and schedule counts.

Depends on #4377
Refs: #4373

Todo

Tests
Number of tasks stolen per-worker
Local-queue overflow count
Naming: metrics, stats, or "performance counters"

This patch refactors the current-thread scheduler, bringing it closer to the structure of the multi-threaded scheduler. More specifically, the core scheduler data is stored in a Core struct, and that struct is passed around as a "token" indicating permission to do work. The Core structure is also stored in the thread-local context. This refactor is intended to support #4373, making it easier to track counters in more locations in the current-thread scheduler.

Re-applies #4377 and fixes the bug resulting in Hyper's double panic.

Revert: #4394

Original PR:

This PR does some refactoring to the current-thread scheduler bringing it closer to the structure of the multi-threaded scheduler. More specifically, the core scheduler data is stored in a Core struct and that struct is passed around as a "token" indicating permission to do work. The Core structure is also stored in the thread-local context. This refactor is intended to support #4373, making it easier to track counters in more locations in the current-thread scheduler.

I tried to keep commits small, but the "set Core in thread-local context" is both the biggest commit and the key one.
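The "Core as a token" idea described above can be illustrated with a minimal sketch. It is not the code from these PRs: `Core`, `CURRENT`, `enter`, `schedule`, and `run_until_idle` are hypothetical names, and tasks are plain closures rather than futures. It only shows how owning the core grants permission to run work, and how parking it in a thread-local lets code running inside a task reach the queue and counters.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

// Hypothetical task type: a plain closure instead of a real future.
type Task = Box<dyn FnOnce()>;

// Hypothetical stand-in for the scheduler's core data.
struct Core {
    tasks: VecDeque<Task>,
    // Counter that is easy to update because whoever runs work owns `Core`.
    poll_count: u64,
}

thread_local! {
    // The core is parked here while a task runs, so that code inside the
    // task can reach it.
    static CURRENT: RefCell<Option<Box<Core>>> = RefCell::new(None);
}

// Park `core` in the thread-local, run `f`, then take the core back.
fn enter<R>(core: Box<Core>, f: impl FnOnce() -> R) -> (Box<Core>, R) {
    CURRENT.with(|slot| *slot.borrow_mut() = Some(core));
    let ret = f();
    let core = CURRENT
        .with(|slot| slot.borrow_mut().take())
        .expect("core missing from thread-local context");
    (core, ret)
}

// Called from inside a running task to queue more work.
fn schedule(task: Task) {
    CURRENT.with(|slot| {
        slot.borrow_mut()
            .as_mut()
            .expect("called outside of the scheduler")
            .tasks
            .push_back(task);
    });
}

// Owning the `Box<Core>` is the permission to drive the scheduler.
fn run_until_idle(mut core: Box<Core>) -> Box<Core> {
    while let Some(task) = core.tasks.pop_front() {
        core.poll_count += 1;
        let (returned, ()) = enter(core, task);
        core = returned;
    }
    core
}
```

Because `run_until_idle` takes and returns the `Box<Core>`, any function that needs to bump a counter either holds the core directly or finds it in the thread-local slot, which is the property the refactor is after.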

tokio/tests/rt_basic.rs

carllerche · 2022-01-20T04:09:37Z

@Darksonn @hawkw @LucioFranco I'm marking this as ready to review. There is more work to do on docs, but we can do that in follow-up PRs. The feature is still marked as unstable and others are waiting for this PR to land to do work in parallel.

Darksonn

Good to see all of those tests.

Darksonn · 2022-01-21T17:44:30Z

tokio/Cargo.toml

+# Technically, removing this is a breaking change even though it only ever did
+# anything with the unstable flag on. It is probably safe to get rid of it after
+# a few releases.
+stats = []


Why shouldn't this continue to be a feature flag? For machines without 64-bit atomics, the stats involve locking a bunch of mutexes quite a lot.

If we rename to metrics we will need to change the feature flag. While unstable, I'm not worried about it since technically adding a feature flag is part of the public API.

We can always do runtime enabling / disabling of metrics collection.
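As a rough illustration of the runtime enabling / disabling mentioned above, a counter can be guarded by a flag so that the hot path pays only one relaxed load when collection is off. This is a sketch under assumptions, not Tokio's implementation: `ToggleCounter` and its methods are hypothetical, and it uses `AtomicU64` directly, so on a target without 64-bit atomics the value would still need a lock-based fallback.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering::Relaxed};

// Hypothetical counter whose collection can be switched on and off at runtime.
struct ToggleCounter {
    enabled: AtomicBool,
    value: AtomicU64,
}

impl ToggleCounter {
    const fn new(enabled: bool) -> Self {
        ToggleCounter {
            enabled: AtomicBool::new(enabled),
            value: AtomicU64::new(0),
        }
    }

    // Turn collection on or off without rebuilding.
    fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Relaxed);
    }

    // Increment only while collection is enabled.
    fn incr(&self) {
        if self.enabled.load(Relaxed) {
            self.value.fetch_add(1, Relaxed);
        }
    }

    // Read the current value.
    fn get(&self) -> u64 {
        self.value.load(Relaxed)
    }
}
```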

tokio/src/runtime/basic_scheduler.rs

tokio/src/runtime/metrics/batch.rs

Darksonn · 2022-01-21T17:54:47Z

tokio/src/runtime/metrics/runtime.rs

+    /// Returns the number of times the given worker thread stole tasks from
+    /// another worker thread.


So not the number of tasks it has stolen?

Yeah, I think the number of times it successfully steals is more interesting than the number of tasks, but I could be wrong.

Darksonn · 2022-01-21T18:01:37Z

tokio/src/runtime/mod.rs

+
+    cfg_metrics! {
+        impl Runtime {
+            /// TODO


tokio/src/runtime/queue.rs

LucioFranco

A few questions inline. Nothing blocking, but a few typos.

LucioFranco · 2022-01-21T17:46:35Z

tokio/src/runtime/basic_scheduler.rs

+        }
+
+        pub(crate) fn worker_metrics(&self, worker: usize) -> &WorkerMetrics {
+            assert_eq!(0, worker);


why the assert here? Is there a better way to express this invariant maybe?

The index is a low level API. Do you have a suggestion for a better way to express it?

LucioFranco · 2022-01-21T17:47:25Z

tokio/src/runtime/basic_scheduler.rs

+            &self.shared.scheduler_metrics
+        }
+
+        pub(crate) fn remote_queue_depth(&self) -> usize {


is it possible for a user to contend this by accident?

Currently, yes w/ the current-thread runtime, but it isn't inherent. The injection queue should be improved in a later PR.

LucioFranco · 2022-01-21T17:51:43Z

tokio/src/runtime/metrics/batch.rs

+
+cfg_rt_multi_thread! {
+    impl MetricsBatch {
+        pub(crate) fn incr_steal_count(&mut self, by: u16) {


because that is the type used to track queue size internally. It isn't a public fn.
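For context, the batching approach suggested by the file name can be sketched as follows. This is an assumption-laden illustration rather than the code in `batch.rs`: the struct fields and the `submit` method are hypothetical, and only the `incr_steal_count(&mut self, by: u16)` signature comes from the diff above. The worker accumulates into a plain integer and periodically publishes the total to a shared atomic using a relaxed store.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

// Hypothetical shared, per-worker counters that observers read.
#[derive(Default)]
struct WorkerMetrics {
    steal_count: AtomicU64,
}

// Hypothetical worker-local batch: plain integers, no atomics on the hot path.
#[derive(Default)]
struct MetricsBatch {
    steal_count: u64,
}

impl MetricsBatch {
    // `by` is a u16 to match the type used for the queue size; it is widened
    // before being added to the running total.
    fn incr_steal_count(&mut self, by: u16) {
        self.steal_count += u64::from(by);
    }

    // Publish the running total so other threads can observe it.
    fn submit(&self, worker: &WorkerMetrics) {
        worker.steal_count.store(self.steal_count, Relaxed);
    }
}
```

Widening to `u64` at the point of accumulation keeps the internal `u16` queue type out of the published counters.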

LucioFranco · 2022-01-21T18:00:42Z

tokio/src/runtime/metrics/runtime.rs

+///
+/// [`Runtime::metrics`]: crate::runtime::Runtime::metrics()
+#[derive(Clone, Debug)]
+pub struct RuntimeMetrics {


I wonder if it makes more sense to have a Worker struct that contains Handle and worker_id: usize instead of having each api panic if the index is incorrect?

Tried that initially. It is a pain to wire it all up w/o adding a bunch of structs w/ lifetimes in the public API. It also seems less future-proof.

LucioFranco · 2022-01-21T18:36:10Z

tokio/src/runtime/metrics/runtime.rs

+            .load(Relaxed)
+    }
+
+    /// Returns the number of tasks currently scheduled in the runtime's


Should we consider not using internal language in docs like this? We also probably need some place that defines what an injection queue is.

What would you call the queue? The details of the injection queue vs. the local queue are relevant when understanding the characteristics of the runtime.

LucioFranco · 2022-01-21T18:40:35Z

tokio/src/runtime/mod.rs

+
+    cfg_metrics! {
+        impl Runtime {
+            /// TODO


I'm leaving it to a follow-up PR. Still unstable.

Co-authored-by: Alice Ryhl <alice@ryhl.io> Co-authored-by: Lucio Franco <luciofranco14@gmail.com>

carllerche · 2022-01-21T21:32:19Z

I'm going to merge once CI passes. If there are any other points you want to continue discussing or track before stabilizing, please add them here. I already tracked that docs (TODO) should be handled.

github-actions bot added the R-loom Run loom tests on this PR label Jan 3, 2022

carllerche marked this pull request as draft January 3, 2022 22:08

carllerche mentioned this pull request Jan 4, 2022

rt: refactor current-thread scheduler #4377

Merged

carllerche force-pushed the more-rt-metrics branch from 26e60bd to 60c7152 Compare January 5, 2022 18:28

carllerche mentioned this pull request Jan 11, 2022

rt: refactor current-thread scheduler (take 2) #4395

Merged

7c71ee1 rt: expand on runtime metrics.

carllerche force-pushed the more-rt-metrics branch from 04230c9 to 7c71ee1 Compare January 12, 2022 18:00

carllerche added 19 commits January 12, 2022 14:59

82d4467 rename stats -> metrics
5895905 RuntimeMetrics -> SchedulerMetrics
5bff844 refactor metrics
19de22d Merge remote-tracking branch 'origin/master' into more-rt-metrics
ea4a303 instrument queue depth
fc8f28a remove stolen count
b60a16a fix build
b0098f0 instrument remote queue
852406b fix queue tests
7ed15d1 fix loom tests
66cf82a test queue metrics
b62f724 more work
2bae9e8 more tests
24ebd65 more tests
5f58cf6 tests
f1d0dcf try fixing tests
0192ace remove some adhoc tests
6ea5437 make test more robust
61f1621 make tests more robust

Darksonn reviewed Jan 19, 2022

tokio/tests/rt_basic.rs

43f58f3 write some docs

carllerche marked this pull request as ready for review January 20, 2022 04:07

Darksonn approved these changes Jan 21, 2022

LucioFranco approved these changes Jan 21, 2022

carllerche and others added 3 commits January 21, 2022 13:01

47ef235 Apply suggestions from code review
        Co-authored-by: Alice Ryhl <alice@ryhl.io>
        Co-authored-by: Lucio Franco <luciofranco14@gmail.com>
db9e139 rename -> injection_queue_depth
3fe79b3 revert unnecessary change

carllerche merged commit 24f4ee3 into master Jan 22, 2022

carllerche deleted the more-rt-metrics branch January 22, 2022 06:17

carllerche mentioned this pull request Jan 27, 2022

chore: prepare Tokio v1.16 release. #4431

Merged

MichaIng mentioned this pull request Jan 28, 2022

build(deps): bump tokio from 1.15.0 to 1.16.1 in /src/backend ravenclaw900/DietPi-Dashboard#134

Merged

Darksonn added A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics labels Aug 26, 2023
