Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timings do not work with spans entered concurrently multiple times #123

Open
mladedav opened this issue Apr 17, 2024 · 1 comment
Open

Comments

@mladedav
Copy link
Contributor

mladedav commented Apr 17, 2024

Bug Report

Version

v0.23.0

This also affects all previous versions.

Description

When timing information is computed (idle_ns and busy_ns) the decision of whether the last interval was idle or busy is determined solely on the operation that is happening right now. So if we are entering a span, we assume the span has been idle until now, if we're exiting we assume it has been busy.

This holds only if we alternate between entering and exiting. If we enter twice, the time between the two enters is considered idle as far as the timings are concrented.

Example test illustrating the issue
    #[test]
    fn timings_test() {
        let tracer = TestTracer(Arc::new(Mutex::new(None)));
        let subscriber = tracing_subscriber::registry().with(layer().with_tracer(tracer.clone()));

        tracing::subscriber::with_default(subscriber, || {
            let span = tracing::debug_span!("request", otel.status_code = ?otel::Status::Ok);
            let _guard_1 = span.enter();
            thread::sleep(std::time::Duration::from_millis(100));
            let _guard_2 = span.enter();
        });

        let attributes = tracer.with_data(|data| data.builder.attributes.as_ref().unwrap().clone());
        let Value::I64(idle) = attributes.iter().find_map(|key_value| (key_value.key.as_str() == "idle_ns").then_some(key_value.value.clone())).unwrap() else { panic!() };
        let Value::I64(busy) = attributes.iter().find_map(|key_value| (key_value.key.as_str() == "busy_ns").then_some(key_value.value.clone())).unwrap() else { panic!() };

        let idle_ms = idle / 1_000_000;
        let busy_ms = busy / 1_000_000;

        println!("Idle: {} ms", idle_ms);
        println!("Busy: {} ms", busy_ms);

        // These panic but should not!
        assert!(idle_ms < 50); // Should be just about 0
        assert!(busy_ms > 50); // Should be just about 100
    }

This can be problematic especially in parallel environments where the spans can be entered concurrently from multiple threads.


As an aside, I could not find any mention of these fields anywhere in OpenTelemetry spec. So I assume this is something that this library itself provides but is not standard, correct?

The question really is whether there is any kind of decision what the numbers should represent, currently I assume it's wall-time (so if an operation is running on 8 threads for 100 ms, busy time should still be just 100 ms not 800 ms).

I assume it would be too much of a breaking change to remove it now, but is there a reason it is here instead of tracing itself or some other library?

@cijothomas
Copy link

As an aside, I could not find any mention of these fields anywhere in OpenTelemetry spec. So I assume this is something that this library itself provides but is not standard, correct?

Yes. In OpenTelemetry, this most likely belongs to Profiling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants