Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent start and end time, startTime > endTime issues with express #1209

Closed
alexd6631 opened this issue Sep 28, 2022 · 15 comments · Fixed by #1210
Closed

Inconsistent start and end time, startTime > endTime issues with express #1209

alexd6631 opened this issue Sep 28, 2022 · 15 comments · Fixed by #1210
Assignees
Labels
bug Something isn't working priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect

Comments

@alexd6631
Copy link

alexd6631 commented Sep 28, 2022

What version of OpenTelemetry are you using?

$ npm list @opentelemetry/instrumentation-express   

test-obs@0.0.0 /Users/alexandre/Dev/Alteia/test-observability/test-obs
└─┬ @opentelemetry/auto-instrumentations-node@0.32.1
  └── @opentelemetry/instrumentation-express@0.31.1

What version of Node are you using?

$ node -v  
v16.17.0

Running on a M1 MacBook Pro

What did you do?

I am running a simple nest app to familiarize with open telemetry instrumentation and tooling.
I have added the auto-instrumentation as mentioned in the docs, and running a custom nest app.
After some times (especially If the Macbook go to sleep) I start to see such warning in the output :

Inconsistent start and end time, startTime > endTime [ 1664350641, 619671106 ] [ 1664350641, 619333687 ]

Jaeger traces looks completely wrong
image

What did you expect to see?

Here is the jaeger traces before the problem is appearing

image

What did you see instead?

The timing for the express span is getting completely off.

Additional context

I did a quick debugging session and source code analysis and I am very suspicious about the following code :

It seems it is using a performance timer for the span end time, which may somehow differ from the wall time (open-telemetry/opentelemetry-js#852 looks related), whereas other instrumentations does not override the span end time and works correctly in my case.

@alexd6631 alexd6631 added the bug Something isn't working label Sep 28, 2022
@alexd6631
Copy link
Author

Also noted, that even in the "good" trace, the request handler - /api span is much shorter than the actual request processing which should be ~2.2s (I am using setTimeouts in my toy service/repo to simulate serial/parallel processing).

@Flarna
Copy link
Member

Flarna commented Sep 28, 2022

start and end have to use the same time source otherwise the duration is likely garbage as shown here.

Before open-telemetry/opentelemetry-js#3134 this was the case per luck but that's no longer true.

As there is not common time source in OTel API and span API has the possibility to specifiy start and end it's hard to ensure that this is done correct everywhere.

@dyladan Any idea how we could expose the anchored clock to allow use cases like that one in express?

@Flarna
Copy link
Member

Flarna commented Sep 28, 2022

this is a dup of #1193

@alexd6631
Copy link
Author

By curiosity is there any reason not to let the framework use the current time internally (ie not passing time to span.end(), which would be consistent with the span.start()) ? Performance / precision reason maybe ?

Also the start of the span seems to use the default time source, whereas the end use hrTime(). Maybe using hrTime() in both places could be an alternative solution.

const span = instrumentation.tracer.startSpan(spanName, {

@Flarna
Copy link
Member

Flarna commented Sep 28, 2022

usually noone passes start/end times and lets the SDK do this.
But it seems express instrumentation has a special situations where a span is started and then it notices that the span is "useless" and they end the span using a manual provided time to get a span with 0 duration.

I think it would be better to just call span.end() in express even in these cases which would choose the same time source for start/end and avoid this problem.

@dyladan
Copy link
Member

dyladan commented Sep 28, 2022

@dyladan Any idea how we could expose the anchored clock to allow use cases like that one in express?

We could provide it on the span but that would be an API change. We could also provide an SDK function which takes a span and returns its clock but that would be quite confusing to end users who just want a current time.

I think it would be better to just call span.end() in express even in these cases which would choose the same time source for start/end and avoid this problem.

I agree here. I'll make a PR for that.


We should see if we can solve this more generally as express may not be the only place this happens and it is quite a difficult problem to solve.

@dyladan
Copy link
Member

dyladan commented Sep 28, 2022

#1210

@alexd6631
Copy link
Author

From an outside point of view, it seems indeed the most simple approach would be to let the SDK handle the clock internally and use span.end().

I am planning to use OpenTelemetry on a future project and experimenting different aspects of observability with a toy project, glad to see the community is reactive and things are sorted out quickly.

@Flarna
Copy link
Member

Flarna commented Sep 28, 2022

We could provide it on the span but that would be an API change. We could also provide an SDK function which takes a span and returns its clock but that would be quite confusing to end users who just want a current time.

I think it should be on the API and quite independent of spans. Best would be something like a TimeProvider interface. SDK can plug in the implementation matching to the platform or even user might override it during SDK init.

@dyladan
Copy link
Member

dyladan commented Sep 28, 2022

We could provide it on the span but that would be an API change. We could also provide an SDK function which takes a span and returns its clock but that would be quite confusing to end users who just want a current time.

I think it should be on the API and quite independent of spans. Best would be something like a TimeProvider interface. SDK can plug in the implementation matching to the platform or even user might override it during SDK init.

How do you make sure you're getting the clock that is the correct anchored clock? I guess the time provider can use whatever clock the currently active span is using?

@dyladan
Copy link
Member

dyladan commented Sep 28, 2022

For people reading this issue and following the conversation, I created a separate issue for the clock handling: open-telemetry/opentelemetry-js#3279

@alexd6631
Copy link
Author

alexd6631 commented Sep 28, 2022

Just to have more context/understanding on the background issue you are addressing, is it related that the wall clock would > not be monotonic (for instance a NTP update, could adjust the clock negatively) and you could end up with negative spans ?

I will move the comment to the new issue

@dyladan dyladan self-assigned this Sep 28, 2022
@dyladan dyladan added the priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect label Sep 28, 2022
@guyguy333
Copy link

guyguy333 commented Sep 29, 2022

@dyladan Does this issue also solves an issue on browser side ? We have negative durations on browser traces. It's not happening on all spans.

[{"traceId":"b4b33b69b98834dfaddd6f699d72f26d","name":"HTTP GET","id":"2abf438f20c7ca2b","kind":"CLIENT","timestamp":1664460870221900,"duration":-1101703900,"localEndpoint":{"serviceName":"foo-browser"},"tags":{"component":"fetch","http.method":"GET","http.url":"https://localhost.dev.foo.com:3000/_next/static/development/_devPagesManifest.json","http.status_code":"200","http.status_text":"OK","http.host":"localhost.dev.foo.com:3000","http.scheme":"https","http.user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","http.response_content_length":"325","service.name":"foo-browser","telemetry.sdk.language":"webjs","telemetry.sdk.name":"opentelemetry","telemetry.sdk.version":"1.7.0"},"annotations":[{"timestamp":1664459768500100,"value":"fetchStart"},{"timestamp":1664459768500100,"value":"domainLookupStart"},{"timestamp":1664459768500100,"value":"domainLookupEnd"},{"timestamp":1664459768500100,"value":"connectStart"},{"timestamp":1664459768500100,"value":"secureConnectionStart"},{"timestamp":1664459768500100,"value":"connectEnd"},{"timestamp":1664459768501200,"value":"requestStart"},{"timestamp":1664459768514100,"value":"responseStart"},{"timestamp":1664459768514300,"value":"responseEnd"}]}]

@dyladan
Copy link
Member

dyladan commented Sep 29, 2022

No this only fixes express. I'm working on a browser fix

@guyguy333
Copy link

No this only fixes express. I'm working on a browser fix

Ok thank you :) Is there an issue I can track for the fix ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect
Projects
None yet
4 participants