Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing measurements in telemetry events #340

Closed
danschultzer opened this issue Apr 11, 2024 · 6 comments
Closed

Missing measurements in telemetry events #340

danschultzer opened this issue Apr 11, 2024 · 6 comments

Comments

@danschultzer
Copy link
Contributor

danschultzer commented Apr 11, 2024

Hey @mtrudel,

We're looking to update opentelemetry_bandit to support for Bandit 1.4 breaking changes in open-telemetry/opentelemetry-erlang-contrib#311 and measurements are missing from some [:bandit, :request, :stop] telemetry events.

This happened for 404s with a Phoenix app.

Looking up Bandit.Telemetry.stop_span calls there are two places where measurements are not set:

Bandit.Telemetry.stop_span(span, %{}, %{error: error.message})

Bandit.Telemetry.stop_span(span, %{}, %{error: error.message})

In't this incorrect as Bandit.Telemetry explicitly states that there will be the above measurements in [:bandit, :request: stop] events?

* `[:bandit, :request, :stop]`
Represents the end of the span
This event contains the following measurements:
* `monotonic_time`: The time of this event, in `:native` units
* `duration`: The span duration, in `:native` units
* `req_header_end_time`: The time that header reading completed, in `:native` units
* `req_body_start_time`: The time that request body reading started, in `:native` units.
* `req_body_end_time`: The time that request body reading completed, in `:native` units
* `req_body_bytes`: The length of the request body, in octets
* `resp_start_time`: The time that the response started, in `:native` units
* `resp_end_time`: The time that the response completed, in `:native` units
* `resp_body_bytes`: The length of the response body, in octets. If the response is
compressed, this is the size of the compressed payload as sent on the wire
* `resp_uncompressed_body_bytes`: The length of the original, uncompressed body. Only
included for responses which are compressed
* `resp_compression_method`: The method of compression, as sent in the `Content-Encoding`
header of the response. Only included for responses which are compressed
This event contains the following metadata:
* `telemetry_span_context`: A unique identifier for this span
* `connection_telemetry_span_context`: The span context of the Thousand Island `:connection`
span which contains this request
* `conn`: The `Plug.Conn` representing this connection. Not present in cases where `error`
is also set and the nature of error is such that Bandit was unable to successfully build
the conn
* `error`: The error that caused the span to end, if it ended in error

I haven't verified whether the 404 is getting into the latter exception handling, but I suspect it does.

@danschultzer
Copy link
Contributor Author

danschultzer commented Apr 11, 2024

Actually we don't see the exceptions happening in our use case. It's weird, it seems that the conn/conn.adapter doesn't persist with the measurement, even though send_resp gets called so it should have been set. It happens for the 404, but haven't had the time to grok the whole code flow. Bottomline, when stop_span gets called in the Bandit.Pipeline the measurement is lost.

@mtrudel
Copy link
Owner

mtrudel commented Apr 11, 2024

The error case1 telemetry events won't have many / most of the measurements, this is ~more or less by design since it simplifies the mainline code so much. That being said, there should be some there; this may well be a bug.

Footnotes

  1. Error case meaning anything that Bandit returns internally (stream errors, semantic errors, etc). Any response that your plug sets on its own is considered 'normal' by Bandit, regardless of its response code.

@danschultzer
Copy link
Contributor Author

Yeah, it makes sense there is not really any measurements for errors since it means it halted somewhere halfway so what are those measurements even good for. But shouldn't that be a [:bandit, :request, :exception] instead of [:bandit, :request, :stop] event then?

In any case that wouldn't address the issue we see in the otel library though, that's a 404 that is handled by Phoenix (no exception raised to Bandit), but somehow the measurements get lost. I'll see if I can make a reproducable example for you when I have some free time.

@mtrudel
Copy link
Owner

mtrudel commented May 10, 2024

I'd love a repro for this if possible @danschultzer !

@mtrudel
Copy link
Owner

mtrudel commented May 15, 2024

Looking into the pipeline code in more detail now that the post-1.4 world has settled down a bit and I think the only open question may be whether protocol errors should be exposed as a stop event or an exception event here:

Bandit.Telemetry.stop_span(span, %{}, %{error: error.message})

I'm open to either option, and am happy to take guidance from the telemetry folks on this

@mtrudel
Copy link
Owner

mtrudel commented May 24, 2024

Closing due to inactivity. As I mentioned above, I'm happy to hew telemetry for these cases based on whatever otel folks want. Once / if someone has a concrete proposal just open a ticket up and I'll do my best to make it happen

@mtrudel mtrudel closed this as completed May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants