
otelhttp: Record metrics on timed out requests #4542

Closed
pellared opened this issue Nov 9, 2023 · 7 comments
Labels
area: instrumentation (Related to an instrumentation package), bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed), instrumentation: otelhttp

Comments

@pellared (Member) commented Nov 9, 2023

On an HTTP client implementation, I have a meter using the request context, and I would like to keep track of HTTP timeouts; however, the metrics are ignored because of this timeout.

Originally posted by @ravanscafi in open-telemetry/opentelemetry-go#3618 (comment)

We should do something like this:

package example

import (
	"context"
	"time"
)

// withoutCancel returns a copy of parent that is never canceled and has no
// deadline, but still carries the parent's values.
func withoutCancel(parent context.Context) context.Context {
	if parent == nil {
		panic("cannot create context from nil parent")
	}
	return withoutCancelCtx{parent}
}

type withoutCancelCtx struct {
	c context.Context
}

func (withoutCancelCtx) Deadline() (deadline time.Time, ok bool) {
	return
}

func (withoutCancelCtx) Done() <-chan struct{} {
	return nil
}

func (withoutCancelCtx) Err() error {
	return nil
}

func (w withoutCancelCtx) Value(key any) any {
	return w.c.Value(key)
}

func (w withoutCancelCtx) String() string {
	return "withoutCancel"
}
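For reference, Go 1.21's standard-library context.WithoutCancel has the same semantics as the snippet above. As a rough sketch of the idea (exampleCounter and exampleHandler below are made-up names for illustration, not otelhttp internals), detaching the context only when recording keeps context values such as baggage while ignoring the deadline:

package example

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
)

// exampleCounter is an illustrative instrument, not one used by otelhttp.
var exampleCounter, _ = otel.Meter("example").Int64Counter("example.requests")

func exampleHandler(w http.ResponseWriter, r *http.Request) {
	// Detach from cancellation and deadlines but keep the context values,
	// so the measurement is still recorded after the request times out.
	ctx := context.WithoutCancel(r.Context())
	exampleCounter.Add(ctx, 1)
	w.WriteHeader(http.StatusOK)
}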

@pellared added the bug, area: instrumentation, instrumentation: otelhttp, good first issue, and help wanted labels on Nov 9, 2023
@carrbs (Contributor) commented Nov 14, 2023

Would it be possible to include some additional context for someone who is looking to pick up a "good first issue"? Potentially a way to reproduce the bug/scenario?

It appears we can likely copy (or generalize to share between the two) the withoutCancel flow and call it (here) before calling the next middleware/handler in the chain, so it doesn't cancel the context prematurely, something like:

// instrumentation/net/http/otelhttp/handler.go:232
ctxWithoutCancel := withoutCancel(ctx)
next.ServeHTTP(w, r.WithContext(ctxWithoutCancel))

But I'm not certain that's the requested fix 🤔... (likely because most of this is very new to me 😁).

@pellared (Member, Author) commented Nov 15, 2023

Would it be possible to include some additional context for someone who is looking to pick up a "good first issue"? Potentially a way to reproduce the bug/scenario?

I see two ways:

The first is to cancel from the HTTP client. See https://pkg.go.dev/net/http#Request.Context:

For incoming server requests, the context is canceled when the client's connection closes,

The second is to cancel through the HTTP server itself. See Server.BaseContext at https://pkg.go.dev/net/http#Server:

BaseContext optionally specifies a function that returns the base context for incoming requests on this server.
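A minimal sketch of the first way, assuming an otelhttp-wrapped test server (all names here are illustrative): the client times out and closes the connection, which cancels the server-side request context while the handler is still running, so the measurements are recorded on an already-canceled context.

package example

import (
	"context"
	"net/http"
	"net/http/httptest"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func reproduceTimeout() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simulate a slow handler; the client gives up first, so r.Context()
		// is canceled before otelhttp records its measurements.
		time.Sleep(200 * time.Millisecond)
	})
	srv := httptest.NewServer(otelhttp.NewHandler(slow, "slow-op"))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	_, _ = http.DefaultClient.Do(req) // returns a context deadline exceeded error
}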

It appears we can likely copy (or generalize to share between the two) the withoutCancel flow and call it (here) before calling the next middleware/handler in the chain, so it doesn't cancel the context prematurely, something like:

The derived context created via withoutCancel should only be used for recording measurements here:

h.counters[RequestContentLength].Add(ctx, bw.read, o)
h.counters[ResponseContentLength].Add(ctx, rww.written, o)
// Use floating point division here for higher precision (instead of Millisecond method).
elapsedTime := float64(time.Since(requestStartTime)) / float64(time.Millisecond)
h.valueRecorders[ServerLatency].Record(ctx, elapsedTime, o)

Therefore, I would rather create a context using withoutCancel somewhere here:
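Purely as an illustration of that placement (this is not the actual handler.go change, just the shape of it), the recording code quoted above would then use a detached context along these lines:

// Illustrative sketch only: detach the context from cancellation just before
// recording, so the measurements are not dropped when the request context has
// already timed out.
ctx = withoutCancel(ctx)
h.counters[RequestContentLength].Add(ctx, bw.read, o)
h.counters[ResponseContentLength].Add(ctx, rww.written, o)
elapsedTime := float64(time.Since(requestStartTime)) / float64(time.Millisecond)
h.valueRecorders[ServerLatency].Record(ctx, elapsedTime, o)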

@shuwpan commented Nov 16, 2023

Hi @carrbs and @pellared,

I am new to the Go repo; I saw this is a "good first issue", so I came in.
Let me try to summarize my understanding of this issue.

  1. We want to be able to count how many HTTP calls time out and report that number in metrics.
  2. I assume we should implement something similar to the code snippet shown in the description.

Here are my questions:

  1. How do I reproduce the issue?
    a. Write a sample HTTP server + simple HTTP client, instrument both
    b. then somehow make the call time out
  2. Where should the fixing code go? I saw some suggestions in the comments (but not having much background, I'm not quite getting it).

Thanks,
Shuwen

@pellared (Member, Author) commented Nov 17, 2023

This issue is good for repository "newcomers", but not for Go novices.

@shuwpan commented Nov 17, 2023

@pellared, I mean the Go repo, not Go itself.

@carrbs (Contributor) commented Nov 27, 2023

Hi @pellared, I added a WIP PR to take a crack at this. Would it be good to assign this to me so there isn't duplicative effort?

@carrbs (Contributor) commented Dec 27, 2023

@pellared, I think this should be closed since we'll be handling this case at a lower level now?

@pellared closed this as not planned (won't fix, can't repro, duplicate, stale) on Jan 2, 2024