Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle multiple request schemes in metrics #7871

Closed
Cali0707 opened this issue Apr 25, 2024 · 5 comments · Fixed by #7904
Closed

How to handle multiple request schemes in metrics #7871

Cali0707 opened this issue Apr 25, 2024 · 5 comments · Fixed by #7904
Assignees

Comments

@Cali0707
Copy link
Member

Problem
Sometimes, an event which is dispatched from the imc dispatcher might dispatch with mixed schemes. If there are one or more subscribers with https, then those events will be dispatched with https scheme. If there are one or more subscribers with http, then those events will be dispatched with http scheme. This can cause issues when it comes to recording metrics, because if we record each metric once for http and once for https then our event count will be wrong for the dispatcher (since some events will be recorded twice). Some ideas proposed by @maschmid to fix this are:

  1. Rather than have event_scheme="http" or event_scheme="https", tag the metrics with event_scheme_http=true/false and event_scheme_https=true/false.
  2. Do event_scheme="https" only if ALL subscribers are https, otherwise report http.

Persona:
Which persona is this feature for?
Administrators trying to observe the system

Exit Criteria
The metric counts are always correct, even in TLS permissive

Time Estimate (optional):
How many developer-days do you think this may take to resolve? 1

Additional context (optional)
Add any other context about the feature request here.

@Cali0707
Copy link
Member Author

My personal vote here is to go with option 1 from above, as that way:

  1. the event count will be correct
  2. I can create other aggregate views of information to see if e.g. https dispatches fail more frequently than http dispatches

@Cali0707
Copy link
Member Author

cc @pierDipi @creydr @Leo6Leo any thoughts on what approach we should take here? One of the options above, some other option?

@pierDipi
Copy link
Member

pierDipi commented Apr 29, 2024

To me the problem is not how we tag metrics but maybe how it is implemented, I don't think the proposed tags in 1 are idiomatic, also queries similar to 2

I can create other aggregate views of information to see if e.g. https dispatches fail more frequently than http dispatches

seem hard to express in promql as we couldn't easily aggregate using sum by (event_scheme)

Question: where is the metrics problem, is it on ingress or is it on egress ?
On ingress, when an event comes in event_scheme is based on the channel's ingress scheme, if it's on egress we need to count and emit metrics for each subscriber separately rather than as aggregate

@Cali0707
Copy link
Member Author

@pierDipi this metrics problem is on the egress for IMC. So, I guess the fix is to change how we count and emit metrics in the imc-dispatcher?

@pierDipi
Copy link
Member

pierDipi commented May 8, 2024

/assign

@Cali0707 Cali0707 linked a pull request May 10, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants