Metrics about how opentelemetry collector is used #2829
Comments
Is there a reason you cannot use the collector's own metrics to obtain information about the components it runs? Similarly, you should be able to utilize k8s state metrics or the k8s cluster receiver to answer the second question. I'm trying to better understand the value of having this from the operator as opposed to existing sources.
In general, this feature request proposes to define a set of operator metrics that will enable cluster administrators to better understand how the collector is used: for instance, which collector components are used, how the collector is deployed, whether the prometheus/ta is enabled, and so on.
Seems like a good SIG discussion topic :)
We discussed this at the SIG meeting. @rubenvp8510 is going to put together a list of metrics and why we want to track each of them. Ruben will also put this behind a feature gate initially.
This is the list of metrics I want to expose in this first version:
For the receivers/exporters/processors/extensions metrics I'm using a label called type, which will be equal to the component name, for example
This is useful because, for the operands handled by this operator, we want to know which components those collectors are using. For the
I wonder if, instead of a metric for each component type, we could simply emit a single gauge. Going off of the Prometheus recommendations, it feels like we should have a single metric with each of these as labels, to then be aggregated later at query time. Though maybe this would be better as two metrics: one for collector metadata and one for the components:
Either way, I think these are some constraints for the metrics:
I'll make the changes for this. The only thing I'm not sure we need to include is the collector_name. I would be worried about the cardinality of that label; essentially, what I want from these metrics is a summary of the status of the cluster.
I'm also not sure about using the same metric. My understanding is that Prometheus treats each unique combination of label names and label values as a different time series, so if we use a single metric and label each combination of receivers/exporters/processors etc., the number of series will grow too much! I think we should preserve the metrics in this way
This will give us good insight into what is deployed on the cluster without producing high-cardinality time series. This is IMHO a good tradeoff. I'm not sure the namespace/collector name is useful, at least not for my use case.
I think we should definitely include the namespace/collector name as that would be really useful for someone trying to determine when and where a new receiver came online. I think it's fine to start with these as separate metrics, we can always change this in the future.
I agree that it would be useful, but I still have the label cardinality concern. I would prefer to ship this first version with separate metrics, and then we can move forward and add new things if new use cases require it.
I already know that the SREs and Ops folks at my co would want this granularity :) The added cardinality would only be the number of collector pools you run (because a collector pool can only run in a single namespace), so it's really not that much more.
Component(s)
collector
Is your feature request related to a problem? Please describe.
I want to expose metrics about how the OpenTelemetry collector is used.
For this concrete feature I want to know:
Describe the solution you'd like
Expose those metrics at the operator's Prometheus endpoint.
Describe alternatives you've considered
No response
Additional context
No response