Typical kubernetes workload metrics as telemetry input to enable dashboarding and alerting #972

a-thaler · 2024-04-12T16:35:59Z

Description
The MetricPipeline already supports an input type runtime, which emits metrics on container and pod resource consumption. Other typical metrics are still missing:

- from the apiserver, about configured resource limits
- from the apiserver, about the state of workloads
- from the kubelet, statistics of the volumes
- from the kubelet, statistics of the nodes

These are mainly the typical metrics produced by the kubeletstatsreceiver and the k8sclusterreceiver.
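To make the split between the two receivers concrete, here is a small Python sketch that maps each metric category listed above to the receiver that would provide it. The category keys, the `Source` structure and the `receivers_for` helper are hypothetical, and the example metric names are illustrative only and should be checked against the receiver documentation. The deployment mode reflects the usual setup: the kubeletstats receiver scrapes the kubelet on every node, while the k8s_cluster receiver watches the apiserver and is normally run as a single instance to avoid duplicate data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    receiver: str           # OpenTelemetry Collector receiver type
    origin: str             # where the data comes from
    per_node: bool          # True: one instance per node; False: one cluster-wide instance
    example_metrics: tuple  # illustrative metric names, verify against the receiver docs


# Hypothetical mapping of the metric categories from this issue to receivers.
SOURCES = {
    "resource_limits": Source("k8s_cluster", "apiserver", False,
                              ("k8s.container.cpu_limit", "k8s.container.memory_limit")),
    "workload_state": Source("k8s_cluster", "apiserver", False,
                             ("k8s.deployment.desired", "k8s.deployment.available")),
    "volume_stats": Source("kubeletstats", "kubelet", True,
                           ("k8s.volume.available", "k8s.volume.capacity")),
    "node_stats": Source("kubeletstats", "kubelet", True,
                         ("k8s.node.memory.usage", "k8s.node.filesystem.usage")),
}


def receivers_for(categories):
    """Return the receivers needed for the requested categories, split by deployment mode."""
    per_node, cluster_wide = set(), set()
    for category in categories:
        source = SOURCES[category]
        (per_node if source.per_node else cluster_wide).add(source.receiver)
    return {"per_node": sorted(per_node), "cluster_wide": sorted(cluster_wide)}


if __name__ == "__main__":
    print(receivers_for(["node_stats", "workload_state"]))
    # {'per_node': ['kubeletstats'], 'cluster_wide': ['k8s_cluster']}
```

The split shows why the apiserver-based metrics need a separate, cluster-wide component, whereas the kubelet-based ones fit a per-node agent.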

With these metrics available, basic troubleshooting of Kubernetes workloads, including alerting, becomes possible.

Goal
Provide a way to collect a typical set of metrics for basic workload troubleshooting, comparable to the metrics used by the dashboards of the kube-prometheus-stack.

Criteria

- The typical metrics needed to troubleshoot the following are collectable:
  - Pod compute resources
  - Node resource usage
  - Volume resource usage
  - Health of workloads (for example, a stuck deployment)
- Namespace-specific metrics can be enabled per namespace (probably independently of non-namespaced resources).
- Node- and volume-related metrics can be enabled optionally, independently of workload-related metrics.
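The last two criteria describe the switches such an input would need. The following Python sketch models them under stated assumptions: the `RuntimeInput` class, its field names and the include/exclude semantics are hypothetical illustrations, not the API proposed in #1001. Namespace filters apply only to namespaced data (pods, workloads, pod volumes), while node metrics, being cluster-scoped, are controlled only by their own flag.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeInput:
    # Hypothetical switches mirroring the criteria above.
    include_namespaces: set = field(default_factory=set)  # empty set means all namespaces
    exclude_namespaces: set = field(default_factory=set)
    workload_enabled: bool = True   # pod compute resources and workload health
    node_enabled: bool = False      # optional, independent of workload metrics
    volume_enabled: bool = False    # optional, independent of workload metrics

    def namespace_selected(self, namespace):
        """Exclusion wins; otherwise an empty include set selects every namespace."""
        if namespace in self.exclude_namespaces:
            return False
        return not self.include_namespaces or namespace in self.include_namespaces

    def collects(self, kind, namespace=None):
        """Decide whether a metric of the given kind (and namespace, if any) is collected."""
        if kind == "node":
            # Nodes are cluster-scoped, so namespace filters do not apply.
            return self.node_enabled
        enabled = {
            "pod": self.workload_enabled,
            "workload": self.workload_enabled,
            "volume": self.volume_enabled,
        }.get(kind, False)
        return enabled and namespace is not None and self.namespace_selected(namespace)


if __name__ == "__main__":
    cfg = RuntimeInput(include_namespaces={"shop"}, node_enabled=True)
    print(cfg.collects("node"))                # True
    print(cfg.collects("pod", "shop"))         # True
    print(cfg.collects("pod", "kube-system"))  # False
    print(cfg.collects("volume", "shop"))      # False
```

Keeping the namespace selection separate from the node and volume flags matches the "probably independent from non-namespaced resources" remark: enabling node metrics does not depend on which namespaces are selected.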

Actions

Preparations
- Build an understanding of the available receivers and make an API proposal (Metric API proposal to cover typical Kubernetes metrics #1001)
- Come up with a concept for running the k8sclusterreceiver, which does not fit into the current architectural setup (PoC for operating the k8sclusterreceiver #1003)
Implementation

Reasons
The current feature set is a good start but lacks apiserver-related details such as limits, which are needed for a complete picture when troubleshooting and defining relevant alerts. Furthermore, typical workload health metrics from the apiserver are missing. Volume and node statistics are also important in daily operations.

Attachments

Release Notes

a-thaler added the kind/feature and area/metrics labels Apr 12, 2024

a-thaler mentioned this issue in Metric API proposal to cover typical Kubernetes metrics #1001 (Open) Apr 22, 2024

a-thaler changed the title ~~Metric inputs to cover typical workload operations~~ Typical kubernetes workload metrics as telemetry input to enable dashboarding and alerting Apr 23, 2024
