
Enable customization of prometheus metrics labels created by rook operator #9618

Closed
svolland-csgroup opened this issue Jan 20, 2022 · 9 comments

Comments

@svolland-csgroup

svolland-csgroup commented Jan 20, 2022

Is this a bug report or feature request?

  • Feature Request

What should the feature do:
The operator currently supports an undocumented (?) monitoring label rook.io/managedBy, which is used to add a managedBy label to the prometheus metrics.

spec:
  labels:
    monitoring:
      rook.io/managedBy: some-value

The operator adds a relabeling to the ServiceMonitor for this specific label; this is implemented in the applyMonitoringLabels function in mgr.go.
The new feature should make it possible to define a list of fully customizable labels to add to the metrics (instead of a single label with a fixed name).
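
For reference, the relabeling the operator currently generates for this label in the mgr ServiceMonitor is roughly of the following shape. This is a hedged sketch written against the Prometheus Operator relabeling API, not verbatim operator output, and the port name is an assumption:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
spec:
  endpoints:
    - port: http-metrics  # assumed metrics port name
      # Approximate shape of what applyMonitoringLabels adds for
      # rook.io/managedBy: a static "replace" relabeling that attaches
      # managedBy=<value> to every scraped series.
      relabelings:
        - action: replace
          targetLabel: managedBy
          replacement: some-value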

This could also be implemented by leveraging the targetLabels feature of the ServiceMonitor. In this case the labels already specified in the cluster spec for the mgr could also be applied to the Service spec of the mgr daemon (they are currently only applied to the mgr deployment/pods). Then one could pick these labels by configuring the list of targetLabels.

What is use case behind this feature:
The use case is the need to add extra labels to prometheus metrics. This is particularly useful when the operator manages more than one cluster, to distinguish the metrics using labels with more functional meaning than the currently available labels that might serve this purpose (e.g. namespace, instance or pod).

@travisn
Member

travisn commented Jan 20, 2022

@svolland-csgroup The monitoring labels are currently only applied if the label is rook.io/managedBy, but I don't recall why that was so strict with #8355. Would it solve the issue if we just allow any label to be applied instead of only that specific label? That makes sense to me for most flexibility, unless you're saying another change is also needed for targetLabels.

@anmolsachan Do you recall why only rook.io/managedBy was allowed?

@svolland-csgroup
Author

svolland-csgroup commented Jan 21, 2022

Applying any label would solve the issue and would only require small code changes.

Leveraging the targetLabels feature would be a nicer solution because it relies on the prometheus operator to generate the appropriate configuration (instead of directly generating custom prometheus relabeling configurations), but it requires more changes:

  • the monitoring labels must be added to the mgr service resource (targetLabels picks up labels from the service)
  • then, depending on the degree of customization wanted:
    • either generate a targetLabels in the mgr ServiceMonitor with all label keys specified in the labels.monitoring config
    • or only add a configurable subset of these label keys (that would require a specific configuration section in the cluster spec, or using a specific prefix for these labels e.g. metrics/my-key=my-value)

Care must be taken that the labels specified in labels.monitoring do not overwrite those already existing in the service (i.e. app, ceph_daemon_id, rook_cluster); see the example below.

Configuration:

spec:
  labels:
    monitoring:
      my-monitoring-label: my-value

generated mgr service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: rook-ceph-mgr
    ceph_daemon_id: a
    rook_cluster: my-cluster
    my-monitoring-label: my-value
  name: rook-ceph-mgr
[...]

generated ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
spec:
[...]
  targetLabels:
    - my-monitoring-label

@travisn
Member

travisn commented Jan 21, 2022

How will it help the ServiceMonitor if the targetLabels support a custom label? Don't the labels already on the rook-ceph-mgr service uniquely identify the service that prometheus needs to scrape? Or what is different about the custom targetLabels?

@travisn
Member

travisn commented Jan 21, 2022

Ah, it must be as you said in the initial post that the targetLabels will help in this case:
"This is particularly useful when the operator is managing more than one cluster, to distinguish the metrics by using more functional label meanings than the currently available labels that might serve this purpose (e.g. namespace, instance or pod)"

@anmolsachan
Contributor

anmolsachan commented Jan 25, 2022

@svolland-csgroup The monitoring labels are currently only applied if the label is rook.io/managedBy, but I don't recall why that was so strict with #8355. Would it solve the issue if we just allow any label to be applied instead of only that specific label? That makes sense to me for most flexibility, unless you're saying another change is also needed for targetLabels.

@anmolsachan Do you recall why only rook.io/managedBy was allowed?

Hey Travis! Apologies for the late reply on this.

There was no strictness for using rook.io/managedBy, but there was only one use case at that moment.

Use case: "As a Kubernetes Admin I should be able to identify the metrics coming from the cephclusters uniquely, when on the same K8s cluster I create multiple cephclusters at the same time".

Adding an extra managedBy label in the Service Monitor adds a label to every metric exported by the Prometheus exporter. Also, it was optional because the above-stated use case might not be true.
In our case, we were setting the managedBy value to the name of the higher-level managing storageCluster CR to segregate cluster metrics and build relationships in promQL, but it's up to the user to choose the value.
It also covered cephclusters running in regular and external mode side by side.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@BlaineEXE
Member

I believe this is made irrelevant by #9837. Closing this for now. We have the option to reopen if it is still relevant with the changed method of deploying prometheus metrics rules.

@holmesb

holmesb commented Apr 12, 2023

This issue is still relevant @BlaineEXE. In pretty much every widely used helm chart it is possible to define serviceMonitor relabelings and metricsRelabelings. These need to be added under monitoring in the helm values.yaml, and then rook needs to add them to the serviceMonitor resource.

Another use case: without relabeling, the instance label in ceph metrics is in many cases a non-identifiable IP address instead of a meaningful host name, so e.g. there is no association between an OSD and the node it is on. If we could relabel, we could rewrite instance from a metadata label containing the hostname. See the API doc for serviceMonitor.spec.endpoints, where relabelings & metricsRelabelings are defined.
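
For illustration, a hedged sketch of what such values could look like, written against the Prometheus Operator relabeling API. The monitoring.relabelings key in values.yaml is hypothetical; it stands for whatever the rook chart would pass through to the generated ServiceMonitor:

# Hypothetical values.yaml snippet: a monitoring.relabelings key like this
# does not exist in the rook-ceph chart today; it illustrates the request.
monitoring:
  enabled: true
  relabelings:
    # Rewrite the instance label from the node name exposed by Kubernetes
    # service discovery instead of the (non-identifiable) pod IP.
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: instance
      action: replace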

@BlaineEXE
Member

@holmesb I would consider this a new feature request to the current helm chart.

@BlaineEXE BlaineEXE changed the title Enable further customization of prometheus metrics labels Enable customization of prometheus metrics labels created by rook operator Apr 12, 2023