
cassandra_stats{name=xxx} not following prometheus naming best practices #83

Open
PedroMSantosD opened this issue Jun 5, 2020 · 7 comments

Comments

@PedroMSantosD

Hi,

We have deployed your exporter on our Cassandra infrastructure, but it is bringing down Prometheus due to the large memory footprint caused by the "name" label.

The Prometheus documentation shows that the current implementation of this exporter does not follow Prometheus naming conventions: each metric name should represent one measured thing, rather than encoding it in a label the way this exporter does with "name".

Is it possible to replace the metric name cassandra_stats with something more naming-compliant, e.g. cassandra_%yourCurrentNameLabel%_units?

The metric name specifies the general feature of a system that is measured (e.g. http_requests_total - the total number of HTTP requests received)....

Labels enable Prometheus's dimensional data model: any given combination of labels for the same metric name identifies a particular dimensional instantiation of that metric ....

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values)...

[Screenshot of the Prometheus documentation excerpt quoted above, 2020-06-02 08:28]

As a reference, a label with more than 8 unique values is considered highly cardinal, causing high memory usage and slow queries.
Prometheus supports millions of metric names, but it is highly sensitive to cardinality.

Thanks in advance,

@geobeau
Contributor

geobeau commented Jun 5, 2020

You can check https://github.com/criteo/cassandra_exporter#why-not-make-more-use-of-labels-be-more-prometheus-way-
The main reason is that Cassandra exposes its own metric format (which is path based) through JMX, and the exporter tries to do the minimum amount of work to make it available (extract some labels, convert the values to float64). Converting to Prometheus best practices would require special parsing to transform path-based metrics into label-based metrics, which has a lot of edge cases.

@geobeau
Contributor

geobeau commented Jun 5, 2020

What do you mean by bringing down your Prometheus?

One alternative is to use relabeling in the scrape config to replace __name__ with the value of name (and replace special characters with _).
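
For reference, a minimal sketch of that relabeling idea in prometheus.yml (the job name and target below are placeholders, and name values containing characters that are invalid in metric names, such as -, would still need extra rules):

```yaml
scrape_configs:
  - job_name: cassandra                      # placeholder job name
    static_configs:
      - targets: ["cassandra-exporter:8080"] # placeholder exporter address
    metric_relabel_configs:
      # Promote the value of the "name" label to the metric name itself.
      - source_labels: [name]
        target_label: __name__
      # Drop the now-redundant "name" label.
      - action: labeldrop
        regex: name
```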

@PedroMSantosD
Author

Hi, thanks for your prompt reply.

By 'bringing down' I mean that the cardinality of the label "name" in the metric cassandra_stats causes my node to run out of memory.

That being said, relabeling on Prometheus makes the memory consumption worse, since the head block must hold the scraped series plus the relabeled ones, as shown in the performance graph after enabling relabeling of the metrics:
[Screenshot: Prometheus memory usage graph after enabling the relabeling rules, 2020-06-08 09:20]

It is true that only the series left after relabeling are persisted to disk, but the memory issue remains.

Would it not be feasible to replace cassandra_stats{name='.*'} with an appropriate cassandra_compliant_metric_name_unit{other_labels='...'} in the exporter code?

Thanks in advance,

@geobeau
Contributor

geobeau commented Jun 8, 2020

Are you sure your memory issue is not simply because the exporter exposes a lot of metrics? It's possible to blacklist some high-cardinality metrics in the exporter's configuration to save memory.

I can try to measure the difference in memory usage between 1 metric with 5000 label values and 5000 separate metrics, but you may have to wait a bit.

@PedroMSantosD
Author

Hi,
The graph in the previous post shows the memory spike on Prometheus after the relabeling rules were put in place. Prior to that, the graph below shows the effect of deploying the exporter itself on the production infrastructure:
[Screenshot: Prometheus memory usage after the exporter was deployed to production, 2020-06-09 08:56]

This spike, and the alerts it triggered, is what prompted me to open this issue.
Hope that helps.

@geobeau
Contributor

geobeau commented Jul 2, 2020

Hello, sorry for the delay. I compared locally 50 metrics with 1000 series each vs 5000 metrics with 10 series each and didn't notice any particular difference in memory usage.

I think your increase in memory is expected given the additional number of series generated by the exporter. The memory usage is driven by the total number of series, independent of the cardinality of each metric.

My advice is to increase your memory limit or blacklist metrics using the blacklist feature of the exporter.
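
A rough sketch of what the blacklist could look like in the exporter's config.yml (the blacklist section is described in the exporter's README; the patterns below are only illustrative and assume you can afford to drop per-table metrics and client-request percentiles):

```yaml
blacklist:
  # Illustrative patterns only - pick the metrics you can live without.
  # Drop per-table metrics, which usually dominate the series count on large clusters.
  - org:apache:cassandra:metrics:table:.*
  # Drop fine-grained percentiles on client-request histograms.
  - org:apache:cassandra:metrics:clientrequest:.*percentile.*
```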

@pnathan

pnathan commented Jul 22, 2022

Hi,

This presents material issues with hitting series limits for cassandra_stats. Example from my logs (gently redacted):

caller=dedupe.go:112 component=remote level=error remote_name=mimir url=https://prom/api/v1/push msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: user=anonymous: per-metric series limit of 2000 exceeded, please contact administrator to raise it (per-ingester local limit: 1500) for series {__name__=\"cassandra_stats\", app_kubernetes_io_instance=\"c\", app_kubernetes_io_managed_by=\"Helm\", app_kubernetes_io_name=\"cassandra\", ciq_cluster=\"management\", cluster=\"cassandra\", controller_revision_hash=\"cassie-cassandra-7dffd6c9bd\", datacenter=\"datacenter1\", helm_sh_chart=\"cassandra-9.1.19\", instance=\"10.1.1.1:8080\", job=\"kubernetes-pods\", name=\"org:apache:cassandra:metrics:clientrequest:write-node_local:mutationsizehistogram:75thpercentile\", namespace=\"cassandra\", pod=\"c-cassandra-3\", statefulset_kubernetes_io_pod_name=\"c-cassandra-3\"}"

I would encourage the authors of the exporter to properly adhere to Prometheus guidance here.
