Optionally separate metrics by group #2702

LeviHarrison · 2022-08-11T15:10:32Z

What this PR does

This PR implements a feature I'm calling "separating metrics," which breaks out some per-tenant metrics by a group label. The group can be configured to take the value of any label on incoming series (by default, team), similar to how the HA tracker works with the cluster label. The feature can be enabled per-tenant.
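As a rough illustration of the idea described above, the group value for a series could be derived as follows. This is a minimal sketch, not the code in this PR: the Label type and the groupFor function are hypothetical names introduced here, and only the default label name "team" comes from the description.

```go
package main

import "fmt"

// Label is a hypothetical stand-in for a name/value pair on an incoming series.
type Label struct {
	Name  string
	Value string
}

// groupFor returns the value to put in the "group" metric label.
// It returns "" when the feature is disabled for the tenant, or when the
// series does not carry the configured label.
func groupFor(enabled bool, groupLabel string, series []Label) string {
	if !enabled {
		return ""
	}
	for _, l := range series {
		if l.Name == groupLabel {
			return l.Value
		}
	}
	return ""
}

func main() {
	series := []Label{
		{Name: "__name__", Value: "http_requests_total"},
		{Name: "team", Value: "payments"},
	}
	fmt.Println(groupFor(true, "team", series))         // feature enabled: payments
	fmt.Printf("%q\n", groupFor(false, "team", series)) // feature disabled: ""
}
```

Returning an empty string in the disabled case keeps the call sites uniform: the metric is always observed with a group value, and an empty one simply does not show up as a label.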

A few open questions:

The metrics I've added the group label to are listed below. They all fall under the general categories of usage/billing and errors. Are there any others we want, or any of these that we don't want?
- cortex_distributor_received_samples_total
- cortex_distributor_received_exemplars_total
- cortex_distributor_received_metadata_total
- cortex_distributor_samples_in_total
- cortex_distributor_exemplars_in_total
- cortex_distributor_metadata_in_total
- cortex_discarded_samples_total
- cortex_discarded_exemplars_total
- cortex_discarded_metadata_total
- cortex_ingester_ingested_samples_total
- cortex_ingester_ingested_exemplars_total
- cortex_ingester_ingested_samples_failures_total
- cortex_ingester_active_series
Naming: This feature could probably also be called "aggregate metrics." I think "group" is a good, generic name for the label on the metrics we export, but should the default label we look for on series be something other than "team"?
Flags: Because --enable-separating-metrics and --separate-metrics-label can be used by either the distributor or the ingester, I didn't prefix them with a component name (e.g. --distributor.enable-separating-metrics). Is that the proper way to do things?
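To make the flag question concrete, here is a minimal sketch of registering the two unprefixed flags with Go's standard flag package. The Config type, RegisterFlags method, parseConfig helper, and help strings are assumptions made for illustration; only the flag names and the "team" default come from the description above, and the false default for the enable flag is assumed because the feature is optional.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical holder for the two settings named in the PR description.
type Config struct {
	Enabled bool
	Label   string
}

// RegisterFlags registers both flags without a component prefix, so the same
// names work whether the process runs as a distributor or an ingester.
func (c *Config) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&c.Enabled, "enable-separating-metrics", false, "Break out selected per-tenant metrics by group (illustrative help text).")
	f.StringVar(&c.Label, "separate-metrics-label", "team", "Series label whose value is used as the group (illustrative help text).")
}

// parseConfig parses args into a Config using a fresh FlagSet.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]string{"--enable-separating-metrics", "--separate-metrics-label=cluster"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // {Enabled:true Label:cluster}
}
```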

Note: When the group label is empty, Prometheus drops it at scrape time, so we don't have to worry about whether the label should be set when the feature isn't enabled.

Which issue(s) this PR fixes or relates to

Fixes #2420

Checklist

Waiting to update tests and docs until naming/metrics/approach are confirmed.

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

LeviHarrison · 2022-08-11T15:11:53Z

pkg/distributor/distributor.go

+	if err := util.DeleteMatchingLabels(d.receivedSamples, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_samples_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.receivedExemplars, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_exemplars_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.receivedMetadata, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_metadata_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.incomingSamples, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_samples_in_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.incomingExemplars, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_exemplars_in_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.incomingMetadata, filter); err != nil {
+		level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_metadata_in_total metric for user", "user", userID, "err", err)
+	}
+	if err := util.DeleteMatchingLabels(d.dedupedSamples, filter); err != nil {


This is a little bit ugly, but #2660 is on hold right now.

LeviHarrison · 2022-08-11T15:15:43Z

pkg/ingester/activeseries/active_series.go

@@ -41,13 +41,14 @@ type seriesStripe struct {

 	mu             sync.RWMutex
 	refs           map[uint64][]seriesEntry
-	active         int   // Number of active entries in this stripe. Only decreased during purge or clear.
-	activeMatching []int // Number of active entries in this stripe matching each matcher of the configured Matchers.
+	active         map[string]int // Number of active entries in this stripe per group. Only decreased during purge or clear.


I assume there's some performance impact from moving from a plain integer to a map. Because the vast majority of tenants will probably not use this feature, and the active series tracker is provisioned per-tenant, we could keep the old integer variable and only use the map if the tenant is using the separating metrics feature. Not sure if that's worth it, though.
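The alternative mentioned above could look roughly like this. It is a minimal sketch under assumptions: activeCounter and its methods are hypothetical names, not the seriesStripe type from the diff, and locking is left out because in the real code the stripe's mutex would guard these fields.

```go
package main

import "fmt"

// activeCounter keeps a plain integer when grouping is disabled and only
// allocates and updates a map when the tenant uses the feature.
type activeCounter struct {
	total   int            // used only when byGroup is nil
	byGroup map[string]int // nil unless grouping is enabled
}

// newActiveCounter allocates the per-group map only if grouping is enabled.
func newActiveCounter(grouped bool) *activeCounter {
	c := &activeCounter{}
	if grouped {
		c.byGroup = make(map[string]int)
	}
	return c
}

// inc records one more active series; group is ignored when grouping is disabled.
func (c *activeCounter) inc(group string) {
	if c.byGroup == nil {
		c.total++
		return
	}
	c.byGroup[group]++
}

// active returns the total number of active series across all groups.
func (c *activeCounter) active() int {
	if c.byGroup == nil {
		return c.total
	}
	sum := 0
	for _, n := range c.byGroup {
		sum += n
	}
	return sum
}

func main() {
	plain := newActiveCounter(false)
	plain.inc("payments")
	plain.inc("search")
	fmt.Println(plain.active()) // 2, with no map allocated

	grouped := newActiveCounter(true)
	grouped.inc("payments")
	grouped.inc("payments")
	grouped.inc("search")
	fmt.Println(grouped.byGroup["payments"], grouped.active()) // 2 3
}
```

The trade-off is an extra nil check on every update in exchange for avoiding map hashing for tenants that do not enable the feature; whether that is measurable would need a benchmark.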

Signed-off-by: Levi Harrison <git@leviharrison.dev>

pracucci · 2022-10-07T09:52:14Z

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase on main and, if needed, move the CHANGELOG entry added or updated in this PR to the top of the CHANGELOG document. Thanks!

pstibrany · 2023-01-11T16:32:14Z

Closing, as #3439 implementing the same idea was merged, and this PR is abandoned.

Separating metrics feature

1d323da

Signed-off-by: Levi Harrison <git@leviharrison.dev>

LeviHarrison force-pushed the separate-metrics branch from 41f0060 to 1d323da Compare August 11, 2022 15:17

pracucci added the release/notified-changelog-cut label Oct 7, 2022

pracucci mentioned this pull request Nov 28, 2022

Add additional label to cortex_discarded_samples_total with the label value defined by a config flag #3439

Merged

3 tasks

pstibrany closed this Jan 11, 2023


LeviHarrison commented Aug 11, 2022 •

edited by samjewell
