Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally separate metrics by group #2702

Closed
wants to merge 1 commit into from

Conversation

LeviHarrison
Copy link
Contributor

@LeviHarrison LeviHarrison commented Aug 11, 2022

What this PR does

The PR implements a feature which I'm calling "separating metrics," which breaks out some per-tenant metrics by the label group, which can be configured to be the value of any label on incoming series, by default team. Sort of like how the HA tracker works with the cluster label. The feature can be enabled per-tenant.

A few open questions:

  • The metrics I've added the group label to are listed below. The all fall under the general categories of usage/billing and errors. Any other ones we want or are there some of these that we don't want?
    • cortex_distributor_received_samples_total
    • cortex_distributor_received_exemplars_total
    • cortex_distributor_received_metadata_total
    • cortex_distributor_samples_in_total
    • cortex_distributor_exemplars_in_total,
    • cortex_distributor_metadata_in_total
    • cortex_discarded_samples_total
    • cortex_discarded_exemplars_total
    • cortex_discarded_metadata_total
    • cortex_ingester_ingested_samples_total
    • cortex_ingester_ingested_exemplars_total
    • cortex_ingester_ingested_samples_failures_total
    • cortex_ingester_active_series
  • Naming: This feature could probably also be called "aggregate metrics." I think "group" is a good, generic name to put on the metrics we exported, but should the default label we look for on series be something other than "team"?
  • Flags: Because --enable-separating-metrics and --separate-metrics-label can be used by either the distributor or the ingester, I didn't prefix them with component (ex. --distributor.enable-separating-metrics). Is that the proper way to do things?

Note: When the group label is empty, it gets dropped when scraped by Prometheus. So we don't have to worry about whether or not the label should be set even if the feature isn't enabled.

Which issue(s) this PR fixes or relates to

Fixes #2420

Checklist

Waiting to update tests and docs until naming/metrics/approach are confirmed.

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment on lines +478 to +496
if err := util.DeleteMatchingLabels(d.receivedSamples, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_samples_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.receivedExemplars, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_exemplars_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.receivedMetadata, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_received_metadata_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.incomingSamples, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_samples_in_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.incomingExemplars, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_exemplars_in_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.incomingMetadata, filter); err != nil {
level.Warn(d.log).Log("msg", "failed to remove cortex_distributor_metadata_in_total metric for user", "user", userID, "err", err)
}
if err := util.DeleteMatchingLabels(d.dedupedSamples, filter); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little bit ugly, but #2660 is on hold right now.

@@ -41,13 +41,14 @@ type seriesStripe struct {

mu sync.RWMutex
refs map[uint64][]seriesEntry
active int // Number of active entries in this stripe. Only decreased during purge or clear.
activeMatching []int // Number of active entries in this stripe matching each matcher of the configured Matchers.
active map[string]int // Number of active entries in this stripe per group. Only decreased during purge or clear.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's some performance impact of moving from the plain integer to a map. Because the vast majority of tenants will probably not be using this feature and the active series tracker is provisioned per-tenant, we could keep the old integer variable and only use the map if the tenant is using the separating metrics feature. Not sure if that's worth it thought.

Signed-off-by: Levi Harrison <git@leviharrison.dev>
@pracucci
Copy link
Collaborator

pracucci commented Oct 7, 2022

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@pstibrany
Copy link
Member

Closing, as #3439 implementing the same idea was merged, and this PR is abandoned.

@pstibrany pstibrany closed this Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subdivide some per-tenant metrics by another label
3 participants