Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra metric relabelings to scrape classes #6492

Merged

Conversation

sathieu
Copy link
Contributor

@sathieu sathieu commented Apr 9, 2024

Description

See #5947 (comment)

Inspired by #6379.

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.

Changelog entry

Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.

Add extra metric relabelings to scrape classes

@sathieu sathieu requested a review from a team as a code owner April 9, 2024 16:06
@sathieu sathieu marked this pull request as draft April 9, 2024 16:06
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from b10ca78 to 4e0d8f9 Compare April 9, 2024 16:13
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue for me is that the scrape class metrics relabeling happens after the scrape object's metrics relabeling. Other scrape class fields behave differently and can be overridden by scrape objects. I suppose that you have a specific requirement in mind but the inconsistency brings a UX problem.

@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from 4e0d8f9 to bb983f6 Compare April 9, 2024 19:55
@pull-request-size pull-request-size bot added size/L and removed size/M labels Apr 9, 2024
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from bb983f6 to 622d7c7 Compare April 9, 2024 20:47
@sathieu sathieu marked this pull request as ready for review April 9, 2024 20:56
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from 622d7c7 to 01933cb Compare April 10, 2024 06:29
@sathieu
Copy link
Contributor Author

sathieu commented Apr 10, 2024

@simonpasquier Yes. This is described in the doc, but if this is not enough we can use two configurations: metricRelabelingsBefore and metricRelabelingsAfter (I'm not good at naming)., or metricRelabelings and metricRelabelingsOverride, ....

@simonpasquier
Copy link
Contributor

Can you share more details about why you need scrape class relabeling to happen after service/pod monitor relabelings rather than before?

@sathieu
Copy link
Contributor Author

sathieu commented Apr 10, 2024

Yes.

As described in the issue, we have multi-tenant clusters, and each tenant is defined by a list of namespaces or a list of namespace regexps. We want the metrics relabeling to happen after the namespace label enforcement, to avoid a tenant injecting metrics into another tenant.

If scrape class metric relabelings happen before scrape resource metric relabelings, a malicious actor can easily change the tenant label.

@simonpasquier
Copy link
Contributor

As described #5947 (comment), we have multi-tenant clusters, and each tenant is defined by a list of namespaces or a list of namespace regexps. We want the metrics relabeling to happen after the namespace label enforcement, to avoid a tenant injecting metrics into another tenant.

To be clear: you're already setting .spec.enforcedNamespaceLabel: "namespace"?
If yes, how do you deal with recording/alerting rules? Do you expect to enforce the tenant label there too?

@sathieu
Copy link
Contributor Author

sathieu commented Apr 11, 2024

To be clear: you're already setting .spec.enforcedNamespaceLabel: "namespace"?

Yes.

If yes, how do you deal with recording/alerting rules? Do you expect to enforce the tenant label there too?

I forgot about those, I also need the tenant label for recording/alerting rules. Thanks 🙏 .

Maybe a new .spec.enforceMetricRelabelings in the prometheus CRD would be more appropriate (please note that I don't want excludedFromEnforcement to be applied here).

@simonpasquier
Copy link
Contributor

I forgot about those, I also need the tenant label for recording/alerting rules.

🤔 I don't see an obvious solution for rules since relabeling can't be applied here...

Maybe a new .spec.enforceMetricRelabelings in the prometheus CRD would be more appropriate (please note that I don't want excludedFromEnforcement to be applied here).

I suppose that the latter is because you deploy generic rules generating namespace-aware metrics/alerts and you still want the metrics/alerts to have the right tenant?

There might still be other use cases that need to skip "enforced" metric relabelings for some monitor/rule resources though.

@simonpasquier
Copy link
Contributor

Nevertheless this PR (with scrape class' relabeling happening before monitor's metric relabeling) is worth the effort even if it's not addressing @sathieu requirements.

@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from 01933cb to 8ce6306 Compare April 11, 2024 19:47
@sathieu
Copy link
Contributor Author

sathieu commented Apr 11, 2024

I've updated the PR to :

	// The Operator adds the scrape class metric relabelings defined here.
	// Then the Operator adds the target-specific metric relabelings defined in the scrape object.
	// Then the Operator adds namespace enforcement relabeling rule.

I'll handle my usecase in another PR.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sathieu, sorry for taking so long to give you an answer here, I saw the failing tests and assumed the PR was not ready. I took a deeper look today and I think the failing tests are flaky ones. I've re-triggered them to be sure... let's see

pkg/apis/monitoring/v1/prometheus_types.go Outdated Show resolved Hide resolved
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from 8ce6306 to a914e2d Compare May 8, 2024 21:06
@sathieu
Copy link
Contributor Author

sathieu commented May 8, 2024

Thanks @ArthurSens I've added your suggestion. I also had to rebase to resolve conflicts. Back to you!

@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from a914e2d to 850cfc2 Compare May 8, 2024 21:17
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from 850cfc2 to fff1d0d Compare May 9, 2024 09:25
ArthurSens
ArthurSens previously approved these changes May 9, 2024
Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for tackling this!

I've made a few nitpick comments, but if you want to focus on something I'd say improving test readability is what could receive some love right now.

Approving the PR because this seems to work as expected

// MetricRelabelings configures the relabeling rules to apply to all samples before ingestion.
//
// The Operator adds the scrape class metric relabelings defined here.
// Then the Operator adds the target-specific metric relabelings defined in the scrape object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Then the Operator adds the target-specific metric relabelings defined in the scrape object.
// Then the Operator adds the target-specific metric relabelings defined in ServiceMonitors, PodMonitors, Probes and ScrapeConfigs.

//
// The Operator adds the scrape class metric relabelings defined here.
// Then the Operator adds the target-specific metric relabelings defined in the scrape object.
// Then the Operator adds namespace enforcement relabeling rule.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Then the Operator adds namespace enforcement relabeling rule.
// Then the Operator adds namespace enforcement relabeling rule, specified in '.spec.enforcedNamespaceLabel'.

@@ -8783,3 +8783,205 @@ func TestPodMonitorWithNonDefaultScrapeClassRelabelings(t *testing.T) {
require.NoError(t, err)
golden.Assert(t, string(cfg), "podMonitorObjectWithNonDefaultScrapeClassWithRelabelings.golden")
}

func TestServiceMonitorWithDefaultScrapeClassMetricRelabelings(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome tests!

We could improve code readability here if we reduce code duplication using table tests

Comment on lines 8821 to 8844
prometheus.Spec.ScrapeClasses = scrapeClasses
cg := mustNewConfigGenerator(t, prometheus)

cfg, err := cg.GenerateServerConfiguration(
context.Background(),
prometheus.Spec.EvaluationInterval,
prometheus.Spec.QueryLogFile,
prometheus.Spec.RuleSelector,
prometheus.Spec.Exemplars,
prometheus.Spec.TSDB,
prometheus.Spec.Alerting,
prometheus.Spec.RemoteRead,
map[string]*monitoringv1.ServiceMonitor{"monitor": serviceMonitor},
nil,
nil,
nil,
&assets.StoreBuilder{},
nil,
nil,
nil,
nil,
)
require.NoError(t, err)
golden.Assert(t, string(cfg), "serviceMonitorObjectWithDefaultScrapeClassWithMetricRelabelings.golden")
Copy link
Member

@ArthurSens ArthurSens May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean with code duplication is that this part is basically identical in all tests. We could have a single function with different inputs instead

testCases := []struct{
  scrapeClass monitoringv1.ScrapeClass
  podMonitors map[string]monitoringv1.Podmonitor // can be nil
  serviceMonitors map[string]monitoringv1.ServiceMonitor // can be nil
  probes map[string]monitoringv1.Probe // can be nil
  scrapeConfigs map[string]monitoringv1alpha1.ScrapeConfig // can be nil
  goldenFile string  
}

You can put the implementation of the test inside a loop and ensure that your input results are equal to the appropriate golden file

nicolastakashi
nicolastakashi previously approved these changes May 9, 2024
@sathieu sathieu dismissed stale reviews from nicolastakashi and ArthurSens via ddc3b9e May 22, 2024 20:22
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from fff1d0d to ddc3b9e Compare May 22, 2024 20:22
@sathieu
Copy link
Contributor Author

sathieu commented May 22, 2024

@nicolastakashi I've addressed your comments, rebased (there was a conflict because of 642d9cd) and force-pushed. Back to you 🏀 !

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
@sathieu sathieu force-pushed the scrapeclass-metric-relabeling branch from ddc3b9e to 555fa80 Compare May 22, 2024 20:27
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll let @ArthurSens and/or @nicolastakashi merge if they are happy.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks @sathieu !

@ArthurSens ArthurSens merged commit 37043cc into prometheus-operator:main May 23, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants