New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Feat: Add alert_relabel_configs
to the Prometheus and PrometheusAgent CRD's
#6450
[WIP] Feat: Add alert_relabel_configs
to the Prometheus and PrometheusAgent CRD's
#6450
Conversation
alert_relabel_configs
to the Prometheus and PrometheusAgent CRD'salert_relabel_configs
to the Prometheus and PrometheusAgent CRD's
Questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that we could also support relabelConfigs
for the discovery of Alertmanager endpoints.
Unit tests are enough but you need to account for Prometheus versions (e.g. it requires >= v2.51.0). |
Right, and a few other fields too. We discussed it with Arthur in the thread here. Should I open a separate PR for that, or should we deal with those here? |
Another PR is fine but you can already file issues so we don't forget. |
I've already added the relable config field locally, I'll create another branch address the validation and make a PR for that too. |
The test has been updated to contain sub-tests for (two) different versions. |
0918e51
to
7660ea1
Compare
I have the removed the commit which changes the location of |
7660ea1
to
4b9fd59
Compare
Not sure why the tests are failing, I ran |
You'll need to rebase because #6467 has merged :) |
9774807
to
1358da1
Compare
1358da1
to
431e754
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there :)
pkg/prometheus/store.go
Outdated
} | ||
if err := store.AddSigV4(ctx, namespace, am.Sigv4, fmt.Sprintf("alertmanager/auth/%d", i)); err != nil { | ||
return fmt.Errorf("alertmanager %d: %w", i, err) | ||
if err := store.AddBasicAuth(ctx, namespace, am.BasicAuth); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this change relates to the rest of the PR...?
Probably by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I probably mixed up the branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing seems to have actually changed though, is it fine to keep this in? I squashed all the commits I had made, so IDK if I can revert it. Sorry!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Description
Earlier, all alerts were sent to all alert managers configured under the Prometheus instance. The grouping, inhibition and silencing was done at the AM level alone. But at scale, this behavior overwhelms the AMs, though it is already possible to perform relabelings on the alerts "globally", the need for a way to do it per-alertmanager cluster came about.
For this, relabeling per alert manager config was introduced about a month ago in Prometheus v2.51 prometheus/prometheus#12551
By introducing relabeling at this level, it enables us to determine which labels are sent to which sets of Alertmanagers. That is, it contains a bunch of Alertmanager IPs to fire against and by defining a relabel configs here we are able to specify which alerts are sent to which sets of these AMs.
Closes #6445
Type of change
What type of changes does your code introduce to the Prometheus operator? Put an
x
in the box that apply.CHANGE
(fix or feature that would cause existing functionality to not work as expected)FEATURE
(non-breaking change which adds functionality)BUGFIX
(non-breaking change which fixes an issue)ENHANCEMENT
(non-breaking change which improves existing functionality)NONE
(if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)Verification
Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.
PR is WIP, tests are being written.
Changelog entry
Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.