One AlertmanagerConfig failing to sync, blocks all others #6532
Comments
It should be a bug in the operator then: the expectation is that invalid AlertmanagerConfig objects are rejected before generating the final config.
I've been exploring the Alertmanager side of the operator for the past month. I'll check this one out.
In practice, the object with the invalid reference should be detected (and rejected) here: prometheus-operator/pkg/alertmanager/operator.go, lines 1032 to 1048 in 2e82f89.
Approach to solve this issue and status:
Just to give you guys an outline of what I am doing and where I am with this.
Is there an existing issue for this?
What happened?
Description
In our multi-tenant clusters, many users deploy their own AlertmanagerConfig objects in their namespaces, and they all get synced to a central Alertmanager.
If an AlertmanagerConfig object created in one namespace fails to sync to the managed Alertmanager for any reason, the Operator does NOT skip it and go on to load the other newly created AlertmanagerConfig objects coming in from other namespaces.
For example, we work heavily with Slack receivers. If the value of the URL secret has any issue, the Prometheus Operator fails to sync the AlertmanagerConfig and reports an error (reproduced in the log output section below).
This error seems to block the Prometheus Operator from syncing any other valid AlertmanagerConfig objects.
Steps to Reproduce
kubectl create ns test
Create an AlertmanagerConfig with a Slack receiver/route using the above created secret.
No other AlertmanagerConfig object gets synced to Alertmanager. The Web UI Status page also confirms that no new configuration was generated.
Expected Result
The Operator should "ignore" or bypass the failing AlertmanagerConfig object and proceed with syncing the other valid resources.
Actual Result
The Operator fails to sync any other AlertmanagerConfig object as soon as a single one fails to sync into Alertmanager.
Prometheus Operator Version
Kubernetes Version
v1.28.5
Kubernetes Cluster Type
kubeadm
How did you deploy Prometheus-Operator?
helm chart:prometheus-community/kube-prometheus-stack
Manifests
No response
prometheus-operator log output
level=error ts=2024-04-19T07:26:08.458451375Z caller=klog.go:126 component=k8s_client_runtime func=ErrorDepth msg="sync \"monitoring-system/kps-alertmanager\" failed: provision alertmanager configuration: failed to generate Alertmanager configuration: AlertmanagerConfig test/slack-receiver: SlackConfig[0]: invalid URL \"'https://hooks.slack.com/services/XXX/XXX/XXX'\" in key \"url\" from secret \"slackapiurl-secret\": validate url from string failed for 'https://hooks.slack.com/services/XXX/XXX/XXX': parse \"'https://hooks.slack.com/services/XXX/XXX/XXX'\": first path segment in URL cannot contain colon"
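The "first path segment in URL cannot contain colon" part of the message looks like it comes from Go's standard net/url parser, and the stored secret value appears to include literal single quotes around the URL. The following is a minimal sketch for reproducing that parse failure outside the operator. checkURL is a hypothetical helper written for illustration; it is not the operator's validation code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkURL is a hypothetical helper: it only reports whether Go's
// standard parser accepts the raw string. It is NOT the operator's code.
func checkURL(raw string) error {
	_, err := url.Parse(raw)
	return err
}

func main() {
	// Secret value as it appears in the log, with literal single quotes.
	quoted := "'https://hooks.slack.com/services/XXX/XXX/XXX'"

	// The leading quote prevents a scheme from being recognized, so the
	// parser treats "'https:" as a path segment containing a colon.
	fmt.Println("quoted:  ", checkURL(quoted))

	// Stripping the stray quotes leaves an ordinary https URL.
	fmt.Println("unquoted:", checkURL(strings.Trim(quoted, "'")))
}
```

If this matches what is stored in slackapiurl-secret, re-creating the secret without the surrounding quotes should make this particular AlertmanagerConfig valid again. The blocking behavior described above is a separate problem.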
Anything else?
No response
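To make the expected result concrete, here is a rough sketch of the "skip the invalid object, keep the rest" behavior. Every name in it (tenantConfig, validate, selectValid) is hypothetical and chosen for illustration. It does not reflect how the operator is actually structured, and it only checks the Slack URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// tenantConfig is a hypothetical, heavily simplified stand-in for an
// AlertmanagerConfig object with a single Slack URL.
type tenantConfig struct {
	Namespace string
	Name      string
	SlackURL  string
}

// validate rejects a config whose Slack URL does not parse or lacks a
// scheme or host. This check is an assumption made for the sketch.
func validate(c tenantConfig) error {
	u, err := url.Parse(c.SlackURL)
	if err != nil {
		return fmt.Errorf("%s/%s: invalid Slack URL: %w", c.Namespace, c.Name, err)
	}
	if u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("%s/%s: Slack URL needs a scheme and host", c.Namespace, c.Name)
	}
	return nil
}

// selectValid returns the configs that pass validation and collects an
// error for each rejected one, instead of aborting on the first failure.
func selectValid(configs []tenantConfig) (valid []tenantConfig, rejected []error) {
	for _, c := range configs {
		if err := validate(c); err != nil {
			rejected = append(rejected, err)
			continue // skip only this object; the rest are still processed
		}
		valid = append(valid, c)
	}
	return valid, rejected
}

func main() {
	configs := []tenantConfig{
		{Namespace: "test", Name: "slack-receiver", SlackURL: "'https://hooks.slack.com/services/XXX/XXX/XXX'"},
		{Namespace: "team-a", Name: "slack-receiver", SlackURL: "https://hooks.slack.com/services/XXX/XXX/XXX"},
	}
	valid, rejected := selectValid(configs)
	fmt.Println("kept:", len(valid), "rejected:", len(rejected))
	for _, err := range rejected {
		fmt.Println("rejected:", err)
	}
}
```

With this shape, a broken object in the test namespace would be reported and left out, while configs from other namespaces would still reach the generated Alertmanager configuration.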