Assistance Needed with Prometheus and Alertmanager Configuration #3781
Comments
Hi there, generally the issue tracker is not the right place for questions like this. Please consider taking it to https://groups.google.com/g/prometheus-users or a similar forum. The issue you are facing here is very likely staleness: if you only scrape every hour, your metric will be stale (and thus non-existent) for roughly 55 minutes of every hour.
That basically means your configuration is not supported. Please close this issue, as it is not a bug report or feature request for Alertmanager.
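The staleness workaround referenced in this thread can be sketched as follows. This is an illustrative example only; the metric name and threshold are placeholders, not taken from the issue:

```promql
# With a 1h scrape interval, an instant selector only sees the sample
# for ~5 minutes after each scrape, then the series goes stale:
my_metric > 100

# A range selector spanning at least one full scrape interval always
# covers the most recent sample, so the expression keeps returning data:
max_over_time(my_metric[1h]) > 100
```

The general rule of thumb is that the range in the square brackets should be at least as wide as the scrape interval.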
Thank you for your prompt response and guidance on the metric staleness issue. Regarding your suggestion to use a range selector in the recording metric and alerting rule (the link you shared), e.g. `max_over_time(metric[1h])`, I confirm that I have already implemented this approach.

However, the main challenge persists: a discrepancy between the number of alerts generated by Prometheus and those displayed in Alertmanager. To illustrate, Prometheus may show approximately 25,000 alerts firing within a given period, but when reviewing the corresponding alerts in Alertmanager, the count often deviates significantly, showing figures such as 10,000 or 18,000 rather than the expected 25,000.

This inconsistency poses a significant challenge in our alert management process, leading to confusion and potentially overlooked critical alerts. I would greatly appreciate any further insights or recommendations to ensure alignment between Prometheus and Alertmanager in the number of alerts generated and displayed.
As @TheMeier said, https://groups.google.com/g/prometheus-users is the best place to ask such questions. Could you please close this issue?
I am encountering challenges with configuring Prometheus and Alertmanager for my application's alarm system. Below are the configurations I am currently using:
prometheus.yml:
Scrape Interval: 1h
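(The original file was not included in the issue; a minimal sketch of a prometheus.yml matching the described setup might look like the following. Job names and targets are placeholders.)

```yaml
global:
  scrape_interval: 1h        # as described in the issue: one scrape per hour
  evaluation_interval: 1h    # recording/alerting rules evaluated on the same cadence

rule_files:
  - rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # placeholder Alertmanager address

scrape_configs:
  - job_name: "my-app"                     # placeholder job name
    static_configs:
      - targets: ["my-app:8080"]           # placeholder target
```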
rules.yml:
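(Also elided from the issue; a hypothetical rules.yml using the range-selector approach discussed in this thread could look like this. Metric name, threshold, and labels are placeholders.)

```yaml
groups:
  - name: example
    rules:
      - record: job:my_metric:max_over_time_1h
        # The 1h range matches the scrape interval, so the last sample
        # stays visible to the rule despite staleness between scrapes.
        expr: max_over_time(my_metric[1h])

      - alert: MyMetricHigh
        expr: job:my_metric:max_over_time_1h > 100   # placeholder threshold
        labels:
          severity: warning
        annotations:
          summary: "my_metric exceeded threshold"
```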
alertmanager.yml:
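(Also elided; a minimal hypothetical alertmanager.yml is sketched below. Receiver and timings are placeholders.)

```yaml
route:
  receiver: default
  group_by: ["alertname"]   # grouping collapses many firing alerts into fewer notifications
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: default
    webhook_configs:
      - url: "http://example.internal/alert-hook"   # placeholder endpoint
```

Note that `group_by` and the `group_*`/`repeat_interval` timings cause Alertmanager to aggregate and deduplicate alerts, which is one common reason its displayed counts differ from the number of alerts firing in Prometheus.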
Issues:
Inconsistent Alerting: Because the scrape interval and the recording-rule evaluation interval are both set to 1 hour, the rule evaluation can run long after the most recent scrape, at a point where the metric has gone stale. At evaluation time there may therefore be no value for the metric, so the recording rule produces no sample and the alert fails to fire even though the underlying condition is satisfied.
Discrepancy in Firing Alerts: The number of firing alerts in Prometheus differs significantly from the number of alerts received by Alertmanager, causing inconsistency and confusion in alert handling.
Uncertainty in Alert Evaluation Timing: The alerting rule appears to be evaluated inconsistently, sometimes triggering alerts shortly after a service restart and at other times only after delays beyond the expected 4-hour interval.
Request for Assistance:
I am seeking guidance on configuring Prometheus and Alertmanager to achieve the following:
Thanks in advance.