Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes early notifications after config reload (#2492) #3835

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Commits on May 24, 2024

  1. Fixes early notifications after config reload (prometheus#2492)

    Add an acceptance test that triggers a config reload and verifies
    that no early notification is occurring.
    
    One of the challenges is which time to use to check for a previous
    notification. The nflog captures about the time all notifications
    were sent. That conflicts with the ag.next timer that get's reset
    before the ag is being flushed. Delays and retries can make these
    two point in time be different enough for the integration tests to
    fail.
    
    I considered the following ways to fix it:
    
      1.) Modify the nflog.proto to capture the flush time in addition
          to the successful notification time.
      2.) In addition to hasFlushed capture the last flush time and pass
          it to the context like we do for Now.
      3.) Set hashFlushed and reset the timer after the flush has been
          done.
    
    I started with prometheus#3 as it seemeded to have the fewest downsides with
    things like drift. Based on comments this is no prometheus#1.
    
    needsUpdate is based on:
    prometheus#3074 (comment)
    
    Signed-off-by: Holger Hans Peter Freyther <holger@freyther.de>
    zecke committed May 24, 2024
    Configuration menu
    Copy the full SHA
    65e46e1 View commit details
    Browse the repository at this point in the history

Commits on May 26, 2024

  1. Avoid drift when reloading the configuration with firing alerts

    When creating an AggregationGroup consult the Stage about the
    next expected time to `Exec` and use it to set the initial timer.
    
    Update the acceptance test to match the new behavior and add a
    test to receive the updated group at group_interval timeout.
    
    Signed-off-by: Holger Hans Peter Freyther <holger@freyther.de>
    zecke committed May 26, 2024
    Configuration menu
    Copy the full SHA
    c649362 View commit details
    Browse the repository at this point in the history
  2. notify: Use the DispatchTime and now as base for the repeat

    Use the dispatch time instead of the last successful notification
    for the repeat interval. This avoids a drift and also keeps the
    same time during notifications.
    
    Signed-off-by: Holger Hans Peter Freyther <holger@freyther.de>
    zecke committed May 26, 2024
    Configuration menu
    Copy the full SHA
    369954b View commit details
    Browse the repository at this point in the history