-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Timestamps to Health Status #16972
Comments
This would also help to stablize |
That sentence made me realise that |
Hi @mkieweg, I discussed this enhancement in today's contributors meeting, and there were no objections. It's a green light for implementation. Please let me know if you have the bandwidth to take this up, otherwise, I am happy to proceed with it. |
Hi @svghadi, happy to hear that. I'll take on the implementation, I've allocated time for it next week :) |
Cool. Here is something I was trying out last week for |
While I am super supportive of this being added, I have an edge case that I think falls into this category but is not covered by just this addition. Within Argo CD, the sync operations are completely separated from the application health. However, in certain scenarios, you do want to know if the current state of the health check is the outcome of a previous sync operation. With the proposal, this would become possible by comparing the To also cover this, and while taking into account the comments[1][2], I wonder if it would be an idea to introduce a "heartbeat" kind of timestamp that updates at most every 30 seconds (to not cause any API pressure) to indicate the "freshness" of the observation even without it transitioning. |
* add lastTransitionTime to health status Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * address first feedback Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * set transition time if health status is unknown Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * extend health improvement tests Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * add apoplication controller test Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * use require for NoError Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * more extensive tests for health state changes Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * Apply suggestions from code review Co-authored-by: Blake Pettersson <blake.pettersson@gmail.com> Signed-off-by: Manuel Kieweg <2939765+mkieweg@users.noreply.github.com> * Code review suggestions Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * remove obsolete assert Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * Change LastTransitionTime field to pointer type Due to implementation limitations, setting LastTransitionTime at the resource level is challenging. Converting it to a pointer type allows it to be skipped at the resource level and prevents it from appearing in .status.resources of the Application CR. Additionally, it doesn’t provide much value or have a known use case right now. Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Resolve rebase conflicts Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Address review comment Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Trigger CI Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> --------- Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> Signed-off-by: Manuel Kieweg <2939765+mkieweg@users.noreply.github.com> Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> Co-authored-by: Blake Pettersson <blake.pettersson@gmail.com> Co-authored-by: Siddhesh Ghadi <sghadi1203@gmail.com>
* add lastTransitionTime to health status Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * address first feedback Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * set transition time if health status is unknown Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * extend health improvement tests Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * add apoplication controller test Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * use require for NoError Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * more extensive tests for health state changes Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * Apply suggestions from code review Co-authored-by: Blake Pettersson <blake.pettersson@gmail.com> Signed-off-by: Manuel Kieweg <2939765+mkieweg@users.noreply.github.com> * Code review suggestions Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * remove obsolete assert Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> * Change LastTransitionTime field to pointer type Due to implementation limitations, setting LastTransitionTime at the resource level is challenging. Converting it to a pointer type allows it to be skipped at the resource level and prevents it from appearing in .status.resources of the Application CR. Additionally, it doesn’t provide much value or have a known use case right now. Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Resolve rebase conflicts Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Address review comment Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> * Trigger CI Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> --------- Signed-off-by: Manuel Kieweg <mail@manuelkieweg.de> Signed-off-by: Manuel Kieweg <2939765+mkieweg@users.noreply.github.com> Signed-off-by: Siddhesh Ghadi <sghadi1203@gmail.com> Co-authored-by: Blake Pettersson <blake.pettersson@gmail.com> Co-authored-by: Siddhesh Ghadi <sghadi1203@gmail.com>
Summary
Currently there is no way to notify on a certain health status being active longer than a set period of time. I propose to add a
lastUpdateTime
timestamp toapp.status.health
indicating how long the current status is active for.Motivation
Since an Application with an erroneous configuration might be
Progressing
forever, I want to be able to leverage ArgoCD notifications to solve this issue. Currently the only workaround I'm aware (Slack discussion) of is to cross referenceapp.status.operationState.startedAt
andapp.status.health.status
like this:which is prone to over-alerting as it triggers any instance of
Progressing
within this time span. So it's impossible to distinguish bad timing of a pod eviction from an app actually stuck onProgressing
Proposal
The
lastUpdateTime
field should be added to the respective struct and be set to the current time when the status changes.I'm happy to take on the implementation.
The text was updated successfully, but these errors were encountered: