Exit code whitelisting #809
Thanks for the suggestion. Technically possible of course, but I'm not sure how widely applicable this would be. Is it common to have scripts that return non-zero exit codes in success scenarios, where there is also no way to influence this by passing parameters, by editing the scripts, or by using wrapper scripts with additional conditional logic?
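For reference, the wrapper-script option mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: `PING_URL`, `ALLOWED_EXIT_CODES`, `EXAMPLE_COMMAND` and all function names are hypothetical, and it assumes the usual ping layout where the bare check URL signals success and the `/start` and `/fail` suffixes signal start and failure.

```python
import subprocess
import sys
import urllib.request

# Hypothetical values for illustration; substitute your own check URL and codes.
PING_URL = "https://hc-ping.com/your-uuid-here"
ALLOWED_EXIT_CODES = {0, 24}  # 24 is an arbitrary example of a non-critical code

# Demo command that exits with the allow-listed code 24.
EXAMPLE_COMMAND = [sys.executable, "-c", "raise SystemExit(24)"]


def classify(exit_code, allowed=ALLOWED_EXIT_CODES):
    """Return the ping suffix: "" (success) for allowed codes, "/fail" otherwise."""
    return "" if exit_code in allowed else "/fail"


def ping(suffix, base_url=PING_URL, timeout=10):
    """Send one ping; network errors are swallowed so monitoring never breaks the job."""
    try:
        urllib.request.urlopen(base_url + suffix, timeout=timeout)
    except OSError:
        pass


def run_and_report(command, send=ping):
    """Run the command, bracketing it with a start ping and a success/fail ping."""
    send("/start")
    exit_code = subprocess.run(command).returncode
    send(classify(exit_code))
    return exit_code
```

Calling `run_and_report(EXAMPLE_COMMAND)` would report success even though the command exits with 24. The `send` parameter exists only so the reporting can be swapped out, for example when trying the wrapper without network access.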
Hello @cuu508, here is one example I have in mind: when In such situations, it would be nice to have a warning state rather than a failure. In the previous example, it would allow saying “maybe you're backing up some folders or files that should be ignored”. Those situations are fine and can be investigated later (very different from a backup that failed to execute and might require immediate action).
I have a similar use case, also backup related. I expect my backup script to run successfully once a day. However, if it runs more frequently (say due to manual triggers), it'll bail out without actually doing anything. I don't want to count this as success, but I don't want to alert on the failure either. So right now the only thing I can think to do is omit the "start" ping. If I want to retain "start", then I'd need a way to signal that a run is canceled. Using an allow-listed non-zero status code could work for that.
@davidtorosyan a couple of questions, so I understand your use case:
Why does it bail out on manual triggers? Do manual and automatic triggers launch the job differently? Or does the backup job somehow recognize that "it's not the right time for me to run"?
If the job does what it is supposed to do (which may be "nothing" in some cases), why not count it as success?
At the time when you send the "start" signal, you do not yet know if the job will be cancelled / bail out, correct? Like, the script starts up, then recognizes that some condition is not met, and bails out? What is that condition? If you could detect the bail out condition near the start of the script, perhaps you could send the "start" signal only after it is clear the script will [attempt to] run fully?
@cuu508 good questions! Let me try and answer with pseudocode:
/* backup script, to be run daily */
// start for timing
http.post("hc.com/backup/start")
// expensive call, ideally happens after start
data = readData()
// the data only changes every 6 hours, so this will bail out if we run more frequently
// this is neither success nor failure, but a no-op.
// if we count this as success, then we won't be alerted if the data starts never changing (which is unexpected)
if ! data.changedSinceLastBackup {
    exit
}
try {
    data.backup()
    http.post("hc.com/backup/success")
} catch {
    http.post("hc.com/backup/fail")
}
I see an additional solution I didn't see before: solving this with two health checks, one for the backup script and one for the successful backup itself. That way I'd have a signal for the backup script running (and succeeding even in the bail-out case) and for an actual backup being done with a daily frequency.
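A minimal sketch of how the two-check idea might be wired up, written as a pure function so the decision logic is easy to inspect. The check names `backup_script` and `backup_done` and the function name are made up for illustration. It also assumes that on a failed backup only the script check gets a "fail" ping, while the second check is simply left unpinged and would alert once its daily period lapses.

```python
def pings_for_run(data_changed, backup_ok):
    """Return the ordered (check, signal) pings for one run of the backup script."""
    pings = [("backup_script", "start")]
    if not data_changed:
        # No-op run: the script did what it should, but no backup was made,
        # so only the script check is marked successful.
        pings.append(("backup_script", "success"))
        return pings
    if backup_ok:
        # A real backup happened: both checks get a success ping.
        pings.append(("backup_script", "success"))
        pings.append(("backup_done", "success"))
    else:
        # Assumption: a failed backup fails only the script check.
        pings.append(("backup_script", "fail"))
    return pings
```

With this split, frequent manual runs keep the script check green, while the second check only stays green if an actual backup completes within its expected period.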
After thinking about it more, I think I might be doing too much with healthchecks. From what I can tell, healthchecks is best at making sure that a job is running on a given schedule (i.e. the backup job runs daily), not at validating arbitrary conditions (i.e. the data that's backed up is the data I want). That said, I still do have a need for the latter, so maybe what I'll do is something like this:
/* append this to backup script described in previous comment */
backups = getBackups()
if backups.latest > ago(1d) {
    http.post("hc.com/backups_healthy/success")
} else {
    http.post("hc.com/backups_healthy/fail")
}
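The freshness test in that pseudocode could be sketched in Python roughly as follows. The function name and its parameters are hypothetical; the only thing taken from the comment above is the rule that the latest backup must be newer than one day.

```python
from datetime import datetime, timedelta, timezone


def freshness_signal(latest_backup, now=None, max_age=timedelta(days=1)):
    """Return "success" if the latest backup is newer than max_age, else "fail"."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Equivalent to the pseudocode's `backups.latest > ago(1d)`.
    return "success" if now - latest_backup < max_age else "fail"
```

The returned string would then pick which ping to send. Passing `now` explicitly keeps the check deterministic when trying it out.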
Hello, would it be possible to have a feature that allows some non-0 exit codes to be whitelisted and considered as a success (or a warning for instance)?
I have some scripts that can end with a non-0 exit code that is not critical. It would be nice to be able to allow them and still consider execution as successful.