feat(inputs.execd): allow failures on cmd start #14244
Comments
From the code, we continuously try to restart the command, except when starting it returns an error. If Telegraf cannot start an input plugin, or in this case cannot start the execd command you configured, then Telegraf fails. This is the expected behavior in general, as it makes little sense to continue running if we cannot start a plugin that you expect to provide data. However, we have other feature requests to enable per-plugin settings that would allow ignoring errors on startup, and we can do that here as well.
Trying to reproduce the issue really gives me a headache. It seems the startup only fails when the OS hard-terminates the executed process. Such events are severe, like out-of-memory conditions or maybe segfaults. I don't think we should handle those cases, as the kernel rightfully terminated the process, maybe even in an uncontrolled way (as in the OOM case).
Yeah, it would most likely be memory issues on the system that cause this situation. Maybe there should be an option to just shut down Telegraf with an error state if execd fails? This would at least make it more obvious that something failed, and on systemd hosts the service would get restarted if configured to do so.
I downloaded it and added the config option. It seems like this only stops the execd plugin, not Telegraf itself, right?
Exactly. Once Telegraf has started, it is currently impossible to stop it completely.
@ajw1980 are you good with the fix?
That option doesn't really address this issue. The execd input will already not relaunch the command if there is some sort of system (out of memory) error. |
Relevant telegraf.conf
Logs from Telegraf
System info
telegraf 1.21.3 fedora
Docker
No response
Steps to reproduce
Create an execd input.
Have the system fail in a way that prevents processes from starting properly (e.g. out of memory).
Expected behavior
execd process should always be restarted
Actual behavior
execd process was not restarted.
Additional info
In certain instances where a system problem causes processes to not start, an execd plugin process will not get restarted. In this case, the machine ran out of memory and the execd process stopped. It would seem that if the process starts and exits, Telegraf will restart it, but if Telegraf fails to even start the process, it will no longer be restarted.
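To make the distinction concrete, here is a minimal Go sketch of a restart loop, not Telegraf's actual code. All names (`startFunc`, `supervise`, `flakyStart`, `retryOnStartError`) are hypothetical and exist only for illustration. With `retryOnStartError` set to `false`, the loop gives up on the first start failure, which resembles the behavior reported here. With it set to `true`, a start failure is treated like any other exit and the loop tries again.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// startFunc is a hypothetical stand-in for launching the child command.
// On success it returns a wait function that blocks until the process exits;
// on failure it returns an error because the process could not be started.
type startFunc func() (wait func() error, err error)

// supervise launches the command up to maxAttempts times and reports how many
// launches were attempted. When retryOnStartError is false, a failed start
// ends the loop immediately (the behavior described in this issue).
func supervise(start startFunc, retryOnStartError bool, maxAttempts int, delay time.Duration) (int, error) {
	attempts := 0
	for attempts < maxAttempts {
		attempts++
		wait, err := start()
		if err != nil {
			if !retryOnStartError {
				return attempts, fmt.Errorf("start failed, giving up: %w", err)
			}
			time.Sleep(delay) // back off, then try to start again
			continue
		}
		_ = wait()        // the process ran and exited
		time.Sleep(delay) // restart delay before the next launch
	}
	return attempts, nil
}

// flakyStart simulates a system where the first `failures` launches cannot
// start at all (for example, because the machine is out of memory).
func flakyStart(failures int) startFunc {
	calls := 0
	return func() (func() error, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("simulated: cannot allocate memory")
		}
		return func() error { return nil }, nil
	}
}

func main() {
	n, err := supervise(flakyStart(2), false, 5, 0)
	fmt.Println("without retry on start error:", n, err)

	n, err = supervise(flakyStart(2), true, 5, 0)
	fmt.Println("with retry on start error:", n, err)
}
```

In the first call the loop stops after a single attempt and returns the start error. In the second it keeps going past the two simulated start failures and uses all five attempts. A real supervisor would run until shutdown rather than stop at a fixed attempt count; the cap is only there to keep the example finite.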