forked from fluent/fluentd
Commit
Block excessive gracefulReload requests
Closes: fluent#3341

In previous versions, the /api/config.gracefulReload call did not restrict excessive API calls. This caused the following error when a gracefulReload was already executing:

Worker 0 finished unexpectedly with signal SIGKILL

This commit mitigates the situation by restricting the API call: it enforces an interval between calls, which is customizable through the blocking_reload_interval parameter in the system configuration.

NOTE: Ideally it should wait for and detect the end of the graceful reload, but there is no easy way to synchronize internal state between ServerModule (RPC::Server#mount_proc) and WorkerModule (reload_config).

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
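The interval-based restriction described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the class name ReloadGate, the method try_accept, and the injectable clock are hypothetical names chosen for illustration, and only the idea of a blocking interval (blocking_reload_interval) comes from the message above.

```ruby
# Hypothetical sketch of an interval gate for reload requests.
# Not the actual fluentd implementation; names are illustrative only.
class ReloadGate
  # interval: minimum number of seconds between two accepted requests
  #           (plays the role of blocking_reload_interval).
  # clock:    callable returning the current time in seconds; injectable
  #           so the gate can be exercised without sleeping.
  def initialize(interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval
    @clock = clock
    @mutex = Mutex.new
    @last_accepted_at = nil
  end

  # Returns true if the request may proceed, false if it arrives
  # within the blocking interval of the last accepted request.
  def try_accept
    @mutex.synchronize do
      now = @clock.call
      if @last_accepted_at && (now - @last_accepted_at) < @interval
        false
      else
        @last_accepted_at = now
        true
      end
    end
  end
end
```

A caller would check try_accept before starting a reload and reject the request otherwise. Note that a rejected request is simply dropped in this sketch, so the requester has to retry on its own.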
Showing 4 changed files with 109 additions and 9 deletions.
3194d93
Hi @kenhys, a few comments/questions about this change:
3194d93
There are cases where it has an effect, but the worker should not be killed unexpectedly.
Yes.
Maybe yes, but I've chosen a simpler approach.
Hmm, even if the worker is unexpectedly killed, a new worker loads the new configuration, so the current behavior may be enough 🤔
3194d93
The problem is that a simple approach might be too simple and can lead to cases where the latest config is not picked up, because the reload was blocked and whatever requested the reload did not retry.
3194d93
To tell the truth, I'm not so positive about this pull request, for almost the same reason as @alex-vmw.
(I'm sorry for my late response...)
The problem in fluent#3341 is that a worker is killed unexpectedly by SIGKILL, so the issue might still happen even if we introduce this change. I think we should investigate the root cause of fluent#3341, and I also think that fluent#3380 might fix this issue.
In addition, we should check the consistency of gracefulReload.
When gracefulReload is triggered, a new thread for reloading is created in both the supervisor process and the worker process.
https://github.com/fluent/fluentd/blob/980425ec8ee967854e91f2aa7371fc9c11a0c640/lib/fluent/supervisor.rb#L279-L296
https://github.com/fluent/fluentd/blob/980425ec8ee967854e91f2aa7371fc9c11a0c640/lib/fluent/supervisor.rb#L918-L940
So multiple gracefulReload requests might break the consistency.
On the other hand, waiting for the previous thread to finish may be enough to keep consistency.
If consistency is kept, I think excessive requests should be acceptable (in that case, the user should take responsibility).
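As a rough illustration of "waiting for the previous thread", here is a minimal sketch in which each new reload thread joins its predecessor before doing any work. It is not taken from supervisor.rb; SerializedReloader, request_reload, and wait are hypothetical names, and the reload body is whatever block the caller passes in.

```ruby
# Hypothetical sketch: every reload runs in its own thread, but each
# thread first waits for the previous reload thread to finish, so
# reloads never overlap. Not the actual fluentd code.
class SerializedReloader
  def initialize(&reload)
    @reload = reload
    @mutex = Mutex.new
    @thread = nil
  end

  # Starts a reload thread that runs only after the previous one ends.
  def request_reload
    @mutex.synchronize do
      previous = @thread
      @thread = Thread.new do
        begin
          previous&.join # wait for the earlier reload, if any
        rescue StandardError
          # A failed earlier reload must not prevent this one from running.
        end
        @reload.call
      end
    end
  end

  # Blocks until the most recently requested reload has finished.
  def wait
    @mutex.synchronize { @thread }&.join
  end
end
```

With this shape no request is rejected, so the latest configuration is always picked up eventually, at the cost of running every requested reload in turn.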
3194d93
If there is a queue for requests that are executed only after a previous one is finished, then consistency would be kept, right?
Not quite sure what you mean by this, can you elaborate a bit more?
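The queue idea raised in this comment could look roughly like the sketch below: requests are pushed onto a queue and a single consumer thread runs them one at a time, in arrival order. This is only an assumption about what such a design might look like; ReloadQueue, enqueue, and shutdown are hypothetical names and nothing here reflects existing fluentd code.

```ruby
# Hypothetical sketch: reload requests are queued and executed one at
# a time by a single consumer thread. Not the actual fluentd code.
class ReloadQueue
  STOP = Object.new # sentinel that tells the consumer thread to exit

  def initialize(&reload)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a request (or the sentinel) is available.
      until (request = @queue.pop).equal?(STOP)
        begin
          reload.call(request)
        rescue StandardError => e
          # Keep serving later requests even if one reload fails.
          warn "reload failed: #{e.message}"
        end
      end
    end
  end

  # Adds a reload request; it runs after all earlier requests finish.
  def enqueue(request)
    @queue << request
  end

  # Processes the remaining requests, then stops the consumer thread.
  def shutdown
    @queue << STOP
    @worker.join
  end
end
```

Because only one thread ever calls the reload block, two reloads cannot interleave, which is the consistency property the question asks about.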