
Feature Request: Throttle half_open -> closed attempts #244

Open
michaelkipper opened this issue Jul 2, 2019 · 0 comments

What

Currently, when the error_timeout expires, the next acquisition request for a circuit causes a transition from open to half_open. In this state, workers access the resource with a reduced timeout, half_open_resource_timeout. The motivation is that this timeout is much lower than the client timeout, so if the resource is still unhealthy, requests fail fast(er).
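For illustration, the open -> half_open transition can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not Semian's actual (Ruby) implementation: the `Circuit` class, its `trip`/`acquire` methods, `CircuitOpenError`, and the `client_timeout` parameter are hypothetical names, and the clock is injectable only so the example is deterministic.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a request is rejected because the circuit is open."""


class Circuit:
    """Hypothetical circuit illustrating the open -> half_open transition (not Semian's API)."""

    def __init__(self, error_timeout, client_timeout, half_open_resource_timeout,
                 clock=time.monotonic):
        self.error_timeout = error_timeout
        self.client_timeout = client_timeout
        self.half_open_resource_timeout = half_open_resource_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.opened_at = None

    def trip(self):
        """Open the circuit and remember when it happened."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def acquire(self):
        """Return the timeout this request should use, or raise if the circuit is open."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.error_timeout:
                raise CircuitOpenError()
            # error_timeout has expired: this acquisition moves the circuit to half_open.
            self.state = State.HALF_OPEN
        if self.state is State.HALF_OPEN:
            # Probe with the shorter timeout so an unhealthy resource fails faster.
            return self.half_open_resource_timeout
        return self.client_timeout


# Usage with a fake clock: trip the circuit, then let error_timeout (5s) pass.
now = [0.0]
circuit = Circuit(error_timeout=5.0, client_timeout=10.0,
                  half_open_resource_timeout=1.0, clock=lambda: now[0])
circuit.trip()
now[0] = 6.0
print(circuit.acquire(), circuit.state)
```

Nothing in this sketch limits how many workers probe concurrently while the circuit is half_open; every caller of `acquire` gets the reduced timeout.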

In the current implementation, every available worker (subject to the bulkhead configuration) attempts the half_open -> closed transition. If the resource is still unhealthy, all of those workers can block for up to half_open_resource_timeout seconds each, reducing overall node capacity.

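Mathematically, in the worst case each cycle consists of t[error_timeout] seconds with the circuit open, followed by t[half_open] seconds in which every admitted worker blocks on a failing attempt. A fraction t[half_open] / (t[half_open] + t[error_timeout]) of worker capacity is therefore spent attempting to re-close the circuit. If t[half_open] is 1.0s and t[error_timeout] is 5.0s (our MySQL defaults), then 1.0 / 6.0 ≈ 16.7% of our capacity goes toward re-closing the circuit. If bulkheads are in place with a quota of 0.5, only half of the workers can make the attempt, so that number drops to 8.3%.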
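The worst-case arithmetic above can be checked with a few lines of Python. The function name `half_open_capacity_fraction` is hypothetical, and the model assumes, as the estimate does, that the bulkhead quota scales the number of workers making the attempt linearly.

```python
def half_open_capacity_fraction(t_half_open, t_error_timeout, quota=1.0):
    """Worst-case fraction of worker capacity spent blocked on half_open attempts.

    Each cycle: the circuit stays open for t_error_timeout, then the workers
    admitted by the bulkhead (a `quota` fraction of them) each block for
    t_half_open before the circuit opens again.
    """
    return quota * t_half_open / (t_half_open + t_error_timeout)


# MySQL defaults quoted above: t[half_open] = 1.0s, t[error_timeout] = 5.0s.
print(f"{half_open_capacity_fraction(1.0, 5.0):.1%}")             # 16.7%
print(f"{half_open_capacity_fraction(1.0, 5.0, quota=0.5):.1%}")  # 8.3%
```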

How

When a circuit opens, the number of available tickets should immediately drop to 1. This shields the rest of the workers from the unhealthy resource. Those workers are then rejected by the bulkhead instead of by the open circuit; since bulkhead acquisition is attempted before circuit-breaker acquisition, this fails marginally faster than the open-circuit error, but the difference is likely not a big deal.
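A rough sketch of why shrinking the bulkhead shields the other workers: because the bulkhead is checked first, a worker that cannot get a ticket is rejected before the circuit breaker is ever consulted. The `Bulkhead` class, its `resize`/`try_acquire`/`release` methods, the `acquire` helper, and both exception types are hypothetical, single-threaded stand-ins rather than Semian's real semaphore-backed bulkhead.

```python
class BulkheadFull(Exception):
    """Raised immediately when no bulkhead ticket is free."""


class CircuitOpen(Exception):
    """Raised when the circuit breaker rejects a request."""


class Bulkhead:
    """Hypothetical, single-threaded ticket counter that can be resized at runtime."""

    def __init__(self, tickets):
        self.tickets = tickets
        self.in_use = 0

    def resize(self, tickets):
        # Shrinking does not revoke tickets already held; it only limits new acquisitions.
        self.tickets = tickets

    def try_acquire(self):
        if self.in_use >= self.tickets:
            raise BulkheadFull()
        self.in_use += 1

    def release(self):
        self.in_use -= 1


def acquire(bulkhead, circuit_is_open):
    """Acquire the bulkhead first, then consult the circuit breaker."""
    bulkhead.try_acquire()
    if circuit_is_open:
        bulkhead.release()  # give the ticket back before rejecting
        raise CircuitOpen()


# The circuit has just opened: shrink the bulkhead from 4 tickets to 1.
bulkhead = Bulkhead(tickets=4)
bulkhead.resize(1)
bulkhead.try_acquire()  # one worker holds the only ticket

try:
    acquire(bulkhead, circuit_is_open=True)
except BulkheadFull:
    print("rejected by the bulkhead before the circuit is consulted")
```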

When the transition happens from open to half_open, we can raise the number of available tickets to success_threshold, to allow parallel re-closing of the circuit. Once the circuit is finally re-closed, we can raise the number of available tickets back to the original tickets/quota value.
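The proposed ticket policy can be summarized as a function of circuit state. This is a minimal sketch, not a definitive design: `tickets_for` and `tickets_from_quota` are hypothetical helpers, the quota-to-tickets conversion (rounding `workers * quota` up) is an assumption, and clamping the half_open value to the configured ticket count is an extra design choice so that a large success_threshold never widens the bulkhead beyond its normal size.

```python
import math
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


def tickets_from_quota(workers, quota):
    """Assumed conversion of a bulkhead quota into a ticket count (rounded up)."""
    return max(1, math.ceil(workers * quota))


def tickets_for(state, configured_tickets, success_threshold):
    """Number of bulkhead tickets to allow in each circuit state, per the proposal."""
    if state is State.OPEN:
        # Shield every worker but one from the unhealthy resource.
        return 1
    if state is State.HALF_OPEN:
        # Let up to success_threshold workers attempt to re-close in parallel,
        # but never more than the bulkhead was configured for.
        return min(success_threshold, configured_tickets)
    # Closed again: restore the original tickets/quota value.
    return configured_tickets


configured = tickets_from_quota(workers=8, quota=0.5)  # 4
for state in (State.OPEN, State.HALF_OPEN, State.CLOSED):
    print(state.value, tickets_for(state, configured, success_threshold=2))
```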
