Implement throttling for service to storage api #959
Currently I cannot mount EFS volumes with the rexray Docker plugin because of API throttling.
Unfortunately, in this state the plugin is mostly unusable: mounting volumes is unreliable and leaves the system in a continuous retry state. Some form of request limiting should be implemented.
@thenoots I am also experiencing this issue. The plugin worked fine for a couple of mounts, but as we scaled out across services we quickly hit the rate limit, wreaking havoc across our deployments.
The scope of this issue is going to change a bit in light of roadmap plans for CSI support in REX-Ray. The end result is more or less the same as the WIP PR submitted previously (#1040), but the general idea is that REX-Ray will provide a mechanism to throttle/rate-limit API calls made by CSI plugins. This can be done in a "global" scope, supporting multiple REX-Ray instances across nodes, and even across multiple Docker and Kubernetes clusters, when they use the same AWS key, since an AWS rate limit is tied to that account.
@cduchesne commented on Sun Feb 05 2017
libStorage should have a throttling mechanism to prevent sending too many requests to storage api
@codenrhoden commented on Wed Mar 15 2017
@koensayr If you want a place to paste that throttling info, this is the place.
@codenrhoden commented on Mon Mar 20 2017
Thoughts from @cantbewong:
Consider this situation:
You need to create, mount, or unmount a volume and the operation takes a long time to complete. The reason could be that a "downstream" 3rd-party API used to accomplish this:
- Is slow
- Imposes rate limiting (rejects or fails to successfully execute requests under heavy use). Two forms of rate limiting are known to exist:
  - A cap on requests per unit of time
  - A cap on the number of outstanding requests in progress
So, you basically have two options:
This falls into the pattern of a "long-running operation," which commonly leads to:
Furthermore, the downstream API could additionally exhibit non-deterministic ordering behavior: for example, even though a caller submits an unmount request followed by a mount request, the mount request is attempted first.
While it is almost always a mistake to specify an implementation demand in a functional spec, these characteristics lead to:
- Rate limiting (max calls to the API per unit of time): e.g., do not dispatch external 3rd-party requests from this queue at a rate greater than 20 requests per minute
- Outstanding-request limiting: e.g., stop dispatching requests if more than 20 active requests are pending final resolution
- Retry timeout specification: e.g., if any request dispatched from this queue has been pending for more than 5 minutes, attempt a retry
- Cancel timeout specification: e.g., if any request dispatched from this queue has been pending for more than 5 minutes, attempt a cancel
- Queue-clear invocation for use on a communication or authentication error
  - Certain kinds of errors, such as a credential rejection or an unreachable host, might be best handled by flushing all pending queued operations rather than re-encountering the error as each queue entry is processed. A mechanism will be provided to allow plugin code to flush the queue.
- Dispatch filter option: a plugin should have the option to look at the next impending dispatched entry in the queue and temporarily suspend dispatch
  - Some plugins may allow a maximum of one pending operation per client (cluster node) or per volume. This feature will allow a plugin to impose appropriate limits on queue draining.
- These defaults can be defined by a driver. Override by user config is an optional feature that could be deferred to a later release.
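The queue-flush and dispatch-filter behaviors in the list above can be sketched in a few lines of Go. This is a rough illustration under assumed names (`request`, `dispatchQueue`), not the proposed implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// request is a queued storage API operation; fields are illustrative.
type request struct {
	volume string
	op     string // e.g. "mount", "unmount"
}

// dispatchQueue sketches two behaviors from the spec above: a
// plugin-supplied filter that can temporarily hold back the next
// entry, and a flush that clears all pending entries (e.g. after
// an authentication or host-unreachable error).
type dispatchQueue struct {
	mu      sync.Mutex
	pending []request
	// filter returns false to suspend dispatch of a request,
	// e.g. to enforce one in-flight operation per volume.
	filter func(request) bool
}

// enqueue appends a request for later dispatch.
func (q *dispatchQueue) enqueue(r request) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, r)
}

// next pops the first request the filter allows, or reports false
// if nothing is currently dispatchable.
func (q *dispatchQueue) next() (request, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for i, r := range q.pending {
		if q.filter == nil || q.filter(r) {
			q.pending = append(q.pending[:i], q.pending[i+1:]...)
			return r, true
		}
	}
	return request{}, false
}

// flush drops every pending entry and returns how many were cleared.
func (q *dispatchQueue) flush() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := len(q.pending)
	q.pending = nil
	return n
}

func main() {
	busy := map[string]bool{"vol-1": true} // vol-1 has an op in flight
	q := &dispatchQueue{filter: func(r request) bool { return !busy[r.volume] }}
	q.enqueue(request{"vol-1", "unmount"})
	q.enqueue(request{"vol-2", "mount"})
	r, _ := q.next() // vol-1 is held back; vol-2 dispatches
	fmt.Println(r.volume) // vol-2
	fmt.Println(q.flush()) // 1 (the held-back vol-1 entry)
}
```

Letting the driver supply the filter keeps the per-volume/per-node ordering policy out of the core queue, which matches the "defaults can be defined by a driver" point above.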
An example of a still-open issue with this problem in Kubernetes support for AWS:
kubernetes/kubernetes#31858
The platform API should be able to respect backend storage platforms' rate limits, either via a centralized throttling mechanism or something deferred to the driver (e.g., API overloads in AWS EBS and EFS).