Implement Exponential Backoff Strategy for AWS API Deployment Rate Limits #12396

Open · 1 task done
Dm-Chebotarskyi opened this issue Mar 19, 2024 · 7 comments · May be fixed by Dm-Chebotarskyi/serverless#1 or #12400

Comments

@Dm-Chebotarskyi

Is there an existing issue for this?

  • I have searched existing issues, it hasn't been reported yet

Use case description

When deploying multiple stacks in parallel with Serverless Framework, we consistently encounter Rate exceeded errors, indicating that we are hitting AWS API rate limits. This issue arises particularly when deploying more than 25 stacks simultaneously, leading to a significant number of failed deployments. These errors not only cause delays but can also lead to incomplete or failed multistep pipelines where steps are dependent on one another.

The current backoff strategy appears to be linear or insufficiently scaled, as evidenced by recurring Rate exceeded logs with relatively consistent sleep times despite the increasing number of retries. AWS recommends an exponential backoff strategy as a best practice for handling rate limiting in their APIs, as documented in the AWS Knowledge Center article "How do I handle CloudFormation's rate exceeded error?".

Proposed solution (optional)

I propose that the Serverless Framework adopt an exponential backoff strategy to mitigate this issue. This strategy would increase the delay between retry attempts exponentially, with optional jitter to reduce the likelihood of simultaneous retries causing further throttling. Specifically, the backoff delay would be calculated as follows (see the sketch after the list below):

seconds_to_sleep_i = min(b * r^i, MAX_BACKOFF)

Where:

  • i is the retry count, starting with 1.
  • b is a random number between 0 and 1.
  • r is the exponential factor, suggested to be 2.
  • MAX_BACKOFF is the maximum backoff time, recommended to be 20 seconds as per AWS SDK guidelines.
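
For illustration, here is a minimal sketch of that calculation in Node.js. The secondsToSleep name and the constants below are purely illustrative and not part of the framework:

// Proposed backoff: exponential growth with full jitter, capped at MAX_BACKOFF.
const MAX_BACKOFF = 20; // seconds, per AWS SDK guidelines
const EXPONENTIAL_FACTOR = 2; // r in the formula above

// i is the retry count, starting at 1; Math.random() supplies b (the jitter).
const secondsToSleep = (i) =>
  Math.min(Math.random() * Math.pow(EXPONENTIAL_FACTOR, i), MAX_BACKOFF);

// Example: delays grow roughly as 2^i, jittered, and never exceed 20 seconds.
for (let i = 1; i <= 6; i++) {
  console.log(`retry ${i}: sleep ${secondsToSleep(i).toFixed(2)}s`);
}
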
@Dm-Chebotarskyi
Author

Dm-Chebotarskyi commented Mar 19, 2024

I have a branch on my fork with implementation and would happily create a PR.

@nickmorell

This would be a huge help for me and my team. We are constantly dealing with Rate exceeded issues that require manual intervention.

@jmanlapid

Commenting for visibility, as my team has to manually redeploy serverless stacks every time a pipeline hits a rate-exceeded error.

@ben-exa

ben-exa commented Mar 28, 2024

Bump - this would be really helpful for my team

@justin8953

This is the solution our team needs. We occasionally exceed the rate limit, which sometimes causes rollback issues after an update fails. Please consider this PR as soon as possible so that our deployments run smoothly. Thanks

@RLRabinowitz

Would love to see this go in as well. These rate limits are painful to deal with when trying to use Serverless at scale.

@Dm-Chebotarskyi
Author

Looks like the Serverless community is not interested in merging the fix in #12400.
For those who face this issue and want to apply a quick fix, here is the plugin that we ended up using (credits to @ben-exa)

const ServerlessError = require('serverless/lib/serverless-error');

class AWSExponentialBackoff {
  constructor(serverless, options) {
    this.serverless = serverless;
    this.options = options;
    this.hooks = {
      initialize: this.enhanceAwsRequest.bind(this),
    };
  }

  enhanceAwsRequest() {
    const awsProvider = this.serverless.getProvider('aws');
    const originalRequest = awsProvider.request.bind(awsProvider);

    awsProvider.request = async (service, method, params, options) => {
      let attempts = 0;
      const MAX_RETRIES = 5;
      const BASE_BACKOFF = 5000; // milliseconds
      const EXPONENTIAL_FACTOR = 2;

      const retryRequest = async () => {
        try {
          return await originalRequest(service, method, params, options);
        } catch (error) {
          const { providerError } = error;
          this.serverless.cli.log(
            `Caught error: ${JSON.stringify(error, null, 2)}`,
          );

          // Retry only transient/throttling errors: anything the provider marks
          // retryable (excluding auth/credential failures) or an HTTP 429 response.
          if (
            attempts < MAX_RETRIES &&
            providerError &&
            ((providerError.retryable &&
              providerError.statusCode !== 403 &&
              providerError.code !== 'CredentialsError' &&
              providerError.code !== 'ExpiredTokenException') ||
              providerError.statusCode === 429)
          ) {
            attempts++;
            const backOff =
              BASE_BACKOFF * Math.pow(EXPONENTIAL_FACTOR, attempts - 1);
            this.serverless.cli.log(
              `Error occurred: ${error.message}. Retrying after ${
                backOff / 1000
              } seconds...`,
            );
            await new Promise((resolve) => setTimeout(resolve, backOff));
            return retryRequest();
          }
          throw new ServerlessError(
            `Failed after ${attempts} retries: ${error.message}`,
            error.code,
          );
        }
      };

      return retryRequest();
    };
  }
}

module.exports = AWSExponentialBackoff;

You can then reference it in your serverless.yml as follows:

plugins:
  # your plugin list
  - ./serverless/plugins/aws-exponential-backoff.js
