Implement Exponential Backoff Strategy for AWS API Deployment Rate Limits #12396

Open · 1 task done
Dm-Chebotarskyi opened this issue Mar 19, 2024 · 7 comments · May be fixed by Dm-Chebotarskyi/serverless#1 or #12400

Comments

@Dm-Chebotarskyi

Is there an existing issue for this?

  • I have searched existing issues, it hasn't been reported yet

Use case description

When deploying multiple stacks in parallel with Serverless Framework, we consistently encounter Rate exceeded errors, indicating that we are hitting AWS API rate limits. This issue arises particularly when deploying more than 25 stacks simultaneously, leading to a significant number of failed deployments. These errors not only cause delays but can also lead to incomplete or failed multistep pipelines where steps are dependent on one another.

The current backoff strategy appears to be linear or insufficiently scaled, as evidenced by recurring Rate exceeded logs with relatively consistent sleep times despite the increasing number of retries. AWS recommends an exponential backoff strategy as a best practice for handling rate limiting in their APIs, as documented in the AWS Knowledge Center article "How do I handle CloudFormation's rate exceeded error?".

Proposed solution (optional)

I propose that the Serverless Framework adopt an exponential backoff strategy to mitigate this issue. This strategy would increase the delay between retry attempts exponentially, with optional jitter to reduce the likelihood of simultaneous retries causing further throttling. Specifically, the backoff delay would be calculated as follows (see the sketch after the list below):

seconds_to_sleep_i = min(b * r^i, MAX_BACKOFF)

Where:

  • i is the retry count, starting with 1.
  • b is a random number between 0 and 1.
  • r is the exponential factor, suggested to be 2.
  • MAX_BACKOFF is the maximum backoff time, recommended to be 20 seconds as per AWS SDK guidelines.
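
For illustration, here is a minimal sketch of that calculation in Node.js. The secondsToSleep name and the constants below are purely illustrative and not part of the framework:

// Proposed backoff: exponential growth with full jitter, capped at MAX_BACKOFF.
const MAX_BACKOFF = 20; // seconds, per AWS SDK guidelines
const EXPONENTIAL_FACTOR = 2; // r in the formula above

// i is the retry count, starting at 1; Math.random() supplies b (the jitter).
const secondsToSleep = (i) =>
  Math.min(Math.random() * Math.pow(EXPONENTIAL_FACTOR, i), MAX_BACKOFF);

// Example: delays grow roughly as 2^i, jittered, and never exceed 20 seconds.
for (let i = 1; i <= 6; i++) {
  console.log(`retry ${i}: sleep ${secondsToSleep(i).toFixed(2)}s`);
}
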
@Dm-Chebotarskyi
Author

Dm-Chebotarskyi commented Mar 19, 2024

I have a branch on my fork with implementation and would happily create a PR.

@nickmorell

This would be a huge help for me and my team. We are constantly dealing with Rate exceeded issues that require manual intervention.

@jmanlapid

Commenting for visibility, as my team has to manually redeploy serverless stacks every time a pipeline hits a rate-exceeded error.

@ben-exa

ben-exa commented Mar 28, 2024

Bump - this would be really helpful for my team

@justin8953

This is the solution our team needs. We occasionally exceed the rate limit, which sometimes causes rollback issues after an update fails. Please consider this PR as soon as possible so that our deployments run smoothly. Thanks

@RLRabinowitz

Would love to see this go in as well. These rate limits are painful to deal with when trying to use Serverless at scale.

@Dm-Chebotarskyi
Author

Looks like the Serverless community is not interested in merging the fix in #12400.
For those who face this issue and want to apply a quick fix, here is the plugin that we ended up using (credits to @ben-exa)

const ServerlessError = require('serverless/lib/serverless-error');

class AWSExponentialBackoff {
  constructor(serverless, options) {
    this.serverless = serverless;
    this.options = options;
    this.hooks = {
      initialize: this.enhanceAwsRequest.bind(this),
    };
  }

  enhanceAwsRequest() {
    const awsProvider = this.serverless.getProvider('aws');
    const originalRequest = awsProvider.request.bind(awsProvider);

    awsProvider.request = async (service, method, params, options) => {
      let attempts = 0;
      const MAX_RETRIES = 5;
      const BASE_BACKOFF = 5000; // milliseconds
      const EXPONENTIAL_FACTOR = 2;

      const retryRequest = async () => {
        try {
          return await originalRequest(service, method, params, options);
        } catch (error) {
          const { providerError } = error;
          this.serverless.cli.log(
            `Caught error: ${JSON.stringify(error, null, 2)}`,
          );

          // Retry only transient/throttling errors: anything the provider marks
          // retryable (excluding auth/credential failures) or an HTTP 429 response.
          if (
            attempts < MAX_RETRIES &&
            providerError &&
            ((providerError.retryable &&
              providerError.statusCode !== 403 &&
              providerError.code !== 'CredentialsError' &&
              providerError.code !== 'ExpiredTokenException') ||
              providerError.statusCode === 429)
          ) {
            attempts++;
            const backOff =
              BASE_BACKOFF * Math.pow(EXPONENTIAL_FACTOR, attempts - 1);
            this.serverless.cli.log(
              `Error occurred: ${error.message}. Retrying after ${
                backOff / 1000
              } seconds...`,
            );
            await new Promise((resolve) => setTimeout(resolve, backOff));
            return retryRequest();
          }
          throw new ServerlessError(
            `Failed after ${attempts} retries: ${error.message}`,
            error.code,
          );
        }
      };

      return retryRequest();
    };
  }
}

module.exports = AWSExponentialBackoff;

You can then reference it in your serverless.yml as follows:

plugins:
  # your plugin list
  - ./serverless/plugins/aws-exponential-backoff.js
