`process` currently uploads input payloads to S3 and also starts the Step Functions execution using the sfn API. However, we have seen a race condition with this approach: we can't guarantee that the execution starts after we claim processing, mainly because service limits restrict the number of concurrent Step Functions executions.
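For reference, the current two-step flow can be sketched roughly as below. This is a minimal sketch, not the actual `process` code: the function name `process_payload`, the execution input shape (a pointer to the uploaded S3 object), and all argument names are hypothetical. The clients are assumed to be boto3 S3 and Step Functions clients, or any objects exposing the same `put_object` / `start_execution` keyword arguments.

```python
import json
from typing import Any


def process_payload(
    s3: Any,
    sfn: Any,
    bucket: str,
    key: str,
    payload: dict,
    state_machine_arn: str,
    execution_name: str,
) -> Any:
    """Hypothetical sketch of the current flow: upload, then start the execution."""
    # Step 1: persist the input payload to S3.
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8"))

    # Step 2: start the execution directly from `process`. This is the call
    # that can be slowed or rejected by Step Functions service limits, which
    # is where the race with "claim processing" comes from.
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name,
        input=json.dumps({"bucket": bucket, "key": key}),
    )
```

Because both steps run inside `process`, any throttling on `start_execution` directly lengthens (or fails) the `process` invocation itself.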
Instead of having `process` limited by Step Functions, we could have the S3 upload generate an EventBridge event, which we can then use to start a Step Functions execution. This removes one lengthy step from `process` and more or less ensures it would no longer have timeout issues under heavy load.
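The proposed wiring could look something like the sketch below: an EventBridge rule that matches S3 `Object Created` events under the payload prefix and targets the state machine. It assumes EventBridge notifications are enabled on the bucket. The bucket name, prefix, rule name, target ID, and the helper functions are all hypothetical. `matches` is a deliberately simplified local matcher (exact values and `prefix` only) to illustrate which uploads would trigger an execution; it does not implement EventBridge's full pattern semantics.

```python
import json
from typing import Any


def build_event_pattern(bucket: str, prefix: str) -> dict:
    """EventBridge pattern for S3 'Object Created' events under a key prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


def build_target(state_machine_arn: str, role_arn: str, target_id: str = "start-workflow") -> dict:
    """Rule target that starts the state machine; the role must allow states:StartExecution."""
    return {
        "Id": target_id,
        "Arn": state_machine_arn,
        "RoleArn": role_arn,
        # EventBridge retries throttled deliveries, but only up to this age (max 24h).
        "RetryPolicy": {"MaximumEventAgeInSeconds": 24 * 60 * 60, "MaximumRetryAttempts": 185},
    }


def install_rule(events: Any, rule_name: str, bucket: str, prefix: str,
                 state_machine_arn: str, role_arn: str) -> None:
    """Create/update the rule and its target using a boto3 'events' client (or a stand-in)."""
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(bucket, prefix)),
        State="ENABLED",
    )
    events.put_targets(Rule=rule_name, Targets=[build_target(state_machine_arn, role_arn)])


def _match_one(rule: Any, value: Any) -> bool:
    # Simplified: supports only {"prefix": ...} and exact-value rules.
    if isinstance(rule, dict) and "prefix" in rule:
        return isinstance(value, str) and value.startswith(rule["prefix"])
    return rule == value


def matches(pattern: dict, event: dict) -> bool:
    """Simplified local check of whether `event` satisfies `pattern`."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        value = event[field]
        if isinstance(expected, dict):
            # Nested object: recurse into the sub-pattern.
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif not any(_match_one(rule, value) for rule in expected):
            return False
    return True
```

With no input transformer on the target, the state machine receives the whole EventBridge event as its input, so the workflow would read `detail.bucket.name` and `detail.object.key` to locate the payload.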
A disadvantage is that `process` would no longer have the sfn backpressure, and the processing queue would move from the highly visible SQS queue to the opaque EventBridge. We'd also have to churn through that EventBridge backlog before events start to time out. Unfortunately, it looks like the maximum age of an EventBridge event is one day, whereas SQS can be configured to retain messages for up to 14 days (4 days being the default).
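To put the retention gap in rough numbers: if a backlog of N payloads is queued at once and no new ones arrive, every item must be started within the retention window, so the sustained start rate must be at least N divided by that window. The sketch below is a back-of-the-envelope calculation under those simplifying assumptions (it ignores EventBridge's retry backoff and ongoing arrivals); the backlog size of 1,000,000 is a hypothetical example, not an observed figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Retention/age windows discussed above.
EVENTBRIDGE_MAX_EVENT_AGE_S = 1 * SECONDS_PER_DAY
SQS_DEFAULT_RETENTION_S = 4 * SECONDS_PER_DAY
SQS_MAX_RETENTION_S = 14 * SECONDS_PER_DAY


def min_start_rate(backlog: int, window_s: int) -> float:
    """Minimum sustained executions/second to start `backlog` items before any expires,
    assuming all items are queued at t=0 and nothing new arrives."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return backlog / window_s


if __name__ == "__main__":
    backlog = 1_000_000  # hypothetical backlog size
    for name, window in [
        ("EventBridge (max event age)", EVENTBRIDGE_MAX_EVENT_AGE_S),
        ("SQS (default retention)", SQS_DEFAULT_RETENTION_S),
        ("SQS (max retention)", SQS_MAX_RETENTION_S),
    ]:
        print(f"{name}: >= {min_start_rate(backlog, window):.2f} executions/s")
```

Under these assumptions, EventBridge requires a sustained start rate 14 times higher than an SQS queue at maximum retention to drain the same backlog without losing events.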
In short, this may not be the best idea, but may be worth considering.