-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws_s3 source: async receive_messages does not return after poll_secs #20017
Comments
Thanks @AvihaiSam . This seems like it may potentially point to an issue in the AWS SDK. The |
Hi @jszwedko. My understanding is that |
I agree with @andjhop |
Good find! I think adding that attempt timeout would make sense to have the client timeout if it receives no response. I agree it could be a generic setting supported by all AWS components. I think it could be configured around here: Lines 198 to 207 in 1b87cce
|
Actually, I see we already have a config option for some components that could maybe be used here: https://vector.dev/docs/reference/configuration/sinks/aws_s3/#request.timeout_secs. We maybe just need to expose that on AWS sources too. |
@jszwedko I will try that, but it states that the default value is 60s, which if was true - i wouldn't have found this issue in the first place... maybe it doesn't apply for any aws-sdk requests... |
well... I get
when I try to add that: sources:
s3:
...
request:
timeout_secs: 15 so I guess it's not really working... |
I don't think @jszwedko was suggesting that this should already work, rather that this configuration could perhaps be made available for AWS sources as it already is for sinks. The problem I see is that the |
Right, yeah, sources don't have the same |
oooh.... I thought I was looking at the source |
That sounds good, although it seems reasonable that a user might want finer grain control over the settings the SDK client already allows for; here we have the ability to configure the connect timeout, read timeout, operation and operation attempt timeouts. Also considering #15518 which already exposed some of these settings for IMDS requests, introducing |
Agreed, some of those settings (operation and operation attempt timeouts) are AWS specific so it could make sense to expose them under AWS-specific options like |
OK, so |
That all sounds right to me! Agreed with having the IMDS fallback to those. |
If you only need the timeout for SQS long polling, then I recommend only setting it for that specific operation in the operation's config override. It's pretty difficult to come up with an operation timeout value that works across different AWS services, since some operations are expected to take longer than others (for example, downloading a large file from S3). |
Thank you, that makes sense. I've made a pull request for an optional timeout configuration on the S3 source for S3 and SQS operations, the exact naming and structure to be decided. |
hey guys, I've compiled the code from @andjhop's PR and run it. right after the the pod starts there's several (5 in this specific case)
logs, and right after that nothing more. i will try to investigate it further... |
Thanks for taking a look at that. Just to confirm, you don't see that issue on |
@jszwedko - I actually see no difference between master and PR. Though I believe the PR is legit and should be working... |
Gotcha, I think I misread your question before. I see that you expect to see the behavior on |
Is that related? I didn't expect for this fix to affect or resolve anything related to the identity cache. |
A note for the community
Problem
when using aws_s3 source, the sqs client's
receive_messages
fn sometimes hangs with no return value.this causes vector to stop polling from sqs (loop still waiting for a future to end before requesting again)
Configuration
Version
0.27.0 -> 0.37.0
Debug Output
No response
Example Data
No response
Additional Context
I have made a fork with a PR that sets a timeout on the
receive_messages
fn, It's probably not the best way to handle it but it's working.It's possible that updating the SDK might help, I wasn't able to update it though.
References
https://discord.com/channels/742820443487993987/1019686524742271038/threads/1214559445963505734
The text was updated successfully, but these errors were encountered: