Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object storage connection failures failing to record in S3 debugging details #19584

Closed
bbodenmiller opened this issue Feb 17, 2022 · 5 comments
Closed
Labels
Bug Report/Open Bug report/issue

Comments

@bbodenmiller
Copy link

bbodenmiller commented Feb 17, 2022

Summary

We're seeing intermittent object storage failures in Mattermost log but not seeing the details in S3 debug log despite having it enabled. We are seeing S3 debug logs for successfully connections to S3.

Steps to reproduce

  1. Connect Mattermost server to object storage
  2. Enable S3 debugging
  3. Have connection failures with object storage
  4. Observe failures don't show in S3 debug log

Expected behavior

S3 debug log should show failures to help in determining what is going wrong

Observed behavior (that appears unintentional)

S3 debug log does not show failures

In Mattermost log:

{"timestamp":"2022-02-16 21:46:49.882 -08:00","level":"error","msg":"Unable to write the file.","caller":"web/context.go:105","path":"/api/v4/users/h4u8h1nd73yyirrnxewhs5bfgy/image","request_id":"9ipy164jbfrrzywc1r3iy43wmr","ip_addr":"x.x.x.x","user_id":"dpgtfiw6ttbt3qzkp89pfewbco","method":"GET","err_where":"WriteFile","http_code":500,"err_details":"unable write the data in the file users/h4u8h1nd73yyirrnxewhs5bfgy/profile.png: Get \"https://x.com/mattermost/?location=\": read tcp y.y.y.y:25961->z.z.z.z:443: read: connection reset by peer"}

S3 log: no entries for h4u8h1nd73yyirrnxewhs5bfgy in 2022-02-17_05:46 period (S3 log appears to use UTC time)

Furthermore if we look at all HTTP status codes in S3 log we only see 200 and 404s for 24 hour period (as would be expected if object storage is working correctly).

Possible fixes

Questions/observations

  • Is object storage failing and S3 failing to record that?
  • Is Mattermost failing to talk to S3 SDK and never even trying to get resource from object storage?
    • This issue started at random and does not appear to lineup with any changes we made to Mattermost server or any Mattermost application upgrades.
@amyblais
Copy link
Member

Opened a ticket: https://mattermost.atlassian.net/browse/MM-42006

@agnivade
Copy link
Member

@bbodenmiller - Are you not observing the S3 debug logs at all, or sometimes they log and sometimes not? Can you show me the logs around this line:

 {"timestamp":"2022-02-16 21:46:49.882 -08:00","level":"error","msg":"Unable to write the file.","caller":"web/context.go:105","path":"/api/v4/users/h4u8h1nd73yyirrnxewhs5bfgy/image","request_id":"9ipy164jbfrrzywc1r3iy43wmr","ip_addr":"x.x.x.x","user_id":"dpgtfiw6ttbt3qzkp89pfewbco","method":"GET","err_where":"WriteFile","http_code":500,"err_details":"unable write the data in the file users/h4u8h1nd73yyirrnxewhs5bfgy/profile.png: Get \"https://x.com/mattermost/?location=\": read tcp y.y.y.y:25961->z.z.z.z:443: read: connection reset by peer"}

?

This issue started at random and does not appear to lineup with any changes we made to Mattermost server or any Mattermost application upgrades.

Would this point to an infrastructure issue?

@bbodenmiller
Copy link
Author

@agnivade

Are you not observing the S3 debug logs at all, or sometimes they log and sometimes not?

I get S3 SDK logs but only for successful connections to S3 like 200 and 404. Let me see if I can pull some examples tomorrow.

Would this point to an infrastructure issue?

You would think but the infrastructure people tell me everything is working as expected at network and S3 level. Probably best to discuss further in #20427 since that's the issue I was working when determined I'm not seeing S3 logs as expected.

@bbodenmiller
Copy link
Author

@agnivade an example of my S3 debug log is shown at #19579. Since the S3 debug log overlaps it is hard to follow but the short story is entries in Mattermost log with connection reset by peer are not showing at all in S3 debug log. I would have thought that S3 debug log would show requests that were attempted but failed due to connection failure. e.g. h4u8h1nd73yyirrnxewhs5bfgy/image shown above does not appear in S3 debug log during the same time period.

Today during a 17 second period a large number of connection reset by peer errors were thrown in Mattermost log. In the S3 debug log during same 17 second period there is only 200 OK entries and none of the requested files shown with connection reset by peer in Mattermost log are showing in the S3 debug log.

Here is one of the S3 debug log entries from today during that 17 second time period:

2022-06-10_23:09:24.99994 ---------START-HTTP---------
2022-06-10_23:09:25.00007 GET /mattermost/emoji/quhaadeout8mmnhz4m4g8ggrge/image HTTP/1.1
2022-06-10_23:09:25.00008 Host: mys3.company.com
2022-06-10_23:09:25.00009 User-Agent: MinIO (linux; amd64) minio-go/v7.0.16
2022-06-10_23:09:25.00009 Authorization: AWS4-HMAC-SHA256 Credential=xxx/20220610/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
2022-06-10_23:09:25.00009 X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2022-06-10_23:09:25.00009 X-Amz-Date: 20220610T230924Z
2022-06-10_23:09:25.00009 Accept-Encoding: gzip
2022-06-10_23:09:25.00009 
2022-06-10_23:09:25.00010 HTTP/1.1 200 OK
2022-06-10_23:09:25.00010 Content-Length: 858945
2022-06-10_23:09:25.00010 Content-Type: application/octet-stream
2022-06-10_23:09:25.00011 Date: Fri, 10 Jun 2022 23:09:24 GMT
2022-06-10_23:09:25.00011 Etag: "6b3cf60670555f33964fcbb4b69ed0cb"
2022-06-10_23:09:25.00011 Last-Modified: Sun, 21 Feb 2021 00:15:13 GMT
2022-06-10_23:09:25.00011 Server: ViPR/1.0
2022-06-10_23:09:25.00011 X-Amz-Id-2: f97320651c0f4ce7d20dcc371284c074f4407a406f17700271d2bc645729587e
2022-06-10_23:09:25.00012 X-Amz-Request-Id: 8988089d:17dfd0d98f2:11a76a:e
2022-06-10_23:09:25.00012 X-Amz-Server-Side-Encryption: AES256
2022-06-10_23:09:25.00012 X-Amz-Version-Id: 1613866513502
2022-06-10_23:09:25.00012 X-Emc-Mtime: 1613866513502
2022-06-10_23:09:25.00012 ---------END-HTTP---------

@agnivade
Copy link
Member

This seems to be an issue with the minio library. If there's an error while making the request, the request does not get logged: https://github.com/minio/minio-go/blob/ff482a18933aa30769bfaeb7d34fa680fa51bcee/api.go#L491

I have raised an issue here: minio/minio-go#1664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Report/Open Bug report/issue
Projects
None yet
Development

No branches or pull requests

3 participants