Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-1.12] Fix handling errors in HTTP service invocation #7327

Merged
merged 1 commit into from
Jan 2, 2024

Conversation

ItalyPaleAle
Copy link
Contributor

Cherry-pick #7324 into release-1.12

I don't think we need release notes as it could be considered a fixup from the other PR

  1. Make sure error is always logged
  2. If data has already been sent to users, log the error only and do not append it to the response, which could cause data corruption for the caller

1. Make sure error is always logged
2. If data has already been sent to users, log the error only and do not append it to the response, which could cause data corruption for the caller

Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@ItalyPaleAle ItalyPaleAle requested review from a team as code owners December 20, 2023 21:48
@JoshVanL
Copy link
Contributor

@ItalyPaleAle please can we include tests so we don't have a regression/covers the bug introduced in the previous PR

@ItalyPaleAle ItalyPaleAle changed the title [release-1.12] Fix handling errors in HTTP service invocation (#7324) [release-1.12] Fix handling errors in HTTP service invocation Dec 20, 2023
@ItalyPaleAle
Copy link
Contributor Author

@ItalyPaleAle please can we include tests so we don't have a regression/covers the bug introduced in the previous PR

Not a regression introduced by the other PR, just something else that was left out: adding some logs and making sure errors aren't appended to the body (something observed with 1.12.0 too).

Not sure we can create a test for this as it requires creating complex situations such as gRPC connections failing while a request is in progress. I've been working with @yaron2 and 2 customers this morning and found that this change helped identifying a bug with the gRPC stream between sidecars. We are still investigating the root cause of that bug.

Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (9ec8497) 64.81% compared to head (19ceff9) 64.76%.
Report is 2 commits behind head on release-1.12.

Files Patch % Lines
pkg/http/api_directmessaging.go 86.36% 3 Missing and 3 partials ⚠️
pkg/resiliency/policy.go 66.66% 4 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.12    #7327      +/-   ##
================================================
- Coverage         64.81%   64.76%   -0.06%     
================================================
  Files               231      231              
  Lines             20940    20955      +15     
================================================
- Hits              13573    13571       -2     
- Misses             6234     6251      +17     
  Partials           1133     1133              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JoshVanL
Copy link
Contributor

@ItalyPaleAle please can we include tests so we don't have a regression/covers the bug introduced in the previous PR

Not a regression introduced by the other PR, just something else that was left out: adding some logs and making sure errors aren't appended to the body (something observed with 1.12.0 too).

Not sure we can create a test for this as it requires creating complex situations such as gRPC connections failing while a request is in progress. I've been working with @yaron2 and 2 customers this morning and found that this change helped identifying a bug with the gRPC stream between sidecars. We are still investigating the root cause of that bug.

In the case of code being untestable we should refactor the code to enable introducing mid request errors via a middleware, though it looks like this should be possible with the code today AFAICT.

// Use Warn log here because it's the only way users are notified of the error
log.Warnf("HTTP service invocation failed to complete with error: %v", err)

// Do nothing else, as at least some data was already sent to the client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the user will get a successful response with incomplete data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if the response was successful, the data is already sent to the client within the policy function.

If we are here, it means that the request failed. There are 2 possible options here:

  1. The request failed without ever sending any data to the user. In this case, down below we send a response to the client with a HTTP status code indicating an error, and the error details in the body
  2. The request failed but at least a partial response has already been sent to the client. This is what we've observed when working on Fix timeouts in HTTP service invocation when resiliency policies with timeouts are applied #7270 (which was a bug in Dapr now fixed) and now in bugs reported by 2 separate users (which seem to be due to the gRPC connection between sidecars failing - we are investigating that separately). In this case, we were here trying to re-set the status code (pointlessly since the headers have already been flushed), and more importantly we were appending the error message to the response body, which meant we were sending a "corrupted" response to the caller.

The main change here is recognizing that if some data has already been sent to the client, we should abort the request without sending more data.

At the same time, we were never logging errors. Now errors are logged with debug level if they are also sent to the client in the body; if not, they are logged as warnings.

@yaron2 yaron2 merged commit 8ee6019 into dapr:release-1.12 Jan 2, 2024
18 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants