1.12 service invocation failures with resiliency timeout policies and large responses #7173

Closed
philliphoff opened this issue Nov 10, 2023 · 1 comment · Fixed by #7270
Labels: kind/bug (Something isn't working)

Comments

@philliphoff (Contributor)

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

What version of Dapr?

  • 1.12.0
  • edge: v1.11.3-336-g78b7271f-dirty


Expected Behavior

Invoking a method that returns a 10-100 MB response succeeds.

Actual Behavior

Invoking the method fails with the following error:

{"errorCode":"ERR_DIRECT_INVOKE","message":"failed to invoke, id: back-end, err: error receiving message: rpc error: code = Canceled desc = context canceled"}

Notes:

  • Invoking the application HTTP endpoint directly succeeds
  • Invoking the method via the "back-end" application's Dapr sidecar HTTP endpoint succeeds
  • With the resiliency policy removed, method invocation via the "front-end" Dapr sidecar succeeds
  • Invocation succeeds on the previous version of the Dapr runtime (1.11.x)
  • Timing (i.e. hardware) reportedly makes a difference (reproduces consistently on 16+ vCore CPU, 32+ GB RAM)
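
For context, a timeout policy of the kind these notes refer to might look like the sketch below. It is not the policy from the attached repro: the resource name, the policy name, and the 5s duration are placeholders chosen for illustration. Only the target app ID ("back-end") comes from this report.

```yaml
# Hypothetical resiliency resource; the real one ships in dapr-bug.zip.
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: example-resiliency      # placeholder name
spec:
  policies:
    timeouts:
      exampleTimeout: 5s        # placeholder duration
  targets:
    apps:
      back-end:                 # app ID taken from the report
        timeout: exampleTimeout
```

Removing the `back-end` target (or the whole resource) corresponds to the "policy removed" case described in the notes.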

Some earlier investigation seems to point to the "front-end" Dapr sidecar canceling the request context while the "back-end" Dapr sidecar is still streaming its response. The original issue was with .NET-based applications using the Dapr .NET SDK, but it also reproduces with a Node.js application not using any SDK.
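
To make the suspected failure mode concrete, here is a small Node.js simulation. It is not Dapr's code and makes no claim about where the bug actually lives. Every name in it (`openStream`, `withTimeoutPolicy`, `readAll`, `demo`) is hypothetical. It only shows how a timeout wrapper that cancels its context as soon as the wrapped call returns can break a response body that is still being streamed.

```javascript
// Illustration only: NOT Dapr's implementation. All names are hypothetical.

// Simulates a back-end that hands back a streaming body. Each chunk read
// checks the caller's AbortSignal, much as a stream observes its context.
function openStream(signal, chunkCount) {
  async function* body() {
    for (let i = 0; i < chunkCount; i++) {
      // Yield to the event loop to mimic waiting on the network.
      await new Promise((resolve) => setTimeout(resolve, 1));
      if (signal.aborted) {
        throw new Error("context canceled");
      }
      yield `chunk-${i}`;
    }
  }
  return body();
}

// A timeout wrapper that cancels its context as soon as the wrapped
// operation returns, even though the returned body has not been read yet.
async function withTimeoutPolicy(timeoutMs, operation) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await operation(controller.signal);
  } finally {
    clearTimeout(timer);
    controller.abort(); // context is canceled while the stream is still open
  }
}

// Drains a stream into an array of chunks.
async function readAll(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);
  return chunks;
}

// Reads a 3-chunk body through the wrapper and reports the outcome.
async function demo() {
  const stream = await withTimeoutPolicy(5000, (signal) => openStream(signal, 3));
  try {
    await readAll(stream);
    return "ok";
  } catch (err) {
    return err.message;
  }
}
```

In this simulation `demo()` resolves to `"context canceled"` even though the 5000 ms timeout never fires, while reading the same stream with a signal that is never aborted returns all three chunks. This mirrors the symptom reported here, where the call fails only when a resiliency policy is in place.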

Steps to Reproduce the Problem

dapr-bug.zip

  1. Expand attached repro application
  2. npm install
  3. dapr run -f ./dapr.yaml
  4. Execute HTTP request POST http://localhost:50050/v1.0/invoke/back-end/method/bugReproduce

This invokes the bugReproduce method of the "back-end" application via the "front-end" Dapr sidecar, and should fail.

Notes:

  • Make the same request on port 55050 (the "back-end" application's Dapr sidecar) and the request should succeed.
  • Make the same request on port 3000 (the "back-end" application's direct endpoint) and the request should succeed.
  • Remove the resiliency policy targeting calls to the "back-end" application and all the requests should succeed.
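
The three requests above can be scripted for a side-by-side comparison. The sketch below assumes Node.js 18 or later (for the global `fetch`) and that the repro is already running via `dapr run -f ./dapr.yaml`. The ports and the sidecar path come from this report. The direct application path `/bugReproduce` is an assumption, as are the helper names `buildTargets`, `probe`, and `runRepro`.

```javascript
// Sidecar invocation path as given in the reproduction steps.
const INVOKE_PATH = "/v1.0/invoke/back-end/method/bugReproduce";

// Builds the three endpoints described in the report, with the outcome
// the report says to expect for each.
function buildTargets(host = "localhost") {
  return [
    { name: "front-end sidecar", url: `http://${host}:50050${INVOKE_PATH}`, expected: "fail" },
    { name: "back-end sidecar", url: `http://${host}:55050${INVOKE_PATH}`, expected: "succeed" },
    // Assumption: the app serves the method at /bugReproduce on port 3000.
    { name: "back-end app (direct)", url: `http://${host}:3000/bugReproduce`, expected: "succeed" },
  ];
}

// Sends one POST and drains the full body, so a failure that occurs
// while the response is streaming is also caught.
async function probe(target) {
  try {
    const response = await fetch(target.url, { method: "POST" });
    const body = await response.arrayBuffer();
    return { ...target, status: response.status, bytes: body.byteLength };
  } catch (err) {
    return { ...target, status: null, error: String(err) };
  }
}

// Probes the targets one at a time so the requests do not compete.
async function runRepro() {
  const results = [];
  for (const target of buildTargets()) {
    results.push(await probe(target));
  }
  return results;
}

// Usage: runRepro().then((results) => console.table(results));
```

Running the probes sequentially keeps each large response from competing with the others, which matters given the report that timing affects whether the failure reproduces.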

Release Note

RELEASE NOTE:

@philliphoff added the kind/bug (Something isn't working) label on Nov 10, 2023
@olitomlinson

This looks in the same ball park as #7145
