allow annotation to specify shutdown pause #4313

trondhindenes · 2022-03-01T16:35:44Z

In what area(s)?

/area runtime

Describe the feature

We're struggling to make the Dapr lifecycle robust, especially during shutdown. We have some pods that need a few seconds (up to 30) to shut down, and during those 30 seconds they require the Dapr sidecar to be available so that messages can be sent. However, since these pods are in the "draining" state, we don't want them to receive more traffic.

I propose that a new annotation is added: dapr.io/shutdown-wait-secs. If specified, the sidecar is kept alive for the specified number of seconds, but only for outbound traffic (e.g. it will not pick up new messages etc).
This annotation could be used to set up a preStop hook on the injected container and do something like:
command: ["/bin/sh", "-c", "/bin/dapr -c prepShutdown && sleep 30 && /bin/dapr -c doShutdown"]
The point is that there needs to be a way to instruct dapr to go into "passive" mode where it will still forward messages to the queue etc, but not receive anything inbound. Without this it's very difficult to build a robust messaging system using dapr.


trondhindenes · 2022-03-01T17:02:29Z

I tried to use the dapr service instead of the sidecar address in the meantime, but at least the health endpoint (curl http://<app-id>-dapr:80/v1.0/healthz) just gives an HTTP 400 back :-(

dapr-bot · 2022-04-30T17:06:32Z

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot · 2022-05-07T17:07:02Z

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.

artursouza · 2022-06-06T18:04:34Z

Summarizing the approach:

* Stop new incoming messages right away for PubSub, Binding and Service Invocation.
* Continue allowing outgoing API calls during the grace period for all APIs.
* Allow this to be configurable via a K8s annotation.
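
The three points above can be sketched as a small state model. This is only an illustration in Python (Dapr itself is written in Go), and the `DrainGate` class, its method names and the injectable `clock` are all hypothetical, not part of Dapr.

```python
import time
from typing import Callable, Optional


class DrainGate:
    """Hypothetical model of the summarized shutdown behavior (not Dapr code)."""

    def __init__(self, grace_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._grace_seconds = grace_seconds
        self._clock = clock
        self._drain_started_at: Optional[float] = None

    def begin_shutdown(self) -> None:
        # Record when shutdown was requested; repeated calls keep the first timestamp.
        if self._drain_started_at is None:
            self._drain_started_at = self._clock()

    def allow_inbound(self) -> bool:
        # Inbound deliveries (pub/sub, bindings, service invocation) stop right away.
        return self._drain_started_at is None

    def allow_outbound(self) -> bool:
        # Outbound API calls keep working until the grace period runs out.
        if self._drain_started_at is None:
            return True
        return self._clock() - self._drain_started_at < self._grace_seconds
```

In this sketch the grace period would come from the annotation value, so a pod configured with 30 seconds rejects inbound traffic immediately but can still publish for 30 seconds after shutdown begins.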

CrazyHZM · 2022-06-09T08:09:45Z

@artursouza
PR #4624 covers part of this, but we still need a k8s annotation.

CrazyHZM · 2022-07-29T07:26:02Z

I found that dapr.io/graceful-shutdown-seconds meets this requirement. Can you try dapr.io/graceful-shutdown-seconds to solve your problem? @trondhindenes
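
For readers who generate manifests programmatically, a minimal sketch of setting this annotation on a pod template could look like the following. The helper `dapr_pod_annotations` and the app id `audio-processor` are hypothetical, chosen only for illustration; the annotation keys are the standard `dapr.io/*` ones.

```python
from typing import Dict


def dapr_pod_annotations(app_id: str, graceful_shutdown_seconds: int) -> Dict[str, str]:
    """Build pod-template annotations; hypothetical helper, for illustration only."""
    if graceful_shutdown_seconds < 0:
        raise ValueError("graceful_shutdown_seconds must not be negative")
    return {
        "dapr.io/enabled": "true",
        "dapr.io/app-id": app_id,
        # Kubernetes annotation values are always strings.
        "dapr.io/graceful-shutdown-seconds": str(graceful_shutdown_seconds),
    }


annotations = dapr_pod_annotations("audio-processor", 30)
```

The pod's own `terminationGracePeriodSeconds` presumably needs to be at least as long as this value, since Kubernetes force-kills containers once that period ends.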

yaron2 · 2022-11-14T17:06:22Z

cc @trondhindenes

trondhindenes · 2022-11-18T05:57:55Z

Sorry, I was completely out of the loop here. Yes, I believe it does.
There is a situation which I believe we aren't fully covering:

Imagine a pubsub scenario where a messagehandler processes messages, and for each processed message, it needs to send a message to another queue (confirmation message). Each message might take between 10 seconds and 10 minutes to complete (this is real-life btw, we used dapr to respond to uploaded audio files that would vary a lot in file size and thus processing them also varied a lot).

In this case I would set dapr.io/graceful-shutdown-seconds to the "worst-case setting" of 10 minutes to keep dapr running while the messagehandler finishes its work and is able to send the confirmation message. This means (if I'm understanding things correctly) that dapr will keep the pod up for 10 minutes whether it's needed or not.

This is why I believe it's crucial to have the ability for dapr to somehow ask the "main app" if it's safe to stop. Dapr should exit as soon as the app container has exited - it doesn't make sense to have dapr around after the app container has quit. (This is what I experimented with in DAPS (https://github.com/trondhindenes/daps) and it worked quite well. Would ofc be much better to have that functionality in dapr itself)

yaron2 · 2022-11-18T06:19:59Z

@trondhindenes check out Dapr health checks: https://docs.dapr.io/developing-applications/building-blocks/observability/app-health/.

trondhindenes · 2022-11-18T07:48:09Z

ah, nice. I guess the combination of graceful-shutdown-seconds and health checks will solve this. I'll do some testing! Thanks!

akhilac1 · 2022-11-24T16:54:00Z

@trondhindenes - Please check the draft PR #5562. It is in draft so that I can test thoroughly

artursouza · 2023-03-14T22:50:01Z

@akhilac1 Is this still required after #5562?

trondhindenes · 2023-03-15T07:59:12Z

Hi, sorry for the late reply. I believe this issue is fixed now. I haven't verified it against the described use-case since I don't work on that project any more, but let's close this issue. I'll do some more testing when I have time, and if I discover anything iffy related to this I'll create a new issue (and link to this one for history).

trondhindenes · 2023-03-15T07:59:24Z

okay-to-close

JoshVanL · 2023-11-21T12:08:45Z

I don't think graceful-shutdown-seconds covers the use case as described in the feature request, as they are subtly but importantly different. Reopening this issue as there are requests for it to be implemented.

I propose adding a block-shutdown-seconds option (default disabled) that will block the entire shutdown procedure until either 1. the app reports as unhealthy, or 2. the number of seconds has been reached on the block.
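
A minimal sketch of the "whichever comes first" rule in this proposal, written in Python purely for illustration. The function `block_shutdown`, its parameters and its return values are hypothetical and do not reflect the actual Daprd implementation; the health check is passed in as a callable so the sketch stays self-contained.

```python
import time
from typing import Callable


def block_shutdown(
    app_is_healthy: Callable[[], bool],
    block_seconds: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block until the app is unhealthy or the time limit is hit.

    Returns "app-unhealthy" or "timeout" to say which condition ended the block.
    Hypothetical sketch; not the Dapr implementation.
    """
    deadline = clock() + block_seconds
    while clock() < deadline:
        if not app_is_healthy():
            return "app-unhealthy"
        # Never sleep past the deadline.
        sleep(min(poll_interval, max(0.0, deadline - clock())))
    return "timeout"
```

With a worst-case limit of 600 seconds, an app that finishes early and then reports unhealthy releases the block at the next poll instead of holding the pod for the full 10 minutes.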

husseineAtBay · 2023-12-03T14:01:38Z

Hi @JoshVanL

Following the discussion in #7211, there is an interesting case for pubsub during shutdown: we found that Dapr doesn't permit the subscriber application to finish the last requests sent to it, and the request contexts are cancelled immediately. The expectation was that the last requests would be allowed to complete rather than being interrupted (maybe it is worth defining a dedicated timeout mechanism for these last requests).
Will this be addressed in either graceful-shutdown-seconds or block-shutdown-seconds?

JoshVanL · 2023-12-03T14:42:27Z

Hi @husseineAtBay, indeed the intention is to cover this case (with my proposal) through block-shutdown-seconds to allow the Dapr APIs to remain functional until the seconds limit has been reached or the app reports unhealthy.

husseineAtBay · 2023-12-04T09:01:44Z

Hi @JoshVanL

Just to verify I was clear: we are talking about the requests sent from the Dapr sidecar to the subscriber app to invoke our application in the pub/sub use case. You described the other direction (subscriber app calls to the Dapr sidecar during shutdown).

JoshVanL · 2023-12-04T09:28:07Z

@husseineAtBay the other direction should also be covered; we make no distinction at this level, so long as your application is coded to continue to operate after a TERM signal has been received from Kubernetes up until it is ready to exit.
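
As a rough illustration of an application that keeps operating after TERM until it is ready to exit, here is a Python sketch. The `GracefulWorker` class and its methods are hypothetical; the only standard-library behavior relied on is `signal.signal` registering a handler for `SIGTERM`.

```python
import signal
import threading
from collections import deque
from typing import Callable, Deque


class GracefulWorker:
    """Hypothetical worker that finishes accepted work after a stop request."""

    def __init__(self, handle: Callable[[str], None]):
        self._handle = handle
        self._pending: Deque[str] = deque()
        self._stopping = threading.Event()

    def install_sigterm_handler(self) -> None:
        # Must be called from the main thread; Kubernetes sends SIGTERM on pod termination.
        signal.signal(signal.SIGTERM, lambda signum, frame: self.request_stop())

    def request_stop(self) -> None:
        # Only flips a flag, so it is safe to call from a signal handler.
        self._stopping.set()

    def submit(self, message: str) -> bool:
        # Refuse new work once shutdown has started.
        if self._stopping.is_set():
            return False
        self._pending.append(message)
        return True

    def drain(self) -> int:
        # Finish everything accepted before the stop request, then return the count.
        done = 0
        while self._pending:
            self._handle(self._pending.popleft())
            done += 1
        return done
```

The process would call `drain()` and exit only afterwards, so any confirmation messages it sends while draining still have a sidecar to go through, provided the sidecar is configured to stay up that long.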


Closes dapr#4313
Docs: dapr/docs#3893

PR adds the `--block-shutdown-seconds` CLI flag and corresponding `dapr.io/block-shutdown-seconds` Kubernetes annotation which configures Daprd to block the graceful shutdown procedure until _either_, the block shutdown seconds has elapsed _or_ the application has become unhealthy, as according to the normal app health status.

By default, this option is unset, and therefore there is no effect to the current behaviour of graceful shutdown. When set, Daprd will block the interrupt signal cascading into runtime until the above requirements have been met.

The framework process `Cleanup` order has been reversed to mimic `t.Cleanup` and allow the `logline` process to function correctly.

Signed-off-by: joshvanl <me@joshvanl.dev>

* Adds Daprd option `--block-shutdown-seconds`
  Closes #4313
  Docs: dapr/docs#3893
  Signed-off-by: joshvanl <me@joshvanl.dev>
* Revert if check on killing process exec proc cleanup
  Signed-off-by: joshvanl <me@joshvanl.dev>
* Revert error ignore of processes already killed in unix
  Signed-off-by: joshvanl <me@joshvanl.dev>
* Skip shutdown/graceful/block/healthy on windows.
* Skip shutdown/block/unhealthy test on windows.
* Linting
  Signed-off-by: joshvanl <me@joshvanl.dev>
* Updates `dapr-block-shutdown-seconds` to `dapr-block-shutdown-duration`
  Signed-off-by: joshvanl <me@joshvanl.dev>

---------

Signed-off-by: joshvanl <me@joshvanl.dev>
Co-authored-by: Loong Dai <long.dai@intel.com>

trondhindenes added the kind/enhancement label Mar 1, 2022

CrazyHZM mentioned this issue Mar 11, 2022

Stop event processing on sigterm #4317

Closed

dapr-bot added the stale label Apr 30, 2022

dapr-bot closed this as completed May 7, 2022

yaron2 reopened this May 25, 2022

dapr-bot removed the stale label May 25, 2022

artursouza added this to the v1.8 milestone Jun 6, 2022

artursouza added P1 pinned labels Jun 6, 2022

artursouza modified the milestones: v1.8, v1.9 Jun 22, 2022

artursouza modified the milestones: v1.9, v1.10 Sep 29, 2022

akhilac1 mentioned this issue Nov 14, 2022

Scale down in environment with Dapr Side Car does not work as expected - Getting 503 Response #5481

Closed

yaron2 modified the milestones: v1.10, v1.11 Feb 1, 2023

artursouza removed this from the v1.11 milestone Mar 14, 2023

yaron2 closed this as completed Mar 15, 2023

vlardn mentioned this issue Nov 17, 2023

The graceful-shutdown-seconds option is not forcing delay before shutdown #7211

Closed

JoshVanL reopened this Nov 21, 2023

JoshVanL added a commit to JoshVanL/dapr-docs that referenced this issue Dec 4, 2023

Adds Daprd --block-shutdown-seconds reference

3114f4d

Part of dapr/dapr#4313 Signed-off-by: joshvanl <me@joshvanl.dev>

JoshVanL mentioned this issue Dec 4, 2023

Adds Daprd --dapr-block-shutdown-duration reference dapr/docs#3893

Merged

JoshVanL mentioned this issue Dec 5, 2023

Adds Daprd option --dapr-block-shutdown-duration #7268

Merged

yaron2 closed this as completed in #7268 Dec 6, 2023