Graceful shutdown does not cancel @Scheduled tasks #32109

MelvinFrohike · 2024-01-24T18:25:11Z

Steps to reproduce:

Create a minimal spring-boot 3.2.2 project
Add @EnableScheduling to application
Define a taskScheduler bean of type ThreadPoolTaskScheduler
Create a method annotated with @Scheduled(fixedRate=1000)
Have a long-running process in that method
In application.properties set server.shutdown=graceful and spring.lifecycle.timeout-per-shutdown-phase=5s

If I understand the docs correctly, the long-running process should be canceled immediately and the task scheduler should be destroyed.

However, when signaling the application to shut down, the long-running process is not aborted immediately. Instead, I get an error message after 5 seconds: Failed to shut down 1 bean with phase value 2147483647 within timeout of 5000ms: [taskScheduler]

If I configure the taskScheduler with

taskScheduler.setWaitForTasksToCompleteOnShutdown(true);
taskScheduler.setAwaitTerminationMillis(0);

the task is canceled immediately.

Minimal example

    @Bean
    public TaskScheduler taskScheduler() {
        var taskScheduler = new ThreadPoolTaskScheduler() {
            @Override
            public void destroy() {
                log.info("taskScheduler Destroy");
                super.destroy();
            }
        };
        taskScheduler.setPoolSize(10);
        taskScheduler.setWaitForTasksToCompleteOnShutdown(false); // this doesn't result in task cancelation

        // taskScheduler.setWaitForTasksToCompleteOnShutdown(true);
        // taskScheduler.setAwaitTerminationMillis(0);  // this results in immediate task cancelation

        return taskScheduler;
    }

    @Scheduled(fixedRate = 1000)
    public void scheduled() {
        while (true) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

    @PreDestroy
    void predestroy() {
        log.info("predestroy");
    }

The TaskScheduler's destroy method and the predestroy() method are not called until after the 5 second timeout.
If I configure the taskScheduler with setWaitForTasksToCompleteOnShutdown(true) and taskScheduler.setAwaitTerminationMillis(0), these methods are called immediately.

Is there a misunderstanding on my part, an error in the docs, or a bug?


jhoeller · 2024-01-25T13:58:59Z

This is a surprisingly nuanced topic given all the input and feedback we had on this over the years.

A key idea behind a graceful shutdown is to let existing tasks complete as far as possible, concurrently in case of multiple executors/schedulers. This was explicitly requested for scheduled tasks (#31019) even before the lifecycle revision in 6.1, and after the lifecycle revision there is dedicated support for such a mode of shutdown now.

From that perspective, the behavior that you are experiencing is by design: For a graceful shutdown, we cancel recurring tasks so that further triggers do not fire anymore but let running tasks complete concurrently within the managed stop phase. If your task takes longer than that to complete, could you try to redesign it for shorter but more frequent triggering possibly? Or otherwise, just set a short enough lifecycle timeout and let it run into that info-level log message (which is not meant to be an error - maybe we should avoid the "failed" term there), followed by an interrupt on remaining tasks for a hard shutdown. No need to set any extra flags for this, you could just rely on the default arrangement there and set a custom lifecycle timeout.
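The two-step sequence described above (stop further triggers, let running work finish within a timeout, then interrupt what remains) can be sketched with plain `java.util.concurrent`. This is only an analogy at the JDK level, not Spring's lifecycle code: the class name `TwoPhaseShutdownSketch`, the `graceMillis` parameter (standing in for the lifecycle timeout), and the 60-second sleep (standing in for the long-running `@Scheduled` method) are all made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TwoPhaseShutdownSketch {

    /** Returns {finishedWithinGracePeriod, taskSawInterrupt}. */
    static boolean[] run(long graceMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        // Hypothetical stand-in for the long-running scheduled method.
        scheduler.scheduleAtFixedRate(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
                Thread.currentThread().interrupt();
            }
        }, 0, 1000, TimeUnit.MILLISECONDS);

        try {
            started.await();

            // Phase 1: no further triggers; the running task is left alone.
            scheduler.shutdown();
            boolean finished =
                    scheduler.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);

            // Phase 2: the grace period is over, so interrupt what is left.
            if (!finished) {
                scheduler.shutdownNow();
                scheduler.awaitTermination(5, TimeUnit.SECONDS);
            }
            return new boolean[] {finished, sawInterrupt.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shutting down", e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run(200);
        System.out.println("finished within grace period: " + result[0]);
        System.out.println("task saw interrupt afterwards: " + result[1]);
    }
}
```

With a grace period much shorter than the task, the task should not finish in phase 1 and should only see the interrupt in phase 2, which mirrors the delay reported above.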

The (old) waitForTasksToCompleteOnShutdown flag changes that behavior, effectively bypassing the concurrent managed stop phase in favor of awaiting a serial shutdown in each executor's destroy method (the common pre-6.1 behavior), potentially taking significant amounts of serial time in case of multiple executors/schedulers (depending on the await-termination setting). Note that this does not actually interrupt running tasks: With a zero-second wait period, it simply lets the JVM end, hard-stopping any remaining threads.

The (new) acceptTasksAfterContextClose flag lets you opt out of the concurrent managed stop phase as well but with a default hard interrupt for remaining tasks on shutdown. So for your desired immediate interrupt-on-shutdown behavior, you should actually set that flag instead of waitForTasksToCompleteOnShutdown. That way you'll get an interrupt on the blocked threads before the JVM shuts down, letting them end in an orderly fashion.
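On the task side, "ending in an orderly fashion" presumably means treating the interrupt as a stop signal rather than rethrowing it as a `RuntimeException` the way the reproducer does. A minimal sketch of such a task body, assuming nothing Spring-specific: the class `InterruptibleTaskSketch`, its `demo()` helper, and the 10 ms unit of work are hypothetical and only show the pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class InterruptibleTaskSketch implements Runnable {

    private final AtomicInteger iterations = new AtomicInteger();
    private volatile boolean cleanedUp = false;

    @Override
    public void run() {
        try {
            // Check the interrupt flag between units of work.
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet();
                Thread.sleep(10); // hypothetical unit of work
            }
        } catch (InterruptedException e) {
            // Restore the flag so the owning pool can see the interrupt too.
            Thread.currentThread().interrupt();
        } finally {
            // Release resources here; this runs on normal exit and on interrupt.
            cleanedUp = true;
        }
    }

    boolean isCleanedUp() {
        return cleanedUp;
    }

    int iterations() {
        return iterations.get();
    }

    /** Runs the task on a plain thread, interrupts it, and reports a clean exit. */
    static boolean demo() {
        InterruptibleTaskSketch task = new InterruptibleTaskSketch();
        Thread worker = new Thread(task, "sketch-worker");
        worker.start();
        try {
            Thread.sleep(100);
            worker.interrupt(); // what a hard shutdown does to the worker thread
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive() && task.isCleanedUp();
    }

    public static void main(String[] args) {
        System.out.println("ended cleanly after interrupt: " + demo());
    }
}
```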

All things considered, I actually recommend the default shutdown behavior with a custom lifecycle timeout, possibly even shorter than 5s. We can revise the wording of that log message if that's the main irritation, e.g. "Shutdown phase 2147483647 ends with 1 bean still running after timeout of 5000ms: [taskScheduler]".

MelvinFrohike · 2024-01-26T08:23:34Z

Thanks for the detailed response.
I see now how to handle my use-case.

Your suggested change of log output is already helpful with clearing up the confusion.
However, I find this to be not enough, as to me, both the API and documentation are confusing.
IMO, it is not intuitive to have these calls result in a hard shutdown:

 taskScheduler.setAcceptTasksAfterContextClose(true);
 taskScheduler.setAwaitTerminationMillis(0);

Neither the names nor the documentation show that these two methods are in any way related. The first call in particular does not seem to have any effect on already running tasks.

In contrast, the setWaitForTasksToCompleteOnShutdown(false) method seems to result in immediate cancelation (implying to me an awaitTerminationMillis setting of 0).

While changing the API to be more intuitive might be tricky due to backwards compatibility, I would suggest clearing up the documentation of these methods.

Thanks again.

MelvinFrohike · 2024-01-26T13:09:40Z

I've spoken too soon about knowing how to handle my use-case.

For a bit more context, I have a project with a graceful shutdown so that active requests are still completed (within the time limit). I also have a series of tasks running, some of them via @Scheduled. I need to kill only one of these scheduled tasks immediately without waiting for the rest of the application to shut down gracefully.

I've tried to create two taskSchedulers: one "normal" one and one with these lines:

 taskScheduler.setAcceptTasksAfterContextClose(true);
 taskScheduler.setAwaitTerminationMillis(0);

I use the latter one in the scheduled task that should be canceled immediately.

When shutting down the application and when no other task is running, or request is being completed, the task is canceled immediately, just as I need it to.

However, when either another scheduled task is running (with the "normal" taskScheduler) or a long running request is being completed, my special task is only being canceled when the other task or request is completed or times out.

Thus it seems to me that setAcceptTasksAfterContextClose does not affect cancelation of its tasks when there are other tasks. I've also tried to use taskScheduler.setWaitForTasksToCompleteOnShutdown(true) with the same effect.

How can I get one taskScheduler to cancel its tasks immediately while still retaining the graceful shutdown for other taskSchedulers and endpoints?

I feel this question may no longer be appropriate in an issue and should move to a discussion, but I am not sure the described behavior is intended.

jhoeller · 2024-01-28T17:34:26Z

Thanks for sharing your scenario there, this is useful insight. All of this input is useful for revising our documentation there.

Some of those configuration options have legacy behind them. We try to keep them intact for backwards-compatible behavior in existing applications and also for enforcing pre-6.1 behavior in new setups if necessary. The name of the setting often reflects the original purpose but the overall semantics are not very obvious indeed. Also, please note that those setter methods only affect the local TaskScheduler instance; other TaskScheduler instances operate independently according to their own configuration. If certain tasks go through graceful stopping on one scheduler, that lifecycle step happens before any beans - including other schedulers - reach their destroy step; that's a consequence of the unified lifecycle model.

As for your special task, you could try to specifically react to a ContextClosedEvent in your endpoint implementation. Or we could provide an arrangement for immediately interrupting tasks at ContextClosedEvent time in ThreadPoolTaskScheduler, calling ExecutorService.shutdownNow (the only way to interrupt tasks within an ExecutorService) at that time already. This would happen immediately even next to other TaskSchedulers with graceful shutdown setups then.
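To make the last suggestion concrete at the `ExecutorService` level: calling `shutdownNow()` on one executor at close time interrupts its tasks right away, while a second executor that only received `shutdown()` keeps its running task alive. The sketch below uses plain JDK executors, not `ThreadPoolTaskScheduler`. The class name, the latches, and the idea of invoking both calls from a `ContextClosedEvent` listener are assumptions for illustration, not an existing Spring arrangement.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class IndependentSchedulersSketch {

    /** Returns {urgentInterruptedEarly, normalStillRunningThen, normalCompleted}. */
    static boolean[] run() {
        ScheduledExecutorService normal = Executors.newScheduledThreadPool(1);
        ScheduledExecutorService urgent = Executors.newScheduledThreadPool(1);
        CountDownLatch bothStarted = new CountDownLatch(2);
        CountDownLatch urgentInterrupted = new CountDownLatch(1);
        CountDownLatch normalMayFinish = new CountDownLatch(1);
        AtomicBoolean normalCompleted = new AtomicBoolean(false);

        // Task that should be allowed to finish gracefully.
        normal.scheduleAtFixedRate(() -> {
            bothStarted.countDown();
            try {
                normalMayFinish.await();
                normalCompleted.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1000, TimeUnit.MILLISECONDS);

        // Task that should be killed immediately on shutdown.
        urgent.scheduleAtFixedRate(() -> {
            bothStarted.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                urgentInterrupted.countDown();
            }
        }, 0, 1000, TimeUnit.MILLISECONDS);

        try {
            bothStarted.await();

            // Hypothetical "context closed" moment.
            urgent.shutdownNow(); // interrupts the urgent task right away
            normal.shutdown();    // stops triggers, leaves the running task alone

            boolean early = urgentInterrupted.await(2, TimeUnit.SECONDS);
            boolean normalStillRunning = !normal.isTerminated();

            normalMayFinish.countDown(); // let the graceful task complete
            boolean done = normal.awaitTermination(2, TimeUnit.SECONDS)
                    && normalCompleted.get();
            return new boolean[] {early, normalStillRunning, done};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("urgent interrupted early: " + r[0]);
        System.out.println("normal still running at that point: " + r[1]);
        System.out.println("normal completed afterwards: " + r[2]);
    }
}
```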

…er ThreadPoolTaskScheduler (#2721) (#2738)

There has been a significant revision of `ThreadPoolTaskScheduler/Executor` lifecycle capabilities as part of the spring-6.1.x release. It includes a concurrently managed stop phase for `ThreadPoolTaskScheduler/Executor`, favouring early soft shutdown. As a result, the `ThreadPoolTaskScheduler`, which is used for pubsub publishing, now shuts down immediately on `ContextClosedEvent`, thereby rejecting any further task submissions (#2721).

This PR aims to retain the default behavior of spring-6.1.x but provides config options to leverage the `lateShutdown` of the underlying `ThreadPoolTaskScheduler`, such as: `spring.cloud.gcp.pubsub.publisher.executor-accept-tasks-after-context-close=true`.

References:
1. spring-projects/spring-framework#32109 (comment)
2. spring-projects/spring-framework@b12115b
3. spring-projects/spring-framework@a2000db
4. spring-projects/spring-framework#31019 (comment)
5. https://github.com/spring-projects/spring-framework/blob/996e66abdbaad866f0eab40bcf5628cdea92e046/spring-context/src/main/java/org/springframework/scheduling/concurrent/ExecutorConfigurationSupport.java#L482

Fixes #2721.

spring-projects-issues added the status: waiting-for-triage label Jan 24, 2024

jhoeller self-assigned this Jan 24, 2024

jhoeller added the in: core label Jan 24, 2024

jhoeller added type: documentation and removed status: waiting-for-triage labels Jan 25, 2024

jhoeller added this to the 6.1.4 milestone Jan 25, 2024

jhoeller mentioned this issue Jan 29, 2024

Revisit default lifecycle phases and timeouts (e.g. for ThreadPoolTaskScheduler) #32152

Closed

jhoeller closed this as completed in 08e6df8 Jan 29, 2024

jayakumarc mentioned this issue Mar 26, 2024

Support spring 6.1.x lifecycle changes/late-shutdown for pubsub publisher scheduler GoogleCloudPlatform/spring-cloud-gcp#2738

Merged
