Spawn_blocking closures non-deterministically fail when runtime is dropping tasks #4834

Closed
ekzhang opened this issue Jul 13, 2022 · 8 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-task Module: tokio/task T-docs Topic: documentation

Comments

@ekzhang
Sponsor Contributor

ekzhang commented Jul 13, 2022

Version v1.19.2 (minimal reproduction)

Platform Darwin Kernel Version 21.5.0 ARM64. I'm running on an M1 Pro processor with 10 logical cores.

Description The documentation for task::spawn_blocking says the following:

Closures spawned using spawn_blocking cannot be cancelled. When you shut down the executor, it will wait indefinitely for all blocking operations to finish.

However, when a spawn_blocking call is placed in the Drop implementation of a struct, the blocking closure sometimes does not execute.

I tried this code:

use std::{thread, time::Duration};
use tokio::task;
use tokio::time;

struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("Dropping A");
        // thread::sleep(Duration::from_secs(1));
        task::spawn_blocking(|| {
            println!("Inside A blocking");
            thread::sleep(Duration::from_secs(1));
            println!("finished A blocking");
        });
    }
}

#[tokio::main]
async fn main() {
    let a = A;
    tokio::spawn(async {
        time::sleep(Duration::from_secs(1)).await;
        drop(a);
    });
    println!("finished!");
}

When I run this code, it sometimes runs the blocking closure and sometimes does not. The behavior varies between successive runs, even without recompiling. For example, I ran it 5 times and pasted the terminal output below. In runs 1, 2, and 5, the blocking closure does not run; in runs 3 and 4, it does. In no case does the runtime panic or otherwise show any sign of failure.

$  tokidrop git:(main) ✗ cargo run --release
   Compiling tokidrop v0.1.0 (/Users/ezhang/Documents/temp/tokidrop)
    Finished release [optimized] target(s) in 0.40s
     Running `target/release/tokidrop`
finished!
Dropping A
$  tokidrop git:(main) ✗ cargo run --release
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/tokidrop`
finished!
Dropping A
$  tokidrop git:(main) ✗ cargo run --release
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/tokidrop`
finished!
Dropping A
Inside A blocking
finished A blocking
$  tokidrop git:(main) ✗ cargo run --release
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/tokidrop`
finished!
Dropping A
Inside A blocking
finished A blocking
$  tokidrop git:(main) ✗ cargo run --release
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/tokidrop`
finished!
Dropping A

I expected to see this happen: The executor would schedule the spawn_blocking tasks and wait indefinitely for them to finish, as described in the quoted documentation section. Or perhaps it would panic if this usage of spawn_blocking isn't valid, rather than failing silently? It's also a little surprising that the behavior is non-deterministic.


For completeness, here are some variations and the behavior I observed but can't explain, in case they are helpful:

  • If I uncomment line 10 in the snippet above (// thread::sleep(Duration::from_secs(1));), then the blocking closure never runs, rather than running only sometimes.
  • If I remove the tokio::spawn call that places A in a runtime task, then A is dropped at the end of the main function instead of by the runtime internally, and the blocking closure always executes to completion, printing Inside A blocking\nfinished A blocking as I would expect.
  • If I run the original code on the Rust playground, I've never been able to get the "Inside A blocking" line to print. I'm guessing the non-determinism on my own machine comes from it having more logical CPUs or being faster?

Thank you!

@ekzhang ekzhang added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Jul 13, 2022
@Darksonn Darksonn added T-docs Topic: documentation M-task Module: tokio/task labels Jul 13, 2022
@Darksonn
Contributor

This is fixed by the (currently unreleased) PR #4811.

@Darksonn
Contributor

(The answer to what's going on is that they can be cancelled if they have not already started running.)
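The distinction between a queued job and a running job can be illustrated with a toy model built only on the standard library. This is a sketch under stated assumptions, not Tokio's implementation: ToyPool, request_shutdown, and join are hypothetical names, and the single worker thread is a simplification. It shows a job that was already running when shutdown was requested finishing normally, while a job still sitting in the queue is dropped without ever running.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

struct State {
    queue: VecDeque<Job>,
    shutdown: bool,
}

// Hypothetical single-worker pool; NOT how Tokio's blocking pool is written.
struct ToyPool {
    shared: Arc<(Mutex<State>, Condvar)>,
    worker: thread::JoinHandle<()>,
}

impl ToyPool {
    fn new() -> Self {
        let shared = Arc::new((
            Mutex::new(State { queue: VecDeque::new(), shutdown: false }),
            Condvar::new(),
        ));
        let for_worker = Arc::clone(&shared);
        let worker = thread::spawn(move || {
            let (lock, cv) = &*for_worker;
            loop {
                let job = {
                    let mut st = lock.lock().unwrap();
                    loop {
                        // Once shutdown is requested, queued jobs are never started.
                        if st.shutdown {
                            return;
                        }
                        if let Some(job) = st.queue.pop_front() {
                            break job;
                        }
                        st = cv.wait(st).unwrap();
                    }
                };
                // A job that has started always runs to completion.
                job();
            }
        });
        ToyPool { shared, worker }
    }

    fn submit(&self, f: impl FnOnce() + Send + 'static) {
        let (lock, cv) = &*self.shared;
        lock.lock().unwrap().queue.push_back(Box::new(f));
        cv.notify_one();
    }

    fn request_shutdown(&self) {
        let (lock, cv) = &*self.shared;
        lock.lock().unwrap().shutdown = true;
        cv.notify_one();
    }

    fn join(self) {
        self.worker.join().unwrap();
    }
}

fn demo() -> Vec<&'static str> {
    let pool = ToyPool::new();
    let ran = Arc::new(Mutex::new(Vec::new()));
    let (started_tx, started_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let log = Arc::clone(&ran);
    pool.submit(move || {
        started_tx.send(()).unwrap(); // signal that this job is running
        release_rx.recv().unwrap(); // keep running until shutdown was requested
        log.lock().unwrap().push("first");
    });
    let log = Arc::clone(&ran);
    pool.submit(move || {
        log.lock().unwrap().push("second");
    });

    started_rx.recv().unwrap(); // first job is running, second is still queued
    pool.request_shutdown();
    release_tx.send(()).unwrap();
    pool.join();

    let out = ran.lock().unwrap().clone();
    out
}

fn main() {
    println!("{:?}", demo()); // prints ["first"]
}
```

In the reproduction above, whether the closure behaves like "first" or "second" would depend on whether a blocking thread picked it up before shutdown began, which is consistent with the run-to-run variation reported.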

@ekzhang
Sponsor Contributor Author

ekzhang commented Jul 13, 2022

Thanks for explaining and sharing the really helpful docs! In that case it seems like putting spawn_blocking() in destructors will not guarantee that the code is run when tasks are dropped on runtime shutdown, so I will need to find another way to do cleanup for my use case.

@Noah-Kennedy
Contributor

@ekzhang what exactly do you need to do in the spawn_blocking for your use case?

@cgwalters
Contributor

You could just use std::thread::spawn(), no?

@ekzhang
Sponsor Contributor Author

ekzhang commented Jul 13, 2022

@Noah-Kennedy The basic example is that I was trying to delete a TempDir from the tempfile crate, and I put that code in a spawn_blocking closure since the Drop destructor for TempDir does a blocking file system operation (removing a directory tree recursively).

This happens in a spawned task, several function calls deep, where one of the functions uses a tempdir. Since the task can be canceled, I would like to make sure that the TempDir actually gets deleted.

Other examples in my specific case, besides FS resources, are file system mounts (OverlayFS and FUSE), which need to be unmounted with the umount2 system call or fusermount, and cgroup resources. All of these need to be cleaned up consistently or they will leak memory and OS resources.

@cgwalters I don't think thread::spawn() in a destructor will work for this case because it has the same problem: when main exits, it doesn't wait for all threads to join; they get stopped abruptly.


Thanks a lot for the help though!! I understand that this isn't in scope for the executor and can try to write this part of my system without using Tokio.
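One possible way around the "main does not wait for threads" problem, sketched with the standard library only: destructors register their cleanup threads in a shared registry, and the program joins them explicitly before main returns. CleanupRegistry, Resource, and join_all are hypothetical names introduced for illustration. With Tokio, this presumably requires building the runtime by hand so that join_all can run after the runtime has been dropped, which is an assumption not tested here.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical registry that keeps the handles of cleanup threads.
#[derive(Clone, Default)]
struct CleanupRegistry {
    handles: Arc<Mutex<Vec<JoinHandle<()>>>>,
}

impl CleanupRegistry {
    // Run `f` on its own thread and remember the handle.
    fn spawn(&self, f: impl FnOnce() + Send + 'static) {
        let handle = thread::spawn(f);
        self.handles.lock().unwrap().push(handle);
    }

    // Wait for every registered cleanup thread; returns how many were joined.
    fn join_all(&self) -> usize {
        let handles = std::mem::take(&mut *self.handles.lock().unwrap());
        let count = handles.len();
        for handle in handles {
            handle.join().expect("cleanup thread panicked");
        }
        count
    }
}

// Stand-in for a value whose cleanup is slow (temp dir, mount, cgroup).
struct Resource {
    name: &'static str,
    registry: CleanupRegistry,
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        let name = self.name;
        let log = Arc::clone(&self.log);
        self.registry.spawn(move || {
            thread::sleep(Duration::from_millis(50)); // simulated blocking work
            log.lock().unwrap().push(format!("cleaned {name}"));
        });
    }
}

fn run() -> Vec<String> {
    let registry = CleanupRegistry::default();
    let log = Arc::new(Mutex::new(Vec::new()));
    {
        let _resource = Resource {
            name: "tempdir",
            registry: registry.clone(),
            log: Arc::clone(&log),
        };
        // `_resource` is dropped here; its cleanup starts on another thread.
    }
    // Without this call, returning from main could end the process mid-cleanup.
    registry.join_all();
    let out = log.lock().unwrap().clone();
    out
}

fn main() {
    println!("{:?}", run()); // prints ["cleaned tempdir"]
}
```

The trade-off is that every code path that exits the program has to reach join_all; a panic with abort semantics or a call to std::process::exit would still skip the cleanup.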

@cgwalters
Contributor

The basic example is that I was trying to delete a TempDir from the tempfile crate,

I have an opinion on this one: https://internals.rust-lang.org/t/should-rust-programs-unwind-on-sigint/13800/11

@Noah-Kennedy
Contributor

Ugh, this is a case which has bitten me before. In a lot of these cases, you can get away with performing the calls inline, since they are generally fast enough to be effectively non-blocking. That might actually be your best option, combined with block_in_place (although this approach certainly has its problems).
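A minimal sketch of the inline approach, using only the standard library: the destructor does the file system work directly, so nothing depends on a task being scheduled after the drop. ScratchDir is a hypothetical stand-in for the TempDir discussed above, not the tempfile crate's type. On a multi-threaded Tokio runtime, the body of drop could additionally be wrapped in tokio::task::block_in_place, as suggested; that is left out here because it panics on a current-thread runtime and would make the example depend on the runtime flavor.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical stand-in for a temporary directory guard.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn create(name: &str) -> io::Result<Self> {
        // The process id keeps concurrent runs from colliding.
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Inline, synchronous cleanup: it has finished by the time drop returns.
        if let Err(err) = fs::remove_dir_all(&self.path) {
            eprintln!("failed to remove {}: {}", self.path.display(), err);
        }
    }
}

// Returns true if the directory is gone after the guard is dropped.
fn demo() -> io::Result<bool> {
    let dir = ScratchDir::create("inline-drop-demo")?;
    fs::write(dir.path().join("data.txt"), b"hello")?;
    let path = dir.path().to_path_buf();
    drop(dir);
    Ok(!path.exists())
}

fn main() -> io::Result<()> {
    println!("removed: {}", demo()?);
    Ok(())
}
```

The cost is that the thread running the destructor is blocked for the duration of the removal, which is the problem with this approach that the comment above alludes to.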
