
Parallel mode hangs when invoked from Rayon global thread pool #517

Closed
Pr0methean opened this issue Jun 12, 2023 · 25 comments · Fixed by #524

Comments

@Pr0methean
Contributor

My app needs to generate and optimize multiple images in parallel, so it uses the Rayon global thread pool. However, the time each image takes varies unpredictably, so utilization toward the end of a run would be higher if I could also use oxipng's parallel mode. Unfortunately, oxipng deadlocks when I invoke it from within a Rayon task in the global thread pool. Here's the app where I'm experiencing this: https://app.circleci.com/pipelines/github/Pr0methean/OcHd-RustBuild?branch=oxipng-parallel

@Pr0methean
Contributor Author

I've tried wrapping the optimize_from_memory call in ThreadPool::install with an explicitly-created thread pool, but that leads to starvation of the global pool and out-of-memory errors. One workaround would be for oxipng to make the RowFilter interface public, so that I could create a separate task for each filter algorithm to be tried.
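
Roughly what I tried (simplified; the pool size and the helper name here are placeholders, not my actual code):

use oxipng::{optimize_from_memory, Options, PngResult};

fn optimize_in_separate_pool(data: &[u8], options: &Options) -> PngResult<Vec<u8>> {
    // Dedicated pool, so oxipng's internal rayon jobs don't have to share
    // threads with the global pool that schedules my image tasks.
    // (In practice the pool would be built once and reused.)
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(4) // arbitrary size for illustration
        .build()
        .expect("failed to build thread pool");

    // Run the whole optimization inside that pool.
    pool.install(|| optimize_from_memory(data, options))
}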

@andrews05
Collaborator

Is this on master or v8? Could you try some older versions perhaps?
Do you know the exact point that it deadlocks?

@Pr0methean
Contributor Author

Master.

@andrews05
Collaborator

andrews05 commented Jun 13, 2023

Maybe this is relevant? rayon-rs/rayon#592
Oxipng explicitly avoids using mutexes in this way, but perhaps your code does?

Out of interest, are you using the new raw API?

@Pr0methean
Contributor Author

That bug is unlikely to be relevant, since the PNG-writing task that invokes Oxipng calls into_inner() on its Mutex before running.

I'm not using the raw API but will look into it.

@andrews05
Collaborator

As I'm not exactly an expert in Rayon, I'm afraid there's not much else I can do to help, sorry. Perhaps @kornelski may have some ideas?

@wait-what

wait-what commented Jun 18, 2023

Very simple to reproduce, using the code below as an example. My CPU has 12 threads. This works perfectly well with 11, but as soon as I uncomment the 12, none of them finish, there is no CPU activity, and the program just hangs. On an 8-thread CPU it works with 7 but hangs with 8, and so on.

use rayon::prelude::*;
use oxipng::{optimize_from_memory, Options, Deflaters};

fn main() {
    let buffer = include_bytes!("sample.png").as_slice();

    let options = Options {
        deflate: Deflaters::Libdeflater { compression: 12 },
        ..Default::default()
    };

    vec![
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
        // 12,
    ].par_iter().for_each(|x| {
        println!("Optimizing {x}");
    optimize_from_memory(buffer, &options).unwrap();
        println!("Done {x}");
    });
}

@andrews05
Collaborator

@wait-what Thanks for the minimal test case, that's quite helpful!
It seems the problem is with the evaluator: trials are scheduled in the background, and when it later waits for them to finish there are no free threads to execute them, so it deadlocks.
I'm not really sure what the correct way to resolve this is, but here's one possible solution: andrews05@059e22b
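
In rough terms, the idea is this pattern (simplified; the helper name recv_with_yield is made up for illustration, not the literal code):

use std::sync::mpsc::{Receiver, TryRecvError};

// Wait for an evaluation result without blocking the rayon worker thread:
// run other locally queued rayon jobs on this thread until a result arrives.
fn recv_with_yield<T>(rx: &Receiver<T>) -> Option<T> {
    loop {
        match rx.try_recv() {
            Ok(value) => return Some(value),
            Err(TryRecvError::Disconnected) => return None,
            Err(TryRecvError::Empty) => match rayon::yield_local() {
                // Ran one pending job on this thread; poll the channel again.
                Some(rayon::Yield::Executed) => continue,
                // Nothing left to run locally (or we're not on a rayon
                // thread): fall back to an ordinary blocking receive.
                _ => return rx.recv().ok(),
            },
        }
    }
}
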
Would you and @Pr0methean mind checking this out to see if it works for you?

@wait-what

wait-what commented Jun 19, 2023

With oxipng = { git = "https://github.com/andrews05/oxipng", branch = "parallel" } and the same code as above, the issue seems to be fixed. Thank you!

A while loop like that seems like it might not be the most efficient solution, but it does seem plenty fast.

@kornelski
Contributor

kornelski commented Jun 20, 2023

I think the yield loop should be safe, because it's not busy-waiting, it's stealing work to run on the current thread.

I'm not sure if it solves the problem 100%, because I don't know if it's guaranteed that rayon will find a job to execute at the moment of entering the while loop. If some timing issue made the evaluation job enqueue a bit later, I suspect the while loop may terminate, and then still get stuck waiting for the channel.

When get_best_candidate is called in a rayon context, it creates a situation where one rayon task is waiting for another rayon task to complete, and that is the equivalent of the mutex problem that rayon can't solve.

I see a couple of solutions to this:

  1. Have an explicit separate rayon threadpool just for the try_image tasks, so that it's always guaranteed there's a thread available for them.

  2. In try_image, keep an explicit queue of images left to evaluate. Instead of spawning a rayon task for a specific image, do this:

    1. Add the image to a queue (could be a Mutex<Vec<(Arc<PngImage>, params)>>)
    2. Spawn a rayon task that removes an image from the queue and evaluates it

    Then in get_best_candidate, also try to take images from the queue and evaluate them on the current thread. This way, if the spawned tasks never run, they never take images off the queue, and get_best_candidate finishes the job itself. It's about the same as the while rayon::yield loop, but only for the image tasks, and guaranteed to empty that particular queue. (Rough sketch below.)
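
Something like this, with hypothetical names (Candidate, Evaluation, evaluate, Evaluator) standing in for oxipng's real types; a real implementation would also need to wait for evaluations that a spawned task has already started, which I've left out here:

use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for Arc<PngImage> + filter params and a trial result.
struct Candidate;
struct Evaluation;

fn evaluate(_c: Candidate) -> Evaluation {
    // ... trial filtering/compression would happen here ...
    Evaluation
}

#[derive(Default)]
struct Evaluator {
    queue: Arc<Mutex<Vec<Candidate>>>,
    results: Arc<Mutex<Vec<Evaluation>>>,
}

impl Evaluator {
    // Analogue of try_image: enqueue the candidate, then ask rayon to process
    // "one item from the queue" rather than this specific image.
    fn try_image(&self, candidate: Candidate) {
        self.queue.lock().unwrap().push(candidate);
        let queue = Arc::clone(&self.queue);
        let results = Arc::clone(&self.results);
        rayon::spawn(move || {
            if let Some(c) = queue.lock().unwrap().pop() {
                let eval = evaluate(c);
                results.lock().unwrap().push(eval);
            }
        });
    }

    // Analogue of get_best_candidate: drain whatever is still queued on the
    // current thread, so the work gets done even if the spawned jobs never run.
    fn get_best_candidate(&self) -> Vec<Evaluation> {
        loop {
            // Hold the queue lock only long enough to pop one candidate.
            let next = self.queue.lock().unwrap().pop();
            match next {
                Some(c) => {
                    let eval = evaluate(c);
                    self.results.lock().unwrap().push(eval);
                }
                None => break,
            }
        }
        std::mem::take(&mut *self.results.lock().unwrap())
    }
}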

@andrews05
Collaborator

Thanks for the analysis, @kornelski!

> I'm not sure if it solves the problem 100%, because I don't know if it's guaranteed that rayon will find a job to execute at the moment of entering the while loop. If some timing issue made the evaluation job enqueue a bit later, I suspect the while loop may terminate, and then still get stuck waiting for the channel.

Are you sure this is possible though? I would have thought the job would be guaranteed to be queued before rayon::spawn returns. I mean, my personal preference would be to stick with the simplest solution unless we know for sure that it may not always work 😅

@andrews05
Collaborator

@Pr0methean Have you had a chance to try out the branch above?

@TPS

TPS commented Jun 22, 2023

Is this work useful toward #169/#275, perchance?

@andrews05
Collaborator

@TPS Not really, sorry. Although since #169 was opened we've added new filter types, so it will use up to 10 threads now at -o 6.

@Pr0methean
Contributor Author

@andrews05 It works; thanks!

@andrews05
Collaborator

Great, I've opened a PR for it.

@andrews05
Collaborator

@Pr0methean I was just thinking about the stack overflow you mentioned; are you able to get a stack trace for that?

@Pr0methean
Contributor Author

Pr0methean commented Jun 25, 2023

Yes.
stack_trace.txt
The problem occurs because yield_local is effectively recursive. It could probably be solved by having one thread not yield.
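
To illustrate (a hypothetical call stack, not the actual frames from the attached trace): each yield_local can pick up the next queued image job from the same thread and run it on top of the current frame, so the stack depth grows with the number of queued images:

optimize_from_memory(image_1)
  rayon::yield_local()            // waiting for trial results
    optimize_from_memory(image_2) // yield picked up another image job
      rayon::yield_local()
        optimize_from_memory(image_3)
          ...                     // one nested set of frames per queued image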

@Pr0methean
Contributor Author

Opened rayon-rs/rayon#1064.

@andrews05
Collaborator

Oh I see, so yield_local is picking up a task from your own program (the next image to operate on), rather than an evaluation task, is that right?

@Pr0methean
Contributor Author

Pr0methean commented Jun 26, 2023

That's right. I've tried having the image task itself call yield_local() before starting, but this is slower and gives lower CPU utilization than without the Oxipng parallelization. I've also tried wrapping the Oxipng call in install() with a second thread pool, but then I get deadlock again.

@andrews05
Collaborator

I guess kornelski's solution would have worked better then. Whoops 😅
It is a lot more complex though, so hopefully rayon accepts your patch.

@Pr0methean
Contributor Author

Update: it looks like this only affects jobs running in a FIFO scope, but it's unfortunate that I have to choose between my two methods of increasing utilization (parallelizing oxipng vs starting the longest jobs first).
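
For context, by "FIFO scope" I mean the jobs are submitted roughly like this (simplified; names are placeholders, not my actual code):

// Submit the image jobs in FIFO order, so the longest-running images can be
// started first; each job eventually calls oxipng's optimize_from_memory.
fn run_all(images: Vec<Vec<u8>>) {
    rayon::scope_fifo(|s| {
        for image in images {
            s.spawn_fifo(move |_| {
                let _ = oxipng::optimize_from_memory(&image, &oxipng::Options::default());
            });
        }
    });
}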

@andrews05
Collaborator

I believe I have a simple solution. Can you try this out? andrews05@1e46122

@Pr0methean
Contributor Author

It may take me a day or two to test this, because I'm also testing zopfli-rs/zopfli#22 and (in preparation for making it a PR) Pr0methean/zopfli@b2d0142.

@laurmaedje mentioned this issue Sep 13, 2023