Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rayon as the bevy_task thread pool #11995

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Feb 20, 2024

Objective

This is a draft. There is zero intent to merge this until there is some form of consensus on general direction

This is one potential option for dealing with #6941 and #10064, as well as a general alternative to #11492.

This PR aims to lay the foundations for using rayon as Bevy's core task executor, as well as support the very powerful APIs built on top of it (i.e. ParallelIterator and friends).

Background

Bevy's tasks used to be based on rayon. However, in #1525, we shifted to use async_executor and our own hand-rolled thread pool. The are several reasons for that change:

  • rayon's support of async/await is non-existent, and still effectively is not supported in any first party way.
  • rayon could not run on WASM targets. This has since changed and there is tentative support for mulitthreading with web workers and SharedArrayBuffer on web platforms.

However, losing rayon has actually caused us to lose a significant amount of performance, and the powerful APIs associated with rayon. With rayon being one of the best examples of high data parallelism and ergonomics in the Rust ecosystem, this is potentially leaving a lot on the table.

Rayon has since added support for WASM, and in this PR, I am going to attempt to bridge the gap to allow us to run async tasks in the same thread pool.

Solution

This PR replaces our thread pool with rayon::ThreadPool, and hand create our own async task that schedules using rayon. We then fork futures_lite::block_on to avoid parking the thread, using rayon_core::yield_now to run rayon's tasks whenever the local task yields. This should enable us to get the performance benefits and API support from Rayon, without losing our capability to schedule async tasks onto the same threads.

Not implemented in this PR is to move all non-async tasks (i.e. blocking CPU-bound operations on the ComputeTaskPool) to queue onto rayon instead of async_executor.

Current Unknowns

  • If we yield this way in block_on, will Rayon be able to resume a thread that has been parked by the current implementation of block_on?
  • Do we still see the benefits of rayon's thread management and contention mitigation? It's definitely there, but we may have contention issues in the actual tasks being run.
  • Can we get into the internals of rayon to try to merge these two work stealing queues together? We don't have access to the internals, but we don't need that to use only one top level work stealing queue.
  • Does this work on WASM without too many gotchas?

@james7132 james7132 requested a review from hymm February 20, 2024 09:48
@james7132 james7132 added C-Performance A change motivated by improving speed, memory usage or compile times C-Usability A simple quality-of-life change that makes Bevy easier to use A-Tasks Tools for parallel and async work labels Feb 20, 2024
@hymm
Copy link
Contributor

hymm commented Feb 20, 2024

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

@james7132
Copy link
Member Author

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

Given the slightly cursed things we're doing to support scopes with external executors, I'm not confident that would be 100% sufficient, but it definitely is a cleaner approach overall.

@hymm
Copy link
Contributor

hymm commented Feb 20, 2024

Given the slightly cursed things we're doing to support scopes with external executors, I'm not confident that would be 100% sufficient, but it definitely is a cleaner approach overall.

This can all go away, once we remove non send data from the world.

@james7132 james7132 changed the title Use rayon as our thread pool Use rayon as the bevy_task thread pool Feb 21, 2024
@james7132
Copy link
Member Author

james7132 commented Feb 21, 2024

We should investigate this comment for running async on rayon rayon-rs/rayon#679 (comment) as an alternative to the block_on.

I got this working, and the results are a bit mixed right now.

The Good

Query::par_iter-bound systems are notably faster. Here's a comparison between animate_targets. Yellow is this PR, red is main.

image

You can immediately see the results in the trace, as the threads are spinning up faster than with just async_executor.

image

You can see that the app as a whole is using almost all of the available threads, even for schedules with lots of systems with little work to do in each.

image

The Bad

Every schedule takes longer to run as a whole. Here's the comparison for the Render schedule:
image

This is either stemming from the higher contention from spinning up and turning down so many threads, the alterations to block_on make scope execution harder to terminate than it was before, or the overhead of allocation upon reallocation is affecting tasks like the multithreaded executor.

@hymm
Copy link
Contributor

hymm commented Feb 22, 2024

I did some of my own investigation. One thing I did notice is it seems to take longer for the time system to start running in First after the first run of the multithreaded executor. It wasn't by a ton, maybe 5us. Seems likely this is due to contention. One other note is that it seems very aggressive at starting more threads. Enough so that I'm wondering if spawn_async is misbehaving and starting too many threads. Will investigate more later.

@SkiFire13
Copy link
Contributor

If we yield this way in block_on, will Rayon be able to resume a thread that has been parked by the current implementation of block_on?

AFAIK no, rayon only wakes up threads that registered themselves as sleeping in the threadpool's registry, see for example Sleep::wake_specific_thread where it checks if is_slepping is true, and that's only set in Sleep::sleep, which in turn only gets called when a thread is not running any task (in the case of your block_on, it is running the block_on task, even though that internally parks the thread).

@james7132 when you tested with rayon-rs/rayon#679 (comment) 's suggestion were you still running that block_on function or did you completly switch to async-task?

@james7132
Copy link
Member Author

I was using the fork of block_on that is in this PR. It still uses parking. As far as I can tell, there's no way to yield to rayon that allows it to go to sleep, so our use of block_on may need to use something else. in_place_scope may work.

@@ -611,6 +532,10 @@ pub struct Scope<'scope, 'env: 'scope, T> {
}

impl<'scope, 'env, T: Send + 'scope> Scope<'scope, 'env, T> {
pub fn spawn(&self, f: impl FnOnce() + Send + 'scope) {
self.thread_pool.spawn(|_| f());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right as the scope async executors can end while there are still tasks on the rayon scope. We need to push something into self.spawned. An easy fix here is to use spawn_async.

where
T: Send + 'static,
{
Task::new(self.executor.spawn(future))
let (runnable, task) = async_task::spawn(future, |runnable: Runnable| {
rayon_core::spawn(move || {
Copy link
Contributor

@SkiFire13 SkiFire13 Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that rayon_core::spawn spawns the task on whatever rayon context is the current one in the thread it is called in. Most notably, the async_task closure may be called on a different thread than the one the task was spawned in (e.g. if the Future's Waker was moved into a different thread), in which case this would likely initialize a different global rayon's context, with a different thread pool. It should instead spawn on the current thread pool (with self.thread_pool.spawn(...))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Tasks Tools for parallel and async work C-Performance A change motivated by improving speed, memory usage or compile times C-Usability A simple quality-of-life change that makes Bevy easier to use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants