
[Merged by Bors] - Nested spawns on scope #4466

Closed
wants to merge 30 commits into from


@hymm (Contributor) commented Apr 13, 2022

Objective

  • Add the ability to create nested spawns. This is needed for stageless. The current executor spawns a task for each system early and runs the system by communicating through a channel. In stageless, we want to spawn the task late, so that archetypes can be updated right before the task runs. The executor runs on a separate task, so nested spawns allow the scope to be passed to the spawned executor.
  • Fixes Nesting task pool's Scopes #4301
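To make the motivation concrete, here is a minimal sketch that models the two executor strategies with OS threads from `std::thread::scope` (Rust 1.63+) rather than Bevy's `TaskPool`. The functions `run_early` and `run_late`, and the `fn() -> u32` "systems", are hypothetical stand-ins, not Bevy APIs. The late variant needs a nested spawn because the executor itself runs on a spawned thread.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical model of spawning early: every "system" gets a thread up
/// front that blocks on a channel until the executor tells it to run.
fn run_early(systems: &[fn() -> u32]) -> u32 {
    thread::scope(|s| {
        let mut starters = Vec::new();
        let mut handles = Vec::new();
        for &system in systems {
            let (tx, rx) = mpsc::channel::<()>();
            starters.push(tx);
            handles.push(s.spawn(move || {
                rx.recv().unwrap(); // wait for the start signal
                system()
            }));
        }
        // The executor decides each system can start now and signals it.
        for tx in starters {
            tx.send(()).unwrap();
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u32>()
    })
}

/// Hypothetical model of spawning late: the executor runs on its own
/// spawned thread and spawns each system itself (a nested spawn). A real
/// executor would only spawn a system once it is ready to run.
fn run_late(systems: &[fn() -> u32]) -> u32 {
    thread::scope(|s| {
        let executor = s.spawn(move || {
            let handles: Vec<_> = systems
                .iter()
                .map(|&system| s.spawn(move || system()))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum::<u32>()
        });
        executor.join().unwrap()
    })
}
```

Both functions return the sum of the systems' results. The difference is only in when each system's thread is created and who creates it.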

Solution

  • Instantiate a single-threaded executor on the scope and use it instead of the LocalExecutor. This allows the scope to be Send while still being able to spawn tasks onto the main thread that the scope is run on. This works because, while systems can access non-Send data, the systems themselves are Send. Because of this change, we lose the ability to spawn non-Send tasks on the scope, but I don't think this is used anywhere. Users can still use spawn_local on TaskPools.
  • Steals the lifetime tricks that std::thread::scope uses to allow nested spawns, while disallowing the scope from being passed to tasks or threads not associated with it.
  • Changes the storage for the tasks to a ConcurrentQueue. This allows a &Scope to be passed for spawning instead of a &mut Scope. ConcurrentQueue was chosen because it was already in our dependency tree, since async_executor depends on it.
  • Removes the optimizations for 0 and 1 spawned tasks. They improved those cases but made cases with more than one task slower.
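For comparison, here is a sketch of the `std::thread::scope` pattern this PR borrows, using OS threads rather than Bevy's `TaskPool`. `Scope::spawn` takes `&'scope self`, so a spawned thread can hold the `&'scope Scope` and spawn more work on the same scope. This mirrors the switch from `&mut Scope` to `&Scope`. The helper names `spawn_pair` and `count_nested` are hypothetical and exist only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::{self, Scope};

/// Hypothetical helper with both lifetimes spelled out: `'scope` is the
/// scope itself, `'env` is borrowed data that outlives the scope.
fn spawn_pair<'scope, 'env>(s: &'scope Scope<'scope, 'env>, counter: &'env AtomicUsize)
where
    'env: 'scope,
{
    s.spawn(move || {
        counter.fetch_add(1, Ordering::Relaxed);
        // Nested spawn: `&'scope Scope` is Copy + Send, so the spawned
        // thread can use it to spawn another thread on the same scope.
        s.spawn(move || {
            counter.fetch_add(1, Ordering::Relaxed);
        });
    });
}

/// Spawns `n` outer threads, each of which spawns one nested thread.
fn count_nested(n: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n {
            spawn_pair(s, &counter);
        }
        // Rejected by the compiler: `thread::spawn` requires 'static,
        // so the scope reference cannot escape to an unrelated thread.
        // thread::spawn(move || s.spawn(|| ()));
    }); // all threads, including nested ones, are joined here
    counter.load(Ordering::Relaxed)
}
```

Calling `count_nested(4)` runs four outer threads and four nested ones, so the counter ends at 8.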

Changelog

Add the ability to nest spawns

```rust
use bevy_tasks::TaskPool;

fn main() {
    let pool = TaskPool::new();
    pool.scope(|scope| {
        scope.spawn(async move {
            // calling scope.spawn from a spawned task was not possible before
            scope.spawn(async move {
                // do something
            });
        });
    });
}
```

Migration Guide

If you were using explicit lifetimes and passing Scope, you'll now need to specify two lifetimes.

```rust
fn scoped_function<'scope>(scope: &mut Scope<'scope, ()>) {}
// should become
fn scoped_function<'scope>(scope: &Scope<'_, 'scope, ()>) {}
```

scope.spawn_local has been renamed to scope.spawn_on_scope. This should cover cases where you needed to run tasks on the local thread, but it does not cover spawning non-Send futures: spawning non-Send futures on a scope is no longer supported.

TODO

  • think real hard about all the lifetimes
  • add doc about what 'env and 'scope mean.
  • manually check that the single threaded task pool still works
  • Get updated perf numbers
  • check and make sure all the transmutes are necessary
  • move commented out test into a compile fail test
  • look through the tests for scope on std and see if I should add any more tests

@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Apr 13, 2022
@alice-i-cecile alice-i-cecile added C-Enhancement A new feature A-ECS Entities, components, systems, and events and removed S-Needs-Triage This issue needs to be labelled labels Apr 13, 2022
@TheRawMeatball TheRawMeatball self-requested a review April 13, 2022 08:24
@hymm (Contributor, Author) commented Apr 22, 2022

Perf

Perf results are currently a mixed bag: faster on some tests, but slower on others. The examples show the same pattern: many_cubes is about 2% slower and many_lights is about 2% faster.

(Perf screenshots not reproduced.)

@hymm (Contributor, Author) commented Apr 22, 2022

Lifetimes

The current PR allows the user to get a &'static Scope. This would be unsound if the user were able to use this reference after the Scope is dropped. I added a compile-fail test for passing the &Scope reference to a 'static thread and tried some other things, but wasn't able to pass the reference to anything other than tasks spawned on the scope. The invariance of the 'scope lifetime seems to keep the reference properly bound to the closure.

Ideally, instead of passing an &'env Scope, we would pass a &'scope Scope, but the par_for_each implementation on Query fails to compile with the &'scope lifetime. This is caused by a bug in Rust, rust-lang/rust#95527, which will be fixed when the NLL nightly feature is stabilized. We should change the lifetimes once NLL is stabilized: rust-lang/rust#43234

@alice-i-cecile alice-i-cecile added the X-Controversial There is active debate or serious implications around merging this PR label Apr 22, 2022
@maniwani maniwani mentioned this pull request Apr 23, 2022
@hymm (Contributor, Author) commented Apr 23, 2022

Done messing around with this now, barring any random ideas I might get for improving performance.

@hymm (Contributor, Author) commented Apr 24, 2022

I did an experiment with starting tasks on demand (after the can_start_now check) rather than spawning them in prepare. I'm leaving a comment here because this was part of the motivation for this PR. I'm getting about a 4% framerate boost on many_cubes -- sphere and seeing a decent boost on the benches for contrived and empty systems. I probably won't PR this, as the executor is getting a full rewrite for stageless. The branch is here: https://github.com/hymm/bevy/tree/start-systems-late

@hymm (Contributor, Author) commented Apr 27, 2022

Added a span to record the parallel executor scope time. The screenshot is a histogram of that span, comparing main and this branch on the many_cubes example.

[Screenshot 2022-04-27 122622: histogram of the parallel executor scope span, main vs. this branch, many_cubes]

This confirms to me that perf on this branch vs. main is workload dependent.

Edit: Note that the x axis is logarithmic.

@hymm (Contributor, Author) commented Jun 10, 2022

Rebased on current main and reverted many of the changes to the scope executor, so it again uses try_tick.

Reran the benches and the results seem OK. Small improvements in busy_systems. In contrived, small regressions at 01x and 02x entities, but improvements at 03x and 04x. Empty systems seems like mostly noise, except for an improvement with 0 systems. There's a strange, decent improvement in run_criteria/no that I can't explain. There's also a small potential regression in run_criteria/yes, but it's small and random enough that it could just be noise.

Edit: the left columns are before this PR and the right columns are this PR.

Benchmarks (per side: ratio relative to the faster side, time, throughput):
busy_systems/01x_entities_03_systems           1.16     29.0±3.59µs        ? ?/sec     1.00     25.1±0.74µs        ? ?/sec
busy_systems/01x_entities_06_systems           1.02     51.7±0.88µs        ? ?/sec     1.00     50.8±2.62µs        ? ?/sec
busy_systems/01x_entities_09_systems           1.07     78.8±2.32µs        ? ?/sec     1.00     73.9±3.24µs        ? ?/sec
busy_systems/01x_entities_12_systems           1.12    104.4±5.16µs        ? ?/sec     1.00     93.4±1.66µs        ? ?/sec
busy_systems/01x_entities_15_systems           1.10    128.5±2.06µs        ? ?/sec     1.00    116.6±2.70µs        ? ?/sec
busy_systems/02x_entities_03_systems           1.00     46.2±0.82µs        ? ?/sec     1.07     49.2±2.30µs        ? ?/sec
busy_systems/02x_entities_06_systems           1.03     91.6±2.70µs        ? ?/sec     1.00     88.7±1.11µs        ? ?/sec
busy_systems/02x_entities_09_systems           1.02    137.0±7.01µs        ? ?/sec     1.00    134.1±1.68µs        ? ?/sec
busy_systems/02x_entities_12_systems           1.00    179.0±2.59µs        ? ?/sec     1.00    179.0±3.92µs        ? ?/sec
busy_systems/02x_entities_15_systems           1.02    223.7±5.08µs        ? ?/sec     1.00    218.8±3.98µs        ? ?/sec
busy_systems/03x_entities_03_systems           1.03     68.7±1.65µs        ? ?/sec     1.00     66.8±1.27µs        ? ?/sec
busy_systems/03x_entities_06_systems           1.00    131.3±3.23µs        ? ?/sec     1.01    133.1±6.21µs        ? ?/sec
busy_systems/03x_entities_09_systems           1.07    204.6±3.52µs        ? ?/sec     1.00    191.9±5.30µs        ? ?/sec
busy_systems/03x_entities_12_systems           1.08   275.0±28.47µs        ? ?/sec     1.00    255.6±2.96µs        ? ?/sec
busy_systems/03x_entities_15_systems           1.03    333.5±5.79µs        ? ?/sec     1.00    324.7±4.37µs        ? ?/sec
busy_systems/04x_entities_03_systems           1.00     86.6±1.55µs        ? ?/sec     1.05     91.0±2.81µs        ? ?/sec
busy_systems/04x_entities_06_systems           1.03    178.3±2.21µs        ? ?/sec     1.00    173.1±2.33µs        ? ?/sec
busy_systems/04x_entities_09_systems           1.04    266.8±3.65µs        ? ?/sec     1.00    255.6±5.44µs        ? ?/sec
busy_systems/04x_entities_12_systems           1.04    351.8±9.94µs        ? ?/sec     1.00    337.9±6.01µs        ? ?/sec
busy_systems/04x_entities_15_systems           1.04    435.5±8.18µs        ? ?/sec     1.00   419.9±10.21µs        ? ?/sec
busy_systems/05x_entities_03_systems           1.00    114.7±2.94µs        ? ?/sec     1.00    114.3±2.95µs        ? ?/sec
busy_systems/05x_entities_06_systems           1.02   221.7±13.35µs        ? ?/sec     1.00    216.8±2.14µs        ? ?/sec
busy_systems/05x_entities_09_systems           1.00    334.3±7.27µs        ? ?/sec     1.00   332.7±12.30µs        ? ?/sec
busy_systems/05x_entities_12_systems           1.00    439.0±8.42µs        ? ?/sec     1.00   438.4±12.05µs        ? ?/sec
busy_systems/05x_entities_15_systems           1.01   547.5±21.01µs        ? ?/sec     1.00   543.2±15.37µs        ? ?/sec
contrived/01x_entities_03_systems              1.00     16.8±0.57µs        ? ?/sec     1.04     17.6±0.24µs        ? ?/sec
contrived/01x_entities_06_systems              1.00     31.3±1.18µs        ? ?/sec     1.02     32.0±0.83µs        ? ?/sec
contrived/01x_entities_09_systems              1.00     45.9±0.91µs        ? ?/sec     1.04     47.7±2.39µs        ? ?/sec
contrived/01x_entities_12_systems              1.00     60.1±1.58µs        ? ?/sec     1.01     60.7±1.33µs        ? ?/sec
contrived/01x_entities_15_systems              1.00     76.2±4.43µs        ? ?/sec     1.02     77.9±1.34µs        ? ?/sec
contrived/02x_entities_03_systems              1.00     24.7±0.81µs        ? ?/sec     1.04     25.8±0.94µs        ? ?/sec
contrived/02x_entities_06_systems              1.00     49.8±0.86µs        ? ?/sec     1.02     50.7±1.05µs        ? ?/sec
contrived/02x_entities_09_systems              1.00     73.9±2.67µs        ? ?/sec     1.02     75.4±1.33µs        ? ?/sec
contrived/02x_entities_12_systems              1.00     97.2±4.56µs        ? ?/sec     1.00     97.2±1.77µs        ? ?/sec
contrived/02x_entities_15_systems              1.00    120.9±4.61µs        ? ?/sec     1.01    121.8±2.62µs        ? ?/sec
contrived/03x_entities_03_systems              1.13     38.4±3.06µs        ? ?/sec     1.00     33.9±2.14µs        ? ?/sec
contrived/03x_entities_06_systems              1.07     71.5±4.11µs        ? ?/sec     1.00     67.0±1.51µs        ? ?/sec
contrived/03x_entities_09_systems              1.06    105.5±7.46µs        ? ?/sec     1.00     99.5±5.04µs        ? ?/sec
contrived/03x_entities_12_systems              1.03   136.3±10.53µs        ? ?/sec     1.00    132.2±8.42µs        ? ?/sec
contrived/03x_entities_15_systems              1.07    171.3±8.64µs        ? ?/sec     1.00    159.7±2.89µs        ? ?/sec
contrived/04x_entities_03_systems              1.02     44.4±2.08µs        ? ?/sec     1.00     43.4±0.74µs        ? ?/sec
contrived/04x_entities_06_systems              1.09    90.9±10.64µs        ? ?/sec     1.00     83.8±2.09µs        ? ?/sec
contrived/04x_entities_09_systems              1.05    133.0±7.42µs        ? ?/sec     1.00   127.0±12.85µs        ? ?/sec
contrived/04x_entities_12_systems              1.09   176.8±14.43µs        ? ?/sec     1.00    162.2±2.95µs        ? ?/sec
contrived/04x_entities_15_systems              1.13   228.1±17.32µs        ? ?/sec     1.00    202.0±3.60µs        ? ?/sec
contrived/05x_entities_03_systems              1.00     49.3±1.28µs        ? ?/sec     1.04     51.4±1.22µs        ? ?/sec
contrived/05x_entities_06_systems              1.01    101.0±1.77µs        ? ?/sec     1.00    100.3±2.13µs        ? ?/sec
contrived/05x_entities_09_systems              1.00    149.0±3.85µs        ? ?/sec     1.01    149.9±4.30µs        ? ?/sec
contrived/05x_entities_12_systems              1.00    199.1±3.10µs        ? ?/sec     1.00    198.8±8.58µs        ? ?/sec
contrived/05x_entities_15_systems              1.00    246.1±3.46µs        ? ?/sec     1.01    247.8±9.67µs        ? ?/sec
empty_systems/000_systems                      1.60  1855.2±348.55ns        ? ?/sec    1.00  1156.3±30.23ns        ? ?/sec
empty_systems/001_systems                      1.00      2.8±0.26µs        ? ?/sec     1.06      3.0±0.13µs        ? ?/sec
empty_systems/002_systems                      1.06      3.5±0.38µs        ? ?/sec     1.00      3.3±0.20µs        ? ?/sec
empty_systems/003_systems                      1.19      4.5±0.67µs        ? ?/sec     1.00      3.8±0.17µs        ? ?/sec
empty_systems/004_systems                      1.00      4.5±0.57µs        ? ?/sec     1.00      4.4±0.12µs        ? ?/sec
empty_systems/005_systems                      1.00      4.9±0.24µs        ? ?/sec     1.07      5.2±0.39µs        ? ?/sec
empty_systems/010_systems                      1.07      8.6±1.19µs        ? ?/sec     1.00      8.1±0.14µs        ? ?/sec
empty_systems/015_systems                      1.00     10.3±0.30µs        ? ?/sec     1.02     10.5±0.23µs        ? ?/sec
empty_systems/020_systems                      1.00     13.2±0.61µs        ? ?/sec     1.01     13.3±0.31µs        ? ?/sec
empty_systems/025_systems                      1.05     16.8±1.72µs        ? ?/sec     1.00     16.0±0.35µs        ? ?/sec
empty_systems/030_systems                      1.06     20.2±1.18µs        ? ?/sec     1.00     19.1±0.80µs        ? ?/sec
empty_systems/035_systems                      1.00     21.6±0.80µs        ? ?/sec     1.01     21.7±0.36µs        ? ?/sec
empty_systems/040_systems                      1.01     25.0±1.75µs        ? ?/sec     1.00     24.6±1.77µs        ? ?/sec
empty_systems/045_systems                      1.00     26.4±1.10µs        ? ?/sec     1.01     26.6±0.79µs        ? ?/sec
empty_systems/050_systems                      1.21     33.9±4.26µs        ? ?/sec     1.00     28.1±0.55µs        ? ?/sec
empty_systems/055_systems                      1.04     34.1±2.82µs        ? ?/sec     1.00     32.8±1.32µs        ? ?/sec
empty_systems/060_systems                      1.00     33.8±2.33µs        ? ?/sec     1.01     34.1±1.17µs        ? ?/sec
empty_systems/065_systems                      1.00     35.4±0.52µs        ? ?/sec     1.05     37.2±2.53µs        ? ?/sec
empty_systems/070_systems                      1.02     37.8±1.13µs        ? ?/sec     1.00     37.2±0.75µs        ? ?/sec
empty_systems/075_systems                      1.00     39.9±0.79µs        ? ?/sec     1.01     40.1±1.14µs        ? ?/sec
empty_systems/080_systems                      1.00     42.6±1.02µs        ? ?/sec     1.04     44.1±2.59µs        ? ?/sec
empty_systems/085_systems                      1.06     47.8±5.16µs        ? ?/sec     1.00     44.9±2.52µs        ? ?/sec
empty_systems/090_systems                      1.03     47.5±2.42µs        ? ?/sec     1.00     46.3±0.92µs        ? ?/sec
empty_systems/095_systems                      1.04     51.3±2.46µs        ? ?/sec     1.00     49.4±3.76µs        ? ?/sec
empty_systems/100_systems                      1.01     53.0±2.94µs        ? ?/sec     1.00     52.3±2.77µs        ? ?/sec
run_criteria/no/001_systems                    2.46      3.1±0.81µs        ? ?/sec     1.00  1253.9±59.80ns        ? ?/sec
run_criteria/no/006_systems                    2.80      3.9±0.81µs        ? ?/sec     1.00  1375.2±37.90ns        ? ?/sec
run_criteria/no/011_systems                    1.88      2.8±1.01µs        ? ?/sec     1.00  1506.6±19.10ns        ? ?/sec
run_criteria/no/016_systems                    1.20  1938.8±491.22ns        ? ?/sec    1.00  1616.1±45.19ns        ? ?/sec
run_criteria/no/021_systems                    1.00  1514.5±158.07ns        ? ?/sec    1.13  1705.9±75.44ns        ? ?/sec
run_criteria/no/026_systems                    1.00  1538.2±163.39ns        ? ?/sec    1.16  1781.9±53.61ns        ? ?/sec
run_criteria/no/031_systems                    1.00  1715.6±186.47ns        ? ?/sec    1.14  1949.4±82.83ns        ? ?/sec
run_criteria/no/036_systems                    1.04      2.2±0.40µs        ? ?/sec     1.00      2.1±0.11µs        ? ?/sec
run_criteria/no/041_systems                    1.22      2.8±0.88µs        ? ?/sec     1.00      2.3±0.12µs        ? ?/sec
run_criteria/no/046_systems                    1.36      3.2±1.18µs        ? ?/sec     1.00      2.4±0.25µs        ? ?/sec
run_criteria/no/051_systems                    1.16      2.8±0.83µs        ? ?/sec     1.00      2.4±0.09µs        ? ?/sec
run_criteria/no/056_systems                    1.27      3.2±1.21µs        ? ?/sec     1.00      2.5±0.09µs        ? ?/sec
run_criteria/no/061_systems                    2.10      5.7±1.82µs        ? ?/sec     1.00      2.7±0.13µs        ? ?/sec
run_criteria/no/066_systems                    2.02      6.1±2.14µs        ? ?/sec     1.00      3.0±0.09µs        ? ?/sec
run_criteria/no/071_systems                    2.17      7.2±2.35µs        ? ?/sec     1.00      3.3±0.17µs        ? ?/sec
run_criteria/no/076_systems                    2.04      7.3±2.91µs        ? ?/sec     1.00      3.6±0.12µs        ? ?/sec
run_criteria/no/081_systems                    3.08     11.6±1.15µs        ? ?/sec     1.00      3.8±0.26µs        ? ?/sec
run_criteria/no/086_systems                    3.42     13.1±0.80µs        ? ?/sec     1.00      3.8±0.16µs        ? ?/sec
run_criteria/no/091_systems                    3.46     13.4±0.59µs        ? ?/sec     1.00      3.9±0.16µs        ? ?/sec
run_criteria/no/096_systems                    3.40     13.5±0.50µs        ? ?/sec     1.00      4.0±0.20µs        ? ?/sec
run_criteria/no/101_systems                    3.01     13.4±0.47µs        ? ?/sec     1.00      4.4±0.16µs        ? ?/sec
run_criteria/no_with_labels/001_systems        2.85      3.5±0.68µs        ? ?/sec     1.00  1225.6±29.09ns        ? ?/sec
run_criteria/no_with_labels/006_systems        2.78      3.8±0.77µs        ? ?/sec     1.00  1370.4±34.81ns        ? ?/sec
run_criteria/no_with_labels/011_systems        2.21      3.2±0.94µs        ? ?/sec     1.00  1451.4±25.65ns        ? ?/sec
run_criteria/no_with_labels/016_systems        1.60      2.5±0.79µs        ? ?/sec     1.00  1540.1±31.09ns        ? ?/sec
run_criteria/no_with_labels/021_systems        1.01  1609.6±276.61ns        ? ?/sec    1.00  1590.7±48.60ns        ? ?/sec
run_criteria/no_with_labels/026_systems        1.02  1726.0±411.80ns        ? ?/sec    1.00  1693.0±38.09ns        ? ?/sec
run_criteria/no_with_labels/031_systems        1.07  1920.8±428.30ns        ? ?/sec    1.00  1795.8±54.07ns        ? ?/sec
run_criteria/no_with_labels/036_systems        1.10      2.1±0.44µs        ? ?/sec     1.00  1881.2±43.87ns        ? ?/sec
run_criteria/no_with_labels/041_systems        1.20      2.4±0.52µs        ? ?/sec     1.00  1997.9±96.45ns        ? ?/sec
run_criteria/no_with_labels/046_systems        1.20      2.5±1.18µs        ? ?/sec     1.00      2.0±0.07µs        ? ?/sec
run_criteria/no_with_labels/051_systems        1.22      2.6±1.30µs        ? ?/sec     1.00      2.2±0.06µs        ? ?/sec
run_criteria/no_with_labels/056_systems        1.48      3.3±1.06µs        ? ?/sec     1.00      2.2±0.05µs        ? ?/sec
run_criteria/no_with_labels/061_systems        1.28      3.0±1.35µs        ? ?/sec     1.00      2.3±0.11µs        ? ?/sec
run_criteria/no_with_labels/066_systems        1.79      4.3±1.49µs        ? ?/sec     1.00      2.4±0.08µs        ? ?/sec
run_criteria/no_with_labels/071_systems        3.39      8.4±2.68µs        ? ?/sec     1.00      2.5±0.09µs        ? ?/sec
run_criteria/no_with_labels/076_systems        2.45      6.2±2.98µs        ? ?/sec     1.00      2.5±0.06µs        ? ?/sec
run_criteria/no_with_labels/081_systems        2.51      6.7±2.59µs        ? ?/sec     1.00      2.7±0.12µs        ? ?/sec
run_criteria/no_with_labels/086_systems        2.31      6.4±2.82µs        ? ?/sec     1.00      2.8±0.14µs        ? ?/sec
run_criteria/no_with_labels/091_systems        2.18      6.3±2.78µs        ? ?/sec     1.00      2.9±0.11µs        ? ?/sec
run_criteria/no_with_labels/096_systems        3.31     10.5±2.66µs        ? ?/sec     1.00      3.2±0.12µs        ? ?/sec
run_criteria/no_with_labels/101_systems        3.36     11.6±2.45µs        ? ?/sec     1.00      3.5±0.22µs        ? ?/sec
run_criteria/yes/001_systems                   2.71      8.0±5.31µs        ? ?/sec     1.00      3.0±0.09µs        ? ?/sec
run_criteria/yes/006_systems                   1.00      5.3±0.13µs        ? ?/sec     1.08      5.7±0.21µs        ? ?/sec
run_criteria/yes/011_systems                   1.00      8.2±0.28µs        ? ?/sec     1.04      8.6±0.19µs        ? ?/sec
run_criteria/yes/016_systems                   1.00     10.5±0.23µs        ? ?/sec     1.07     11.2±0.45µs        ? ?/sec
run_criteria/yes/021_systems                   1.00     13.2±0.24µs        ? ?/sec     1.07     14.1±0.30µs        ? ?/sec
run_criteria/yes/026_systems                   1.00     16.1±0.43µs        ? ?/sec     1.08     17.4±0.92µs        ? ?/sec
run_criteria/yes/031_systems                   1.00     19.0±1.11µs        ? ?/sec     1.07     20.3±0.76µs        ? ?/sec
run_criteria/yes/036_systems                   1.00     21.4±0.60µs        ? ?/sec     1.05     22.4±0.54µs        ? ?/sec
run_criteria/yes/041_systems                   1.00     23.4±0.27µs        ? ?/sec     1.04     24.3±0.35µs        ? ?/sec
run_criteria/yes/046_systems                   1.02     27.1±1.35µs        ? ?/sec     1.00     26.5±0.35µs        ? ?/sec
run_criteria/yes/051_systems                   1.00     28.4±0.69µs        ? ?/sec     1.05     29.8±1.15µs        ? ?/sec
run_criteria/yes/056_systems                   1.00     30.9±0.88µs        ? ?/sec     1.04     32.1±0.60µs        ? ?/sec
run_criteria/yes/061_systems                   1.00     32.8±0.77µs        ? ?/sec     1.00     32.8±0.61µs        ? ?/sec
run_criteria/yes/066_systems                   1.00     35.4±1.65µs        ? ?/sec     1.00     35.5±0.46µs        ? ?/sec
run_criteria/yes/071_systems                   1.00     37.6±1.60µs        ? ?/sec     1.00     37.8±1.07µs        ? ?/sec
run_criteria/yes/076_systems                   1.01     40.5±1.62µs        ? ?/sec     1.00     40.0±0.74µs        ? ?/sec
run_criteria/yes/081_systems                   1.02     43.1±1.63µs        ? ?/sec     1.00     42.2±1.57µs        ? ?/sec
run_criteria/yes/086_systems                   1.01     45.5±1.42µs        ? ?/sec     1.00     45.0±1.88µs        ? ?/sec
run_criteria/yes/091_systems                   1.00     48.2±3.19µs        ? ?/sec     1.01     48.5±2.41µs        ? ?/sec
run_criteria/yes/096_systems                   1.03     50.1±1.44µs        ? ?/sec     1.00     48.5±1.07µs        ? ?/sec
run_criteria/yes/101_systems                   1.03     52.9±2.06µs        ? ?/sec     1.00     51.6±2.38µs        ? ?/sec
run_criteria/yes_using_query/001_systems       1.00      2.4±0.09µs        ? ?/sec     1.22      2.9±0.07µs        ? ?/sec
run_criteria/yes_using_query/006_systems       1.00      5.2±0.09µs        ? ?/sec     1.17      6.1±0.87µs        ? ?/sec
run_criteria/yes_using_query/011_systems       1.00      8.3±0.32µs        ? ?/sec     1.06      8.7±0.21µs        ? ?/sec
run_criteria/yes_using_query/016_systems       1.00     10.9±0.18µs        ? ?/sec     1.07     11.6±0.48µs        ? ?/sec
run_criteria/yes_using_query/021_systems       1.00     13.8±0.35µs        ? ?/sec     1.07     14.7±0.42µs        ? ?/sec
run_criteria/yes_using_query/026_systems       1.00     16.4±0.67µs        ? ?/sec     1.05     17.2±0.69µs        ? ?/sec
run_criteria/yes_using_query/031_systems       1.00     19.1±0.37µs        ? ?/sec     1.04     19.9±0.51µs        ? ?/sec
run_criteria/yes_using_query/036_systems       1.00     22.2±1.57µs        ? ?/sec     1.02     22.6±0.81µs        ? ?/sec
run_criteria/yes_using_query/041_systems       1.00     23.9±0.48µs        ? ?/sec     1.02     24.4±0.50µs        ? ?/sec
run_criteria/yes_using_query/046_systems       1.04     27.2±3.01µs        ? ?/sec     1.00     26.3±0.58µs        ? ?/sec
run_criteria/yes_using_query/051_systems       1.00     28.5±0.69µs        ? ?/sec     1.07     30.5±3.08µs        ? ?/sec
run_criteria/yes_using_query/056_systems       1.00     31.1±1.40µs        ? ?/sec     1.00     31.1±1.12µs        ? ?/sec
run_criteria/yes_using_query/061_systems       1.05     34.7±4.08µs        ? ?/sec     1.00     33.1±1.46µs        ? ?/sec
run_criteria/yes_using_query/066_systems       1.05     36.7±2.76µs        ? ?/sec     1.00     34.9±1.03µs        ? ?/sec
run_criteria/yes_using_query/071_systems       1.00     39.0±2.97µs        ? ?/sec     1.00     39.1±3.85µs        ? ?/sec
run_criteria/yes_using_query/076_systems       1.05     42.2±5.32µs        ? ?/sec     1.00     40.1±1.58µs        ? ?/sec
run_criteria/yes_using_query/081_systems       1.00     41.7±1.15µs        ? ?/sec     1.01     42.1±1.26µs        ? ?/sec
run_criteria/yes_using_query/086_systems       1.12     48.9±5.01µs        ? ?/sec     1.00     43.7±0.99µs        ? ?/sec
run_criteria/yes_using_query/091_systems       1.07     49.8±3.23µs        ? ?/sec     1.00     46.4±1.33µs        ? ?/sec
run_criteria/yes_using_query/096_systems       1.01     51.0±2.94µs        ? ?/sec     1.00     50.4±1.65µs        ? ?/sec
run_criteria/yes_using_query/101_systems       1.02     52.5±1.76µs        ? ?/sec     1.00     51.3±1.07µs        ? ?/sec
run_criteria/yes_using_resource/001_systems    1.00      2.3±0.08µs        ? ?/sec     1.26      3.0±0.12µs        ? ?/sec
run_criteria/yes_using_resource/006_systems    1.00      5.3±0.09µs        ? ?/sec     1.11      5.9±0.20µs        ? ?/sec
run_criteria/yes_using_resource/011_systems    1.00      8.2±0.19µs        ? ?/sec     1.07      8.8±0.23µs        ? ?/sec
run_criteria/yes_using_resource/016_systems    1.00     11.2±0.69µs        ? ?/sec     1.06     11.8±0.23µs        ? ?/sec
run_criteria/yes_using_resource/021_systems    1.00     13.8±0.31µs        ? ?/sec     1.04     14.4±0.40µs        ? ?/sec
run_criteria/yes_using_resource/026_systems    1.00     16.7±0.82µs        ? ?/sec     1.04     17.3±0.47µs        ? ?/sec
run_criteria/yes_using_resource/031_systems    1.00     19.0±0.48µs        ? ?/sec     1.08     20.5±0.95µs        ? ?/sec
run_criteria/yes_using_resource/036_systems    1.00     21.6±0.31µs        ? ?/sec     1.06     22.9±0.78µs        ? ?/sec
run_criteria/yes_using_resource/041_systems    1.00     24.2±0.91µs        ? ?/sec     1.01     24.4±0.31µs        ? ?/sec
run_criteria/yes_using_resource/046_systems    1.00     26.3±0.71µs        ? ?/sec     1.02     26.8±0.47µs        ? ?/sec
run_criteria/yes_using_resource/051_systems    1.00     28.8±0.66µs        ? ?/sec     1.03     29.6±1.10µs        ? ?/sec
run_criteria/yes_using_resource/056_systems    1.00     30.7±0.71µs        ? ?/sec     1.11     34.0±3.01µs        ? ?/sec
run_criteria/yes_using_resource/061_systems    1.00     33.3±1.24µs        ? ?/sec     1.01     33.5±0.92µs        ? ?/sec
run_criteria/yes_using_resource/066_systems    1.02     35.6±1.08µs        ? ?/sec     1.00     35.0±1.24µs        ? ?/sec
run_criteria/yes_using_resource/071_systems    1.02     38.4±1.45µs        ? ?/sec     1.00     37.6±0.86µs        ? ?/sec
run_criteria/yes_using_resource/076_systems    1.02     40.6±1.68µs        ? ?/sec     1.00     39.7±0.90µs        ? ?/sec
run_criteria/yes_using_resource/081_systems    1.00     44.6±2.33µs        ? ?/sec     1.02     45.4±3.69µs        ? ?/sec
run_criteria/yes_using_resource/086_systems    1.08     47.7±3.58µs        ? ?/sec     1.00     44.3±1.19µs        ? ?/sec
run_criteria/yes_using_resource/091_systems    1.05     49.3±4.55µs        ? ?/sec     1.00     47.1±2.05µs        ? ?/sec
run_criteria/yes_using_resource/096_systems    1.10     53.6±4.34µs        ? ?/sec     1.00     48.7±1.64µs        ? ?/sec
run_criteria/yes_using_resource/101_systems    1.00     52.6±2.66µs        ? ?/sec     1.04     54.8±4.50µs        ? ?/sec
run_criteria/yes_with_labels/001_systems       1.00      2.4±0.07µs        ? ?/sec     1.23      3.0±0.09µs        ? ?/sec
run_criteria/yes_with_labels/006_systems       1.00      5.2±0.11µs        ? ?/sec     1.12      5.8±0.27µs        ? ?/sec
run_criteria/yes_with_labels/011_systems       1.00      8.5±0.47µs        ? ?/sec     1.04      8.9±0.63µs        ? ?/sec
run_criteria/yes_with_labels/016_systems       1.00     11.1±0.72µs        ? ?/sec     1.03     11.4±0.55µs        ? ?/sec
run_criteria/yes_with_labels/021_systems       1.05     15.0±1.28µs        ? ?/sec     1.00     14.2±0.53µs        ? ?/sec
run_criteria/yes_with_labels/026_systems       1.00     16.8±1.41µs        ? ?/sec     1.01     16.9±0.41µs        ? ?/sec
run_criteria/yes_with_labels/031_systems       1.00     20.1±1.57µs        ? ?/sec     1.02     20.5±3.04µs        ? ?/sec
run_criteria/yes_with_labels/036_systems       1.00     21.7±0.81µs        ? ?/sec     1.03     22.3±0.43µs        ? ?/sec
run_criteria/yes_with_labels/041_systems       1.00     24.3±1.15µs        ? ?/sec     1.01     24.5±0.80µs        ? ?/sec
run_criteria/yes_with_labels/046_systems       1.00     26.6±1.33µs        ? ?/sec     1.00     26.7±0.73µs        ? ?/sec
run_criteria/yes_with_labels/051_systems       1.00     29.2±1.13µs        ? ?/sec     1.02     29.8±1.13µs        ? ?/sec
run_criteria/yes_with_labels/056_systems       1.00     31.9±1.90µs        ? ?/sec     1.00     32.0±1.17µs        ? ?/sec
run_criteria/yes_with_labels/061_systems       1.00     33.9±1.39µs        ? ?/sec     1.01     34.1±0.71µs        ? ?/sec
run_criteria/yes_with_labels/066_systems       1.00     36.0±1.01µs        ? ?/sec     1.01     36.2±0.90µs        ? ?/sec
run_criteria/yes_with_labels/071_systems       1.00     38.4±1.69µs        ? ?/sec     1.00     38.2±0.98µs        ? ?/sec
run_criteria/yes_with_labels/076_systems       1.00     40.4±1.49µs        ? ?/sec     1.00     40.3±0.67µs        ? ?/sec
run_criteria/yes_with_labels/081_systems       1.00     42.6±1.78µs        ? ?/sec     1.00     42.6±2.03µs        ? ?/sec
run_criteria/yes_with_labels/086_systems       1.03     45.7±1.40µs        ? ?/sec     1.00     44.5±0.98µs        ? ?/sec
run_criteria/yes_with_labels/091_systems       1.02     47.4±1.21µs        ? ?/sec     1.00     46.7±0.80µs        ? ?/sec
run_criteria/yes_with_labels/096_systems       1.02     50.4±1.47µs        ? ?/sec     1.00     49.2±1.68µs        ? ?/sec
run_criteria/yes_with_labels/101_systems       1.03     52.5±1.49µs        ? ?/sec     1.00     51.0±0.80µs        ? ?/sec

stage breakdown seems mostly similar with some speedups and slowdowns in render. This branch does seem to come out on top.

`many_cubes` and `3d_scene` stage breakdown:

| Stage | many_cubes (main) | many_cubes (PR) | 3d_scene (main) | 3d_scene (PR) |
|---|---|---|---|---|
| First | 458.14us | 458.45us | 451.8us | 450.73us |
| Load Assets | 216.15us | 218.25us | 207.12us | 206.4us |
| Pre Update | 123.68us | 127.66us | 115.1us | 114.81us |
| Update | 71.61us | 71.84us | 39.97us | 39.57us |
| Post Update | 2.71ms | 2.63ms | 298.31us | 315.93us |
| AssetEvents | 207.48us | 208.69us | 195.31us | 193.61us |
| Last | 28.63us | 18.12us | 26.14us | 17.09us |
| Extract | 3.48ms | 3.4ms | 388.25us | 358.09us |
| Prepare | 2.65ms | 2.64ms | 259.3us | 255.28us |
| Queue | 731.78us | 796.44us | 155.36us | 155.65us |
| Sort | 735.35us | 741.19us | 90.93us | 90.74us |
| Render | 8.37ms | 8.14ms | 347.93us | 344.77us |
| Cleanup | 46.45us | 45.99us | 39.88us | 39.53us |
| Frame | 20.6ms | 20.16ms | 2.85ms | 2.82ms |
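
To put "on top" into numbers, the relative change in total frame time can be computed from the Frame row of the stage breakdown (main vs. PR). This is only arithmetic on the reported values; `percent_faster` is a hypothetical helper, not part of the PR.

```rust
/// Relative improvement of `pr_ms` over `main_ms`, as a percentage of `main_ms`.
/// Positive means the PR branch is faster.
fn percent_faster(main_ms: f64, pr_ms: f64) -> f64 {
    (main_ms - pr_ms) / main_ms * 100.0
}

fn main() {
    // Frame times (milliseconds) taken from the Frame row of the breakdown.
    let many_cubes = percent_faster(20.6, 20.16);
    let scene_3d = percent_faster(2.85, 2.82);
    println!("many_cubes: {:.1}% faster", many_cubes);
    println!("3d_scene:   {:.1}% faster", scene_3d);
}
```

This works out to roughly 2.1% for `many_cubes` and 1.1% for `3d_scene`.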

@cart (Member) commented Sep 28, 2022

bors r+

bors bot pushed a commit that referenced this pull request Sep 28, 2022
# Objective

- Add the ability to create nested spawns. This is needed for stageless. The current executor spawns a task for each system early and runs the system by communicating through a channel. In stageless, we want to spawn the task late, so that archetypes can be updated right before the task runs. Because the executor itself runs on a separate task, this change allows the scope to be passed into the spawned executor.
- Fixes #4301

## Solution

- Instantiate a single-threaded executor on the scope and use it instead of the `LocalExecutor`. This allows the scope to be `Send` while still being able to spawn tasks onto the main thread that the scope runs on. This works because, while systems can access non-`Send` data, the systems themselves are `Send`. As a consequence of this change, we lose the ability to spawn non-`Send` tasks on the scope, but I don't think this is being used anywhere. Users can still use `spawn_local` on `TaskPool`s.
- Borrow the lifetime tricks that `std::thread::scope` uses to allow nested spawns while preventing the scope from being passed to tasks or threads not associated with it.
- Change the storage for the tasks to a `ConcurrentQueue`. This allows a `&Scope` to be passed for spawning instead of a `&mut Scope`. `ConcurrentQueue` was chosen because it was already in our dependency tree, since `async_executor` depends on it.
- Remove the optimizations for 0 and 1 spawned tasks. They improved those cases, but made cases with more than one task slower.
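
The lifetime arrangement borrowed from `std::thread::scope` can be seen in the standard library itself. The sketch below is std-only and uses OS threads rather than bevy's `TaskPool`: every spawned thread receives the same `&'scope Scope`, so it can spawn further threads, and all of them are still joined before `scope` returns. The `Mutex<Vec<_>>` is an assumption standing in for the `ConcurrentQueue` described above (both let results be pushed through a shared `&` reference), and `nested_spawn_demo` is a hypothetical name.

```rust
use std::sync::Mutex;
use std::thread;

/// Spawns `outer` threads inside a `std::thread::scope`; each of them spawns
/// one nested thread through the same `&Scope`. Every thread pushes a value
/// into a queue shared by reference (a stand-in for bevy's `ConcurrentQueue`).
fn nested_spawn_demo(outer: usize) -> Vec<usize> {
    // Only `&results` is ever handed out, so no `&mut` access is needed.
    let results = Mutex::new(Vec::new());

    thread::scope(|s| {
        for i in 0..outer {
            let results = &results;
            // `s` is `&'scope Scope<'scope, 'env>` and is `Copy`, so `move`
            // gives the spawned thread its own copy of the reference.
            s.spawn(move || {
                results.lock().unwrap().push(i);
                // Nested spawn on the same scope; it is joined together with
                // every other thread when `thread::scope` returns.
                s.spawn(move || {
                    results.lock().unwrap().push(100 + i);
                });
            });
        }
    }); // all threads, including nested ones, are joined here

    let mut out = results.into_inner().unwrap();
    out.sort_unstable();
    out
}

fn main() {
    println!("{:?}", nested_spawn_demo(2));
}
```

Sorting only makes the output deterministic; the threads themselves may finish in any order.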
---

## Changelog

Add the ability to nest spawns.

```rust
use bevy_tasks::TaskPool;

fn main() {
    let pool = TaskPool::new();
    pool.scope(|scope| {
        scope.spawn(async move {
            // calling scope.spawn from a spawned task was not possible before
            scope.spawn(async move {
                // do something
            });
        });
    });
}
```

## Migration Guide

If you were using explicit lifetimes and passing `Scope` around, you'll now need to specify two lifetimes.

```rust
fn scoped_function<'scope>(scope: &mut Scope<'scope, ()>) {}
// should become
fn scoped_function<'scope>(scope: &Scope<'_, 'scope, ()>) {}
```
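
For comparison, `std::thread::Scope` carries the same two kinds of lifetime, `Scope<'scope, 'env>`: `'env` covers data borrowed from outside the scope and `'scope` covers the scope itself, with `'env` outliving `'scope`. A std-only sketch of a helper that is handed a scope and spawns into it (`scoped_function` and `sum_in_scope` are hypothetical names; bevy's `Scope` additionally has the `T` output parameter shown in the snippet above):

```rust
use std::thread::{self, Scope, ScopedJoinHandle};

/// Hypothetical helper that is handed a scope and spawns work into it.
/// `'env` is the lifetime of data borrowed from outside the scope;
/// `'scope` is the lifetime of the scope itself (`'env` outlives `'scope`).
fn scoped_function<'scope, 'env>(
    s: &'scope Scope<'scope, 'env>,
    data: &'env [u32],
) -> ScopedJoinHandle<'scope, u32> {
    // The closure borrows `data` for `'env`; this is allowed because every
    // thread spawned on `s` is joined before the scope ends.
    s.spawn(move || data.iter().sum())
}

/// Hypothetical caller: opens a scope, passes it to the helper, and joins.
fn sum_in_scope(data: &[u32]) -> u32 {
    thread::scope(|s| scoped_function(s, data).join().unwrap())
}

fn main() {
    println!("{}", sum_in_scope(&[1, 2, 3]));
}
```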

`scope.spawn_local` has been renamed to `scope.spawn_on_scope`. This should cover cases where you needed to run tasks on the local thread, but it does not cover spawning non-`Send` futures.

## TODO
* [x] think real hard about all the lifetimes
* [x] add docs about what `'env` and `'scope` mean.
* [x] manually check that the single threaded task pool still works
* [x] Get updated perf numbers
* [x] check and make sure all the transmutes are necessary
* [x] move commented out test into a compile fail test
* [x] look through the tests for scope on std and see if I should add any more tests

Co-authored-by: Michael Hsu <myhsu@benjaminelectric.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors bors bot changed the title Nested spawns on scope [Merged by Bors] - Nested spawns on scope Sep 28, 2022
@bors bors bot closed this Sep 28, 2022
james7132 pushed a commit to james7132/bevy that referenced this pull request Oct 19, 2022
bors bot pushed a commit that referenced this pull request Oct 24, 2022
# Objective

- After #4466, local tasks no longer run.
- Fixes #6120

## Solution

- Add a system to `bevy_core`, where the task pools are initialized, that ticks the local executor on the main thread.
- Tick the local executors inside the thread executors.
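
The underlying problem, that thread-local tasks only make progress when something ticks their executor, can be sketched with plain closures instead of futures. This is not bevy's implementation: `LOCAL_QUEUE`, `queue_local`, and `tick_local` are hypothetical names, and in the fix described above the ticking is done by a system in `bevy_core` and by the thread executors.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

thread_local! {
    // Hypothetical per-thread queue of local jobs. Jobs here never run on
    // their own; they only make progress when this thread ticks the queue.
    static LOCAL_QUEUE: RefCell<VecDeque<Box<dyn FnOnce()>>> = RefCell::new(VecDeque::new());
}

/// Queue a job on the current thread's local queue.
fn queue_local(job: impl FnOnce() + 'static) {
    LOCAL_QUEUE.with(|q| q.borrow_mut().push_back(Box::new(job)));
}

/// One "tick": run the jobs that were queued when the tick started and
/// return how many ran. Jobs queued during the tick wait for the next one.
fn tick_local() -> usize {
    let pending = LOCAL_QUEUE.with(|q| q.borrow().len());
    for _ in 0..pending {
        // Pop first so the `RefCell` borrow is released before the job runs,
        // since a job may itself call `queue_local`.
        let job = LOCAL_QUEUE.with(|q| q.borrow_mut().pop_front());
        if let Some(job) = job {
            job();
        }
    }
    pending
}

fn main() {
    queue_local(|| println!("ran on the local thread"));
    // Without a call like this (e.g. from a per-frame system), the job never runs.
    let ran = tick_local();
    println!("ticked {ran} job(s)");
}
```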

## Changelog

- Tick all thread-local executors in the task pool.

## Notes

- ~~Not 100% sure about this PR. Ticking the local executor for the main thread in scope feels a little kludgy as it requires users of bevy_tasks to be calling scope periodically for those tasks to make progress.~~ took this out in favor of a system that ticks the local executors.
@ickk ickk added the C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide label Oct 27, 2022
james7132 pushed a commit to james7132/bevy that referenced this pull request Oct 28, 2022
james7132 pushed a commit to james7132/bevy that referenced this pull request Oct 28, 2022
Pietrek14 pushed a commit to Pietrek14/bevy that referenced this pull request Dec 17, 2022
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
@hymm hymm deleted the nested-scopes branch October 5, 2023 16:42
Labels
A-ECS Entities, components, systems, and events A-Tasks Tools for parallel and async work C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide C-Enhancement A new feature X-Controversial There is active debate or serious implications around merging this PR
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Nesting task pool's Scopes
6 participants