[Merged by Bors] - Skip empty archetypes and tables when iterating over queries #4724

james7132 · 2022-05-11T20:03:52Z

Objective

Speed up queries that are fragmented over many empty archetypes and tables.

Solution

Add a early-out to check if the table or archetype is empty before iterating over it. This adds an extra branch for every archetype matched, but skips setting the archetype/table to the underlying state and any iteration over it.

This may not be worth it for the default Query::iter and maybe even the Query::for_each implementations, but this definitely avoids scheduling unnecessary tasks in the Query::par_for_each case.

Ideally, matched_archetypes should only contain archetypes where there's actually work to do, but this would add a O(n) flat cost to every call to update_archetypes that scales with the number of matched archetypes.

TODO: Benchmark

alice-i-cecile · 2022-05-11T20:07:30Z

Code changes look fine to me. The proof, like always, is in the benchmark :) I'd like to see a new benchmark that looks at iteration performance vs the number of empty archetypes (0, 1k, 5k, 10k maybe?), to see where the break-even point is for this change.

BoxyUwU · 2022-05-11T20:19:46Z

QueryIterationCursor needs to be updated

james7132 · 2022-05-11T20:20:46Z

QueryIterationCursor needs to be updated

Done.

james7132 · 2022-05-12T03:22:51Z

Actual benchmark results here, and the differences here make plenty of sense. Tl;dr: it's really only beneficial to par_for_each and it's variants in the general case, otherwise the total number of empty archetypes aren't really realistic.

group                                       main                                    skip-empty
-----                                       ----                                    ----------
empty_archetypes/for_each/10                1.00      4.2±0.18µs        ? ?/sec     1.47      6.1±0.28µs        ? ?/sec
empty_archetypes/for_each/100               1.00      4.6±0.16µs        ? ?/sec     1.34      6.2±0.97µs        ? ?/sec
empty_archetypes/for_each/1000              1.00      5.6±0.12µs        ? ?/sec     1.19      6.6±0.13µs        ? ?/sec
empty_archetypes/for_each/10000             1.92     20.7±0.47µs        ? ?/sec     1.00     10.8±0.56µs        ? ?/sec
empty_archetypes/for_each/2000              1.18      8.8±0.40µs        ? ?/sec     1.00      7.5±0.55µs        ? ?/sec
empty_archetypes/for_each/500               1.00      5.0±0.19µs        ? ?/sec     1.27      6.3±0.14µs        ? ?/sec
empty_archetypes/for_each/5000              1.59     12.4±0.33µs        ? ?/sec     1.00      7.8±0.75µs        ? ?/sec
empty_archetypes/iter/10                    1.00      4.3±0.21µs        ? ?/sec     1.47      6.4±0.13µs        ? ?/sec
empty_archetypes/iter/100                   1.00      4.4±0.15µs        ? ?/sec     1.44      6.3±0.37µs        ? ?/sec
empty_archetypes/iter/1000                  1.00      5.6±0.18µs        ? ?/sec     1.19      6.6±0.07µs        ? ?/sec
empty_archetypes/iter/10000                 1.74     22.9±0.66µs        ? ?/sec     1.00     13.1±0.83µs        ? ?/sec
empty_archetypes/iter/2000                  1.22      9.1±0.40µs        ? ?/sec     1.00      7.5±0.09µs        ? ?/sec
empty_archetypes/iter/500                   1.00      5.4±0.25µs        ? ?/sec     1.20      6.5±0.13µs        ? ?/sec
empty_archetypes/iter/5000                  1.61     13.4±0.20µs        ? ?/sec     1.00      8.4±0.71µs        ? ?/sec
empty_archetypes/par_for_each/10            1.37      7.7±0.39µs        ? ?/sec     1.00      5.6±0.45µs        ? ?/sec
empty_archetypes/par_for_each/100           1.31      7.7±0.31µs        ? ?/sec     1.00      5.8±0.36µs        ? ?/sec
empty_archetypes/par_for_each/1000          1.29      8.0±0.38µs        ? ?/sec     1.00      6.2±0.71µs        ? ?/sec
empty_archetypes/par_for_each/10000         1.11     15.2±0.47µs        ? ?/sec     1.00     13.7±0.96µs        ? ?/sec
empty_archetypes/par_for_each/2000          1.38      8.5±0.37µs        ? ?/sec     1.00      6.2±0.38µs        ? ?/sec
empty_archetypes/par_for_each/500           1.39      8.4±0.33µs        ? ?/sec     1.00      6.0±0.31µs        ? ?/sec
empty_archetypes/par_for_each/5000          1.25     10.8±0.33µs        ? ?/sec     1.00      8.6±0.48µs        ? ?/sec

alice-i-cecile · 2022-05-12T03:25:34Z

Solid work benchmarking. Those numbers make sense to me. Can we do a more targeted change here then?

james7132 · 2022-05-12T05:03:52Z

There were a few errors with how the prior benchmarks were run. There were 10, 100, 1000, 2000, 5000, and 10000 archetypes made, but not all of them matched the query used in the system. Once all of the created archetypes matched the given query, it seems to show that all of the iteration methods benefit from this in a tangible way. The exception here seems to be with par_for_each, where it seems like the overhead of scheduling the tasks dominates the benchmark, and are otherwise within the margin of error.

group                                  main                                   skip-empty
-----                                  ----                                   ----------
empty_archetypes/for_each/10           1.00      4.6±0.14µs        ? ?/sec    1.05      4.8±0.25µs        ? ?/sec
empty_archetypes/for_each/100          1.00      4.8±0.24µs        ? ?/sec    1.02      4.9±0.20µs        ? ?/sec
empty_archetypes/for_each/1000         1.08      5.7±0.26µs        ? ?/sec    1.00      5.3±0.21µs        ? ?/sec
empty_archetypes/for_each/10000        1.26     15.5±0.16µs        ? ?/sec    1.00     12.3±0.34µs        ? ?/sec
empty_archetypes/for_each/2000         1.20      7.3±0.33µs        ? ?/sec    1.00      6.1±0.27µs        ? ?/sec
empty_archetypes/for_each/500          1.05      5.4±0.20µs        ? ?/sec    1.00      5.1±0.20µs        ? ?/sec
empty_archetypes/for_each/5000         1.40     11.8±0.19µs        ? ?/sec    1.00      8.5±0.26µs        ? ?/sec
empty_archetypes/iter/10               1.08      4.9±0.76µs        ? ?/sec    1.00      4.5±0.18µs        ? ?/sec
empty_archetypes/iter/100              1.18      5.6±0.65µs        ? ?/sec    1.00      4.8±0.33µs        ? ?/sec
empty_archetypes/iter/1000             1.33      7.2±0.30µs        ? ?/sec    1.00      5.5±0.20µs        ? ?/sec
empty_archetypes/iter/10000            1.46     17.7±0.41µs        ? ?/sec    1.00     12.1±0.26µs        ? ?/sec
empty_archetypes/iter/2000             1.35      8.1±0.42µs        ? ?/sec    1.00      6.0±0.29µs        ? ?/sec
empty_archetypes/iter/500              1.26      6.4±0.28µs        ? ?/sec    1.00      5.1±0.22µs        ? ?/sec
empty_archetypes/iter/5000             1.42     11.9±0.15µs        ? ?/sec    1.00      8.4±0.24µs        ? ?/sec
empty_archetypes/par_for_each/10       1.00      8.0±0.38µs        ? ?/sec    1.04      8.3±0.37µs        ? ?/sec
empty_archetypes/par_for_each/100      1.00      8.0±0.32µs        ? ?/sec    1.02      8.2±0.38µs        ? ?/sec
empty_archetypes/par_for_each/1000     1.00      8.3±0.35µs        ? ?/sec    1.11      9.2±0.28µs        ? ?/sec
empty_archetypes/par_for_each/10000    1.00     14.1±0.50µs        ? ?/sec    1.00     14.1±0.47µs        ? ?/sec
empty_archetypes/par_for_each/2000     1.00      8.5±0.37µs        ? ?/sec    1.04      8.9±0.30µs        ? ?/sec

BoxyUwU · 2022-05-12T13:23:40Z

I dont think its right to conclude this has no effect on normal for_each. The benchmark uses Query<&A> which is going to be doing barely any work on empty archetypes. Something like Query<(&A, &A, &A, &A, &A, &A, &A, &A, &A, &A)> might make this show up in benches

james7132 · 2022-05-14T07:45:13Z

I dont think its right to conclude this has no effect on normal for_each. The benchmark uses Query<&A> which is going to be doing barely any work on empty archetypes. Something like Query<(&A, &A, &A, &A, &A, &A, &A, &A, &A, &A)> might make this show up in benches

Tried this and the results are much more pronounced. ~~I also tried filtering during update_archetypes and saw the same overhead at the low levels, but it does scale a lot more easily with higher counts.~~ Nevermind, that's unsound and we might need to find a better way to move the check out of the hot loop.

group                                  main                                   skip-empty
-----                                  ----                                   ----------
empty_archetypes/for_each/10           1.00      4.4±0.14µs        ? ?/sec    1.36      6.0±0.22µs        ? ?/sec
empty_archetypes/for_each/100          1.00      5.0±0.28µs        ? ?/sec    1.38      6.9±0.14µs        ? ?/sec
empty_archetypes/for_each/1000         1.45     10.6±0.16µs        ? ?/sec    1.00      7.3±0.07µs        ? ?/sec
empty_archetypes/for_each/10000        5.21     66.1±1.07µs        ? ?/sec    1.00     12.7±0.62µs        ? ?/sec
empty_archetypes/for_each/2000         2.03     15.4±0.25µs        ? ?/sec    1.00      7.6±0.11µs        ? ?/sec
empty_archetypes/for_each/500          1.21      7.6±0.27µs        ? ?/sec    1.00      6.3±0.17µs        ? ?/sec
empty_archetypes/for_each/5000         3.66     32.5±0.40µs        ? ?/sec    1.00      8.9±0.79µs        ? ?/sec
empty_archetypes/iter/10               1.00      4.8±0.21µs        ? ?/sec    1.32      6.3±0.12µs        ? ?/sec
empty_archetypes/iter/100              1.00      5.0±0.37µs        ? ?/sec    1.24      6.2±0.37µs        ? ?/sec
empty_archetypes/iter/1000             1.56     11.2±0.23µs        ? ?/sec    1.00      7.2±0.10µs        ? ?/sec
empty_archetypes/iter/10000            5.28     66.2±1.52µs        ? ?/sec    1.00     12.5±0.98µs        ? ?/sec
empty_archetypes/iter/2000             2.00     15.0±0.24µs        ? ?/sec    1.00      7.5±0.20µs        ? ?/sec
empty_archetypes/iter/500              1.17      7.7±0.29µs        ? ?/sec    1.00      6.6±0.12µs        ? ?/sec
empty_archetypes/iter/5000             3.87     31.9±0.24µs        ? ?/sec    1.00      8.3±0.41µs        ? ?/sec
empty_archetypes/par_for_each/10       1.36      7.9±0.52µs        ? ?/sec    1.00      5.8±0.86µs        ? ?/sec
empty_archetypes/par_for_each/100      1.31      8.0±0.43µs        ? ?/sec    1.00      6.1±0.33µs        ? ?/sec
empty_archetypes/par_for_each/1000     1.45      8.6±0.43µs        ? ?/sec    1.00      5.9±0.35µs        ? ?/sec
empty_archetypes/par_for_each/10000    1.07     15.2±0.44µs        ? ?/sec    1.00     14.2±0.89µs        ? ?/sec
empty_archetypes/par_for_each/2000     1.46      8.6±0.42µs        ? ?/sec    1.00      5.9±0.32µs        ? ?/sec
empty_archetypes/par_for_each/500      1.36      8.1±0.24µs        ? ?/sec    1.00      6.0±0.62µs        ? ?/sec
empty_archetypes/par_for_each/5000     1.25     11.5±0.32µs        ? ?/sec    1.00      9.1±0.68µs        ? ?/sec

This reverts commit 72884a9.

DJMcNab · 2022-05-14T15:10:13Z

Is it feasible to cull empty archetypes, or re-order archetypes so that the full ones are always at the end (i.e. so we can make the check for emptyness just be 'reached the end of the archetypes list'?

I imagine the challenge there is in ensuring that all the IDs are up-to-date, since they 'need' to be used for indexing - we'd probably need a re-mapping phase? I'm not sure how well it would even out.

james7132 · 2022-05-15T21:19:02Z

Is it feasible to cull empty archetypes, or re-order archetypes so that the full ones are always at the end (i.e. so we can make the check for emptyness just be 'reached the end of the archetypes list'?

This is what the "valid_archetype_ids" naively attempted to do. We'd need to find an opportune time to inject this, since it seems like only new_archetype is called on queries.

bjorn3

The code itself seems to be correct. As for whether we should do this I will leave that to others.

BoxyUwU

code seems correct to me.

I don't really think the benchmarks shown give good reason to make the change though. the benches are completely artificial and skip every single archetype never actually doing anything and even then the perf difference is a regression in a fair amount of cases!

james7132 · 2022-06-16T14:53:49Z

Revisiting this, it may make sense to add this only for the parallel case, where the work per non-empty archetype is definitively non-trivial since it needs to allocate for each task that is spawned and the overhead is significantly heavier when the task pools are under contention.

alice-i-cecile · 2022-09-14T03:14:17Z

Revisiting this, it may make sense to add this only for the parallel case

I think this should be done before merging, which would make this non-controversial IMO.

james7132 · 2022-10-19T07:34:23Z

I think this should be done before merging, which would make this non-controversial IMO.

This was done before this comment. Should be ready to go.

alice-i-cecile · 2022-10-24T13:21:46Z

Description could use updating now, but I'm not going to block on that.

bors r+

# Objective Speed up queries that are fragmented over many empty archetypes and tables. ## Solution Add a early-out to check if the table or archetype is empty before iterating over it. This adds an extra branch for every archetype matched, but skips setting the archetype/table to the underlying state and any iteration over it. This may not be worth it for the default `Query::iter` and maybe even the `Query::for_each` implementations, but this definitely avoids scheduling unnecessary tasks in the `Query::par_for_each` case. Ideally, `matched_archetypes` should only contain archetypes where there's actually work to do, but this would add a `O(n)` flat cost to every call to `update_archetypes` that scales with the number of matched archetypes. TODO: Benchmark

bors · 2022-10-24T13:46:25Z

Pull request successfully merged into main.

Build succeeded:

…ine#4724) # Objective Speed up queries that are fragmented over many empty archetypes and tables. ## Solution Add a early-out to check if the table or archetype is empty before iterating over it. This adds an extra branch for every archetype matched, but skips setting the archetype/table to the underlying state and any iteration over it. This may not be worth it for the default `Query::iter` and maybe even the `Query::for_each` implementations, but this definitely avoids scheduling unnecessary tasks in the `Query::par_for_each` case. Ideally, `matched_archetypes` should only contain archetypes where there's actually work to do, but this would add a `O(n)` flat cost to every call to `update_archetypes` that scales with the number of matched archetypes. TODO: Benchmark

Skip empty archetypes and tables when iterating over queries

02c59eb

alice-i-cecile added A-ECS C-Performance labels May 11, 2022

james7132 requested review from BoxyUwU and TheRawMeatball May 11, 2022 20:05

Update QueryIterationCursor

43b9a65

james7132 added 3 commits May 11, 2022 14:17

Remove unnecessary continues

29da7b9

Fix typo

10d03b9

Add benchmark

fe5859a

Fix up erroneous benchmark

7fe7379

james7132 added 3 commits May 13, 2022 23:28

Use wider queries

4815ff5

Move checks to update_archetypes

72884a9

Merge branch 'main' into skip-empty

b606f53

Revert "Move checks to update_archetypes"

282f70e

This reverts commit 72884a9.

alice-i-cecile added S-Needs-Design and removed S-Needs-Design labels May 18, 2022

alice-i-cecile approved these changes May 18, 2022

View reviewed changes

bjorn3 approved these changes May 18, 2022

View reviewed changes

alice-i-cecile added the S-Ready-For-Final-Review label May 18, 2022

BoxyUwU approved these changes May 26, 2022

View reviewed changes

alice-i-cecile added the X-Controversial label May 26, 2022

james7132 added 4 commits June 29, 2022 18:43

Merge branch 'main' into skip-empty

74cf8a4

Revert for non-parallel cases

37f5b97

Revert iter changes

436e474

Undo extra spaces

f87f12c

alice-i-cecile removed the S-Ready-For-Final-Review label Sep 14, 2022

james7132 force-pushed the skip-empty branch from 0bfc82c to f87f12c Compare October 19, 2022 07:32

Merge branch 'main' into skip-empty

aff39c5

james7132 added S-Ready-For-Final-Review and removed X-Controversial labels Oct 19, 2022

bors bot changed the title ~~Skip empty archetypes and tables when iterating over queries~~ [Merged by Bors] - Skip empty archetypes and tables when iterating over queries Oct 24, 2022

bors bot closed this Oct 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Merged by Bors] - Skip empty archetypes and tables when iterating over queries #4724

[Merged by Bors] - Skip empty archetypes and tables when iterating over queries #4724

james7132 commented May 11, 2022

alice-i-cecile commented May 11, 2022

BoxyUwU commented May 11, 2022

james7132 commented May 11, 2022

james7132 commented May 12, 2022 •

edited

Loading

alice-i-cecile commented May 12, 2022

james7132 commented May 12, 2022

BoxyUwU commented May 12, 2022

james7132 commented May 14, 2022 •

edited

Loading

DJMcNab commented May 14, 2022

james7132 commented May 15, 2022

bjorn3 left a comment

BoxyUwU left a comment

james7132 commented Jun 16, 2022

alice-i-cecile commented Sep 14, 2022

james7132 commented Oct 19, 2022

alice-i-cecile commented Oct 24, 2022

bors bot commented Oct 24, 2022

[Merged by Bors] - Skip empty archetypes and tables when iterating over queries #4724

[Merged by Bors] - Skip empty archetypes and tables when iterating over queries #4724

Conversation

james7132 commented May 11, 2022

Objective

Solution

alice-i-cecile commented May 11, 2022

BoxyUwU commented May 11, 2022

james7132 commented May 11, 2022

james7132 commented May 12, 2022 • edited Loading

alice-i-cecile commented May 12, 2022

james7132 commented May 12, 2022

BoxyUwU commented May 12, 2022

james7132 commented May 14, 2022 • edited Loading

DJMcNab commented May 14, 2022

james7132 commented May 15, 2022

bjorn3 left a comment

Choose a reason for hiding this comment

BoxyUwU left a comment

Choose a reason for hiding this comment

james7132 commented Jun 16, 2022

alice-i-cecile commented Sep 14, 2022

james7132 commented Oct 19, 2022

alice-i-cecile commented Oct 24, 2022

bors bot commented Oct 24, 2022

james7132 commented May 12, 2022 •

edited

Loading

james7132 commented May 14, 2022 •

edited

Loading