WIP: Experimental reconvergence tests #2916

alan-baker · 2023-08-31T13:37:06Z

This PR is mainly to make these tests more easily accessible to run.

Experimental reconvergence tests to investigate portability of subgroup operations across platforms

The tests use an experimental Chromium extension that exposes subgroupBallot(), subgroup_invocation_id, and subgroup_size. They are based on Vulkan CTS tests (some of which are experimental) and test a variety of reconvergence styles:

Workgroup - only tests when the workgroup is expected to be fully converged
Subgroup - only tests when the subgroup is expected to be fully converged (equivalent to SPV_KHR_subgroup_uniform_control_flow)
Maximal - experimental reconvergence that is meant to match author intuition about convergence
WGSLv1 - Close to current spec. Differs from workgroup in that loops are required to converge unless all invocations continue in the same manner

The test infrastructure simulates a program (either pseudo-random or predefined) and compares the results against a GPU run. It runs 128 invocations in a single workgroup (128 being the largest possible subgroup size). The random program generator is parameterized to avoid excessive runtime or memory usage, but further testing is needed on a variety of devices to properly tune those parameters. Expect a few timeouts. The suite currently runs only 15 predefined cases and 100 random cases per reconvergence style; ideally, significantly more random tests would be run for more confidence in the results.
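As an illustration of the compare step described above, here is a minimal sketch, not the code in this PR. It assumes each ballot is stored as four u32 words (128 bits, one bit per invocation) and that the simulated and GPU results arrive as flat Uint32Array buffers; the name findMismatchedBallots and the buffer layout are hypothetical.

```typescript
// Assumed layout: one ballot = 4 x u32 words = 128 bits, one bit per invocation.
const WORDS_PER_BALLOT = 4;

/** Returns the indices of ballots whose simulated and GPU values differ. */
function findMismatchedBallots(simulated: Uint32Array, gpu: Uint32Array): number[] {
  if (simulated.length !== gpu.length || simulated.length % WORDS_PER_BALLOT !== 0) {
    throw new Error('ballot buffers must have equal length, a multiple of 4 words');
  }
  const mismatches: number[] = [];
  const ballotCount = simulated.length / WORDS_PER_BALLOT;
  for (let b = 0; b < ballotCount; b++) {
    for (let w = 0; w < WORDS_PER_BALLOT; w++) {
      const i = b * WORDS_PER_BALLOT + w;
      if (simulated[i] !== gpu[i]) {
        // Record the ballot once, then move on to the next ballot.
        mismatches.push(b);
        break;
      }
    }
  }
  return mismatches;
}
```

A caller would report a failure when the returned list is non-empty, for example by printing the first mismatching ballot index.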

Issue: #

Requirements for PR author:

All missing test coverage is tracked with "TODO" or .unimplemented().
New helpers are /** documented */ and new helper files are found in helper_index.txt.
Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)

Requirements for reviewer sign-off:

Tests are properly located in the test tree.
Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

* Partial framework implemented

* Added more ops * Moved infrastructure into a utility file

* Refactor executing the program into a helper function * Add some predefined testcases in a separate group * Add comments and improve enum names * Add ballot, store, and return ops

* Very basic result checking

* Switch from var to let in most places * Add buffer checking * Fix some bugs: * all * simulation of IfId

* Add infinite for loops and elects * remove tabs * small optimizations to simulation runtime * add a predefined case with infinite for loop

* Add a variable based for that iterates based on subgroup id * Add a predefined test case to cover it

* program generation, code generations, and simulation * Add a predefined program that is run for both workgroup and maximal reconvergence

* Fixed a bug in the shader code for testbit * couldn't select bits [96,127] * Fix a bug in ForVar simulation * didn't loop correctly * Refactored some simulation code to reduce duplication
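For readers unfamiliar with the testbit issue, the following is a TypeScript analogue of a 128-bit bit test, not the WGSL shader code from this PR. It assumes the ballot is held as four u32 words, so bits [96,127] live in the fourth word; the function name and layout are hypothetical.

```typescript
/** Returns true if bit `bit` (0..127) is set in a 128-bit ballot stored as 4 u32 words. */
function testBit(ballot: Uint32Array, bit: number): boolean {
  if (ballot.length !== 4 || !Number.isInteger(bit) || bit < 0 || bit > 127) {
    throw new RangeError('expected a 4-word ballot and a bit index in [0, 127]');
  }
  const word = bit >>> 5; // which u32 word holds the bit (bit / 32)
  const offset = bit & 31; // position of the bit inside that word (bit % 32)
  const value = ballot[word] ?? 0;
  // Use the unsigned shift so bit 31 of a word is handled correctly.
  return ((value >>> offset) & 1) === 1;
}
```

Selecting the word by index before shifting is what lets the upper range [96,127] be reached; a single 32-bit shift alone cannot address those bits.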

* added infinite loop * added better safeguards for program length * optimized simulation runtime * fixed a couple bugs

* remove the locations buffer * cap the maximum number of locations per invocation such that the buffer size is guaranteed < 256MB * fix simulation of return in main function * remove checking of last buffer value in UCF tests

* Add debug functions for dumping info * remove some logging * remove some default parameters to ensure consistency * change some limits to improve runtime performance * Add loop reduction factors to improve runtime performance * ForVar, ForInf, and LoopInf will execute half as many when inside 1 loop and a quarter inside 2 * Made all stores unique * identical else blocks previously reused values
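The loop reduction rule in the commit message above can be sketched as follows. This is an illustration only: the function name is hypothetical, and both the continued halving beyond two levels of nesting and the clamp to at least one iteration are assumptions, since the commit message only states the factors for one and two enclosing loops.

```typescript
/**
 * Returns the iteration count for a loop nested inside `enclosingLoops` other loops.
 * Half as many iterations inside 1 loop, a quarter inside 2 (per the commit message).
 * Assumption: the halving continues for deeper nesting and never drops below 1.
 */
function reducedIterations(baseIterations: number, enclosingLoops: number): number {
  if (!Number.isInteger(baseIterations) || baseIterations < 1) {
    throw new RangeError('baseIterations must be a positive integer');
  }
  if (!Number.isInteger(enclosingLoops) || enclosingLoops < 0) {
    throw new RangeError('enclosingLoops must be a non-negative integer');
  }
  return Math.max(1, Math.floor(baseIterations / 2 ** enclosingLoops));
}
```

Reducing the count geometrically keeps the total work of nested loops roughly bounded, which is the runtime concern the commit message describes.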

* refactored test code * each reconvergence style is now a separate set of tests * Added more switch varieties * added predefined tests for coverage * fixed simulation of ForVar * reduction was incorrectly handled * fixed result comparison for ucf cases * fixed how ucf is calculated * more documentation

* requires unsafe typecast for experimental feature

* WGSLv1 style is a closer match to current WGSL spec * doesn't require loops to converge between iterations * added predefined and random test suites to exercise it * refactored uniform tests for reuse * improved loop continue ballot generation to avoid false errors * added a new predefined test that could distinguish between Workgroup and WGSLv1 reconvergence

* Add no-op code fragments at low frequency * Increase number of random testcases

* remove some debug output * add a control for other debug output

* fix errors from npm run fix

* switch code fragments in comments to drop { } because the linter is over aggressive

github-actions · 2023-09-01T00:40:29Z

Previews, as seen when this build job started (b8f7b88):
Run tests | View tsdoc

dneto0 · 2023-09-01T21:11:44Z

I tried the fixed tests on a Pixel 6 Chrome Canary, with the enable-unsafe-apis flag and it couldn't acquire the device.
(I verified that the experimental feature was exposed via webgpureport.org)
pixel6-fixed-subgroup-tests.json.txt

kainino0x · 2023-09-02T00:21:27Z

@dneto0 from the logs this may have been due to too many GPU process crashes. You can try restarting the browser and running a single test to see if it works. (I was able to get results on Pixel 6 Pro, though in the Android 14 beta so maybe the driver has some bugfixes.)

* The simulation incorrectly marked some continues as non-uniform * this leads to some ballots being marked with the special value * in WGSLv1, Workgroup, and Subgroup styles this would lead to fewer ballots being checked (so likely no change in results) * in Maximal though, this leads to some ballots being a wacky value and extra failures

alan-baker · 2023-09-08T00:33:40Z

FYI the latest commit fixes a bug that likely resulted in false negatives for maximal reconvergence tests. It likely has no effect on the other styles of reconvergence.

github-actions · 2023-09-08T00:40:35Z

Previews, as seen when this build job started (d7ea45d):
Run tests | View tsdoc

* Added a new test suite 'uniform_maximal' that tests that ballots all work as expected when no divergent branches exist in the code * The generator has a mode to only generate uniform conditions * removes several if, loop, and switch styles * restricts types of breaks and continues that generated * removes the generation of the election based noise operation * Adds a predefined test to cover some operations

github-actions · 2023-09-11T19:47:52Z

Previews, as seen when this build job started (8072695):
Run tests | View tsdoc

dneto0 · 2023-09-26T21:45:23Z

For those coming new to this, all the tests here are under webgpu:shader,execution,reconvergence,reconvergence:*

Pre-canned tests useful for playing around are:

webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:predefined_wgslv1:*
webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:predefined_workgroup:*
webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:predefined_subgroup:*
webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:predefined_maximal:*

There are also "random" tests for stress testing, which run hundreds of iterations. Replace predefined in the above with random to run them.

webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:uniform_maximal:* also runs hundreds of tests. All branches in the shader are subgroup-uniform.

alan-baker added 30 commits August 2, 2023 15:33

WIP: experimental reconvergence tests

cf90baa

* Partial framework implemented

Refactoring and implementation

a25b6dc

* Added more ops * Moved infrastructure into a utility file

More implementation

ad268fe

* Refactor executing the program into a helper function * Add some predefined testcases in a separate group * Add comments and improve enum names * Add ballot, store, and return ops

More implementation

59fd1b1

* Very basic result checking

switch readback style

1136df0

fix sync issue

e6568f7

More implementation

354e66c

* Switch from var to let in most places * Add buffer checking * Fix some bugs: * all * simulation of IfId

Implementation

986e2a8

* Add infinite for loops and elects * remove tabs * small optimizations to simulation runtime * add a predefined case with infinite for loop

Add another for loop variant

6a38736

* Add a variable based for that iterates based on subgroup id * Add a predefined test case to cover it

Add function calls

4cd3e8a

* program generation, code generations, and simulation * Add a predefined program that is run for both workgroup and maximal reconvergence

Add uniform loop

c436200

Fixes and refactoring

e31e16d

* Fixed a bug in the shader code for testbit * couldn't select bits [96,127] * Fix a bug in ForVar simulation * didn't loop correctly * Refactored some simulation code to reduce duplication

Impl and fixes

6b0a807

* added infinite loop * added better safeguards for program length * optimized simulation runtime * fixed a couple bugs

Improve performance and fixes

c97f8e9

* remove the locations buffer * cap the maximum number of locations per invocation such that the buffer size is guaranteed < 256MB * fix simulation of return in main function * remove checking of last buffer value in UCF tests

Add uniform switch statements

2399352

cleanup

9449f3f

Add feature based skips

01cd4c8

* requires unsafe typecast for experimental feature

cleanup

d2a42a6

more docs

d5a691c

fix switch loop count conditional generation

df87cbb

Add noise generation

a190a07

* Add no-op code fragments at low frequency * Increase number of random testcases

Cleanup

7411e0c

* remove some debug output * add a control for other debug output

docs

cede40a

Formatting

aa37e32

* fix errors from npm run fix

Lots of comments to satisfy linting

3367642

missed function

6a80aaa

Comments

b8f7b88

* switch code fragments in comments to drop { } because the linter is over aggressive

alan-baker force-pushed the experimental-reconvergence-tests branch from af52f07 to b8f7b88 Compare September 1, 2023 00:30

dneto0 mentioned this pull request Sep 26, 2023

add subgroups, and make them portable if possible gpuweb/gpuweb#4306

Open
