WIP: Experimental reconvergence tests #2916

Draft: wants to merge 32 commits into base: main

alan-baker
Contributor

This PR is mainly to make these tests more easily accessible to run.

Experimental reconvergence tests to investigate portability of subgroup operations across platforms

The tests use an experimental Chromium extension that exposes subgroupBallot(), subgroup_invocation_id, and subgroup_size. They are based on Vulkan CTS tests (some of which are experimental) and test a variety of reconvergence styles:

  • Workgroup - only tests when the workgroup is expected to be fully converged
  • Subgroup - only tests when the subgroup is expected to be fully converged (equivalent to SPV_KHR_subgroup_uniform_control_flow)
  • Maximal - experimental reconvergence that is meant to match author intuition about convergence
  • WGSLv1 - Close to current spec. Differs from workgroup in that loops are required to converge unless all invocations continue in the same manner

The test infrastructure simulates a program (either pseudo-random or predefined) and compares the results against a GPU run. It runs 128 invocations in a single workgroup (the largest possible subgroup size). The random program generator is parameterized to avoid excessive runtime or memory usage, but further testing is needed on a variety of devices to properly tune those parameters. Expect a few timeouts. It currently runs only 15 predefined cases and 100 random cases per reconvergence style; ideally, significantly more random tests would be run for more confidence in the results.
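
The comparison step described above can be pictured with a minimal TypeScript sketch. It assumes each ballot is stored as four 32-bit words (enough for 128 invocations) and that the simulator and the GPU each produce one flat list of ballots. The names `Ballot`, `ballotsEqual`, and `firstMismatch` are hypothetical and are not taken from this PR.

```typescript
// Hypothetical representation: a 128-bit ballot as four 32-bit words,
// word 0 holding invocations 0..31 and word 3 holding invocations 96..127.
type Ballot = [number, number, number, number];

// Compare two ballots word by word as unsigned 32-bit values.
function ballotsEqual(a: Ballot, b: Ballot): boolean {
  for (let i = 0; i < 4; i++) {
    if ((a[i] >>> 0) !== (b[i] >>> 0)) {
      return false;
    }
  }
  return true;
}

// Return the index of the first ballot where the simulated result and the
// GPU result differ, or -1 if every ballot matches.
function firstMismatch(expected: Ballot[], actual: Ballot[]): number {
  const common = Math.min(expected.length, actual.length);
  for (let i = 0; i < common; i++) {
    if (!ballotsEqual(expected[i], actual[i])) {
      return i;
    }
  }
  // A missing or extra ballot counts as a mismatch at the first unmatched index.
  return expected.length === actual.length ? -1 : common;
}
```

Reporting the first mismatching index, rather than a plain pass or fail, is a design choice of this sketch: it points at the program location where the GPU and the simulation first disagree.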

Issue: #


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

* Partial framework implemented
* Added more ops
* Moved infrastructure into a utility file
* Refactor executing the program into a helper function
* Add some predefined testcases in a separate group
* Add comments and improve enum names
* Add ballot, store, and return ops
* Very basic result checking
* Switch from var to let in most places
* Add buffer checking
* Fix some bugs:
  * all
  * simulation of IfId
* Add infinite for loops and elects
* remove tabs
* small optimizations to simulation runtime
* add a predefined case with infinite for loop
* Add a variable based for that iterates based on subgroup id
* Add a predefined test case to cover it
* program generation, code generations, and simulation
* Add a predefined program that is run for both workgroup and maximal
  reconvergence
* Fixed a bug in the shader code for testbit
  * couldn't select bits [96,127]
* Fix a bug in ForVar simulation
  * didn't loop correctly
* Refactored some simulation code to reduce duplication
* added infinite loop
* added better safeguards for program length
* optimized simulation runtime
* fixed a couple bugs
* remove the locations buffer
* cap the maximum number of locations per invocation such that the
  buffer size is guaranteed < 256MB
* fix simulation of return in main function
* remove checking of last buffer value in UCF tests
* Add debug functions for dumping info
* remove some logging
* remove some default parameters to ensure consistency
* change some limits to improve runtime performance
* Add loop reduction factors to improve runtime performance
  * ForVar, ForInf, and LoopInf will execute half as many iterations
    when inside 1 loop and a quarter as many when inside 2
* Made all stores unique
  * identical else blocks previously reused values
* refactored test code
  * each reconvergence style is now a separate set of tests
* Added more switch varieties
  * added predefined tests for coverage
* fixed simulation of ForVar
  * reduction was incorrectly handled
* fixed result comparison for ucf cases
* fixed how ucf is calculated
* more documentation
* requires unsafe typecast for experimental feature
* WGSLv1 style is a closer match to current WGSL spec
  * doesn't require loops to converge between iterations
  * added predefined and random test suites to exercise it
* refactored uniform tests for reuse
* improved loop continue ballot generation to avoid false errors
* added a new predefined test that could distinguish between Workgroup
  and WGSLv1 reconvergence
* Add no-op code fragments at low frequency
* Increase number of random testcases
* remove some debug output
* add a control for other debug output
* fix errors from npm run fix
* switch code fragments in comments to drop { } because the linter is
  overly aggressive
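
Two of the fixes listed above lend themselves to a short illustration: selecting any bit of a 128-bit ballot (including bits [96,127]) and reducing loop trip counts by nesting depth. The sketch below is TypeScript written for illustration only, not the PR's shader or generator code. `testBit`, `reducedIterations`, the four-word ballot layout, the extension of the halving rule beyond two levels, and the clamp to at least one iteration are all assumptions.

```typescript
// Hypothetical layout: ballot[0] covers bits 0..31, ballot[3] covers bits 96..127.
type Ballot = [number, number, number, number];

// Test bit `id` (0..127) of a 128-bit ballot.
function testBit(ballot: Ballot, id: number): boolean {
  if (!Number.isInteger(id) || id < 0 || id > 127) {
    throw new RangeError(`bit index out of range: ${id}`);
  }
  const word = ballot[id >>> 5]; // id / 32 selects the word
  const bit = id & 31; // id % 32 selects the bit within that word
  return ((word >>> bit) & 1) === 1;
}

// Halve the trip count per enclosing loop: full count outside any loop,
// half inside one loop, a quarter inside two.
// Assumption: the result is clamped so a loop always runs at least once.
function reducedIterations(base: number, loopDepth: number): number {
  return Math.max(1, Math.floor(base / 2 ** loopDepth));
}
```

For example, `reducedIterations(8, 2)` yields 2, and `testBit` reads the fourth word for any index from 96 through 127.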
@alan-baker force-pushed the experimental-reconvergence-tests branch from af52f07 to b8f7b88 on September 1, 2023 00:30
@github-actions

github-actions bot commented Sep 1, 2023

Previews, as seen when this build job started (b8f7b88):
Run tests | View tsdoc

@dneto0
Contributor

dneto0 commented Sep 1, 2023

I tried the fixed tests on a Pixel 6 Chrome Canary, with the enable-unsafe-apis flag and it couldn't acquire the device.
(I verified that the experimental feature was exposed via webgpureport.org)
pixel6-fixed-subgroup-tests.json.txt

@kainino0x
Collaborator

@dneto0 from the logs this may have been due to too many GPU process crashes. You can try restarting the browser and running a single test to see if it works. (I was able to get results on Pixel 6 Pro, though in the Android 14 beta so maybe the driver has some bugfixes.)

* The simulation incorrectly marked some continues as non-uniform
  * this leads to some ballots being marked with the special value
  * in WGSLv1, Workgroup, and Subgroup styles this would lead to fewer
    ballots being checked (so likely no change in results)
  * in Maximal though, this leads to some ballots being a wacky value
    and extra failures
@alan-baker
Contributor Author

FYI the latest commit fixes a bug that likely resulted in false negatives for maximal reconvergence tests. It likely has no effect on the other styles of reconvergence.

@github-actions

github-actions bot commented Sep 8, 2023

Previews, as seen when this build job started (d7ea45d):
Run tests | View tsdoc

* Added a new test suite 'uniform_maximal' that tests that ballots all
  work as expected when no divergent branches exist in the code
* The generator has a mode to only generate uniform conditions
  * removes several if, loop, and switch styles
  * restricts the types of breaks and continues that are generated
  * removes the generation of the election based noise operation
* Adds a predefined test to cover some operations
@github-actions

Previews, as seen when this build job started (8072695):
Run tests | View tsdoc

@dneto0
Contributor

dneto0 commented Sep 26, 2023

For those coming new to this, all the tests here are under webgpu:shader,execution,reconvergence,reconvergence:*

Pre-canned tests useful for playing around are:

  • webgpu:shader,execution,reconvergence,reconvergence:predefined_wgslv1:*
  • webgpu:shader,execution,reconvergence,reconvergence:predefined_workgroup:*
  • webgpu:shader,execution,reconvergence,reconvergence:predefined_subgroup:*
  • webgpu:shader,execution,reconvergence,reconvergence:predefined_maximal:*

There are also "random" tests for stress testing, which run hundreds of iterations. Replace predefined in the above with random to run them.

  • webgpu:shader,execution,reconvergence,reconvergence:uniform_maximal:* also runs hundreds of tests. All branches in the shader are subgroup-uniform.

3 participants