This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Support register-tight use cases #225

Open
bjacob opened this issue May 11, 2020 · 1 comment


bjacob commented May 11, 2020

This is open-ended. The problem is that many key use cases, such as matrix multiplication kernels, need to know how many SIMD vector registers they can count on using. In practice, the number of available architectural registers tends to be just large enough to hit peak performance, so matrix multiplication kernels tend to use all of them. Here is an example.

In theory, a language higher-level than raw asm, such as WebAsm, abstracts away this fixed number of architectural registers and offers infinitely many variables instead. In practice, register-intensive SIMD kernels are one area where this abstraction has not been working well. The abstraction relies on spilling registers as necessary, which has only a marginal performance impact on most code but often a catastrophic impact on register-intensive SIMD kernels (performance degradations > 2x, sometimes 10x).
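To make the register pressure concrete, here is a minimal sketch of a 4x8 float micro-kernel in portable C. It is not taken from the linked example. The `vec4f` type and the `vec_*` helpers are hypothetical stand-ins for 128-bit SIMD values and intrinsics, and the data layout (`lhs[k*4 + r]`, `rhs[k*8 + c]`) is an assumption chosen for brevity. The point is the inner loop: 8 accumulators, 2 right-hand-side vectors, and 1 broadcast value are live at once, and the kernel is only fast if all of them stay in registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 128-bit SIMD vector of four floats,
   so that the sketch compiles with any C compiler. */
typedef struct { float lane[4]; } vec4f;

static vec4f vec_splat(float x) {
    vec4f v = {{x, x, x, x}};
    return v;
}

static vec4f vec_load(const float *p) {
    vec4f v = {{p[0], p[1], p[2], p[3]}};
    return v;
}

static void vec_store(float *p, vec4f v) {
    for (int i = 0; i < 4; ++i) p[i] = v.lane[i];
}

/* Lane-wise acc + a * b. */
static vec4f vec_mul_add(vec4f acc, vec4f a, vec4f b) {
    for (int i = 0; i < 4; ++i) acc.lane[i] += a.lane[i] * b.lane[i];
    return acc;
}

/* dst (4 rows x 8 columns, row-major) = lhs (4 x depth) * rhs (depth x 8).
   Assumed layout: lhs[k*4 + r] and rhs[k*8 + c]. */
void kernel_4x8(const float *lhs, const float *rhs, float *dst, size_t depth) {
    vec4f acc[4][2]; /* 8 accumulators that should never leave registers */
    for (int r = 0; r < 4; ++r) {
        acc[r][0] = vec_splat(0.0f);
        acc[r][1] = vec_splat(0.0f);
    }
    for (size_t k = 0; k < depth; ++k) {
        /* 2 right-hand-side vectors, reused by all 4 rows */
        vec4f b0 = vec_load(rhs + k * 8);
        vec4f b1 = vec_load(rhs + k * 8 + 4);
        for (int r = 0; r < 4; ++r) {
            /* 1 broadcast left-hand-side value per row */
            vec4f a = vec_splat(lhs[k * 4 + r]);
            acc[r][0] = vec_mul_add(acc[r][0], a, b0);
            acc[r][1] = vec_mul_add(acc[r][1], a, b1);
        }
    }
    for (int r = 0; r < 4; ++r) {
        vec_store(dst + r * 8, acc[r][0]);
        vec_store(dst + r * 8 + 4, acc[r][1]);
    }
}
```

If a compiler or engine spills even a few of the `acc` values, every iteration of the `k` loop pays for extra loads and stores, which is the kind of degradation described above.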

This prompts a few questions for someone trying to write WebAsm matrix multiplication kernels:

  1. Can the programmer query the number of architecture registers?
  2. Can the programmer make assumptions about the correspondence between the number of SIMD vector variables used in a part of the program, and the register usage of the generated code?
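Why the register count matters can be sketched with some back-of-the-envelope arithmetic. The accounting rule below is an assumption made for illustration (one accumulator per tile vector, one register per right-hand-side vector, one for a broadcast left-hand-side value); real register allocators may need additional temporaries. The function names are hypothetical.

```c
#include <assert.h>

/* Vector values live in the inner loop of a micro-kernel whose output tile
   is tile_rows rows by tile_col_vectors SIMD vectors wide (assumed model). */
int live_vectors(int tile_rows, int tile_col_vectors) {
    int accumulators = tile_rows * tile_col_vectors;
    int rhs_vectors = tile_col_vectors;
    int lhs_broadcast = 1;
    return accumulators + rhs_vectors + lhs_broadcast;
}

/* Returns 1 if the tile fits in the given number of vector registers. */
int fits_in_registers(int tile_rows, int tile_col_vectors,
                      int num_vector_registers) {
    return live_vectors(tile_rows, tile_col_vectors) <= num_vector_registers;
}
```

Under this model, a 6x2 tile needs 15 vectors and fits in the 16 XMM registers of x86-64 SSE, while an 8x2 tile needs 19 and fits only on a target with 32 vector registers, such as AArch64 NEON. A kernel author who cannot query the register count has to guess which tile to ship.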

These issues have also severely affected C/C++ with intrinsics, and are the main reason why many people prefer to write assembly instead. However, in C/C++ with intrinsics, at least:

  1. One knows the target architecture.
  2. One can "massage" the compiler into generating the expected code. Compilation is AOT and one gets a chance to look at the generated code before shipping.

I'm afraid that these issues, which are bad enough in C/C++ intrinsics to halfway kill this programming model for critical use cases, will affect WebAsm SIMD even more severely, because the client device and browser are abstracted away and compilation happens in a JIT.

@bjacob bjacob changed the title Support use cases that need to target a specific number of registers. Support register-tight use cases May 12, 2020

tlively commented May 12, 2020

  1. Can the programmer query the number of architecture registers?

No, exposing underlying architectural details would introduce platform-specific behavior and violate WebAssembly's determinism. Although this kind of nondeterminism might be considered for a future proposal, it is out of scope for this SIMD proposal.

  2. Can the programmer make assumptions about the correspondence between the number of SIMD vector variables used in a part of the program, and the register usage of the generated code?

No, different engines may make different register allocation decisions and may optimize or otherwise transform the code however they deem fit, so programmers should not be making these sorts of assumptions. It may be possible to make assumptions about codegen for a particular engine, but it should not be assumed that those assumptions will generalize to other engines.

The low-level, portable SIMD instructions in this proposal have proven to be useful for a wide variety of workloads, but we are aware that there are also many workloads that depend on non-portable instructions. Keep an eye out for future proposals meant to address this problem.
