This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Support multiplication of a vector against one lane (broadcasted) of another vector #227

Open
bjacob opened this issue May 11, 2020 · 0 comments

Suppose you have two vectors u and v, and you want to multiply every element of u by a single lane of v, e.g. v[0]. This is a very common operation, particularly in floating-point matrix multiplication kernels.

Example.

This should be available for all multiplication instructions, float and integer, including any multiply-add instructions if they are added to the spec. It will map directly to the corresponding instructions on ARM, and will be implemented on x86 by broadcasting the selected lane into a temporary vector.
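To make the requested semantics concrete, here is a minimal scalar model in C. It is only a sketch: the `f32x4` struct and the names `f32x4_splat_lane`, `f32x4_mul` and `f32x4_mul_lane` are hypothetical stand-ins chosen for illustration, not names from the SIMD proposal or from any intrinsics header.

```c
#include <assert.h>

/* Hypothetical 4-lane float vector standing in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Broadcast lane i of v into every lane of a temporary vector. */
f32x4 f32x4_splat_lane(f32x4 v, int i) {
    assert(i >= 0 && i < 4);
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = v.lane[i];
    return r;
}

/* Ordinary lane-wise multiplication. */
f32x4 f32x4_mul(f32x4 a, f32x4 b) {
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = a.lane[k] * b.lane[k];
    return r;
}

/* Requested operation (hypothetical name): multiply every lane of u by v[i]. */
f32x4 f32x4_mul_lane(f32x4 u, f32x4 v, int i) {
    assert(i >= 0 && i < 4);
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = u.lane[k] * v.lane[i];
    return r;
}
```

In this model, `f32x4_mul_lane(u, v, i)` gives the same result as `f32x4_mul(u, f32x4_splat_lane(v, i))`. The second form is the broadcast-into-a-temporary fallback; the first is the single operation being requested.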

Rationale for this programming model in WebAsm SIMD:

  • It's more expressive w.r.t. what many applications need to do.
  • The fallback is efficient, provided the instructions in the generated code are well ordered. By contrast, the current lack of this instruction forces the WebAsm source to use separate broadcast instructions, which makes it essentially impossible for the generated code to be efficient.

See ARM benchmarks in this spreadsheet.
Row 30, NEON_64bit_GEMM_Float32_WithVectorDuplicatingScalar, is the float kernel that one can write without such instructions.
Row 31, NEON_64bit_GEMM_Float32_WithScalar, is the faster float kernel that one can write with such instructions.
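For readers unfamiliar with this kind of kernel, the following C sketch shows the inner-loop shape that a multiply-by-lane operation enables. It is not the benchmarked code. The `f32x4` struct, `f32x4_mla_lane`, `gemm_4x4` and the memory layouts are all assumptions made for illustration, and the fused multiply-add by lane is modeled with plain scalar loops.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector standing in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Hypothetical multiply-add by lane: acc + u * v[i] on every lane. */
f32x4 f32x4_mla_lane(f32x4 acc, f32x4 u, f32x4 v, int i) {
    for (int k = 0; k < 4; ++k) acc.lane[k] += u.lane[k] * v.lane[i];
    return acc;
}

/* Computes a 4x4 block C = A * B.
 * Assumed layouts: A is 4 x depth, column-major (column p starts at a[4*p]);
 * B is depth x 4, row-major (row p starts at b[4*p]);
 * C is 4x4, column-major (column j starts at c[4*j]). */
void gemm_4x4(const float *a, const float *b, size_t depth, float *c) {
    f32x4 acc[4] = {{{0.0f}}};  /* one accumulator per column of C */
    for (size_t p = 0; p < depth; ++p) {
        f32x4 lhs, rhs;
        for (int k = 0; k < 4; ++k) {
            lhs.lane[k] = a[4 * p + k];  /* one column of A */
            rhs.lane[k] = b[4 * p + k];  /* one row of B */
        }
        /* Each accumulator uses a different lane of the same rhs vector. */
        for (int j = 0; j < 4; ++j)
            acc[j] = f32x4_mla_lane(acc[j], lhs, rhs, j);
    }
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            c[4 * j + k] = acc[j].lane[k];
}
```

Without a by-lane form, each of the four accumulator updates per iteration would first need its own broadcast of `rhs.lane[j]` into a temporary vector. That is the difference the two benchmark rows are meant to capture.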
