
Tracking issue for WebAssembly SIMD support #74372

Closed
4 of 7 tasks
alexcrichton opened this issue Jul 15, 2020 · 44 comments · Fixed by #86204
Labels
  • A-simd - Area: SIMD (Single Instruction Multiple Data)
  • C-tracking-issue - Category: A tracking issue for an RFC or an unstable feature.
  • disposition-merge - This issue / PR is in PFCP or FCP with a disposition to merge it.
  • finished-final-comment-period - The final comment period is finished for this PR / Issue.
  • O-wasm - Target: WASM (WebAssembly), http://webassembly.org/
  • T-libs-api - Relevant to the library API team, which will review and decide on the PR/issue.

Comments


alexcrichton commented Jul 15, 2020

I'm opening this as a tracking issue for the SIMD intrinsics in the {std,core}::arch::wasm32 module. Eventually we're going to want to stabilize these intrinsics for the WebAssembly target, so I think it's good to have a canonical place to talk about them! I'm also going to update the #![unstable] annotations to point to this issue to direct users here if they want to use these intrinsics.

The WebAssembly simd proposal is currently in "phase 3". I would say that we probably don't want to consider stabilizing these intrinsics until the proposal has at least reached "phase 4" where it's being standardized, because there are still changes to the proposal happening over time (small ones at this point, though). As a brief overview, the WebAssembly simd proposal adds a new type, v128, and a suite of instructions to perform data processing with this type. The intention is that this is readily portable to a lot of architectures so usage of SIMD can be fast in lots of places.

For rust stabilization purposes the code for all these intrinsics lives in the rust-lang/stdarch git repository. All code lives in crates/core_arch/src/wasm32/simd128.rs. I've got a large refactoring and sync queued up for that module, so I'm going to be writing this issue with the assumption that it will land mostly as designed there.

Currently the design principles for the SIMD intrinsics are:

  • Like the existing memory_size, memory_grow and unreachable intrinsics, most intrinsics are named after the instruction they represent. There is generally a 1:1 mapping between new instructions added to WebAssembly and intrinsics in the module.
  • The type signature of each intrinsic is intended to match the textual description of each intrinsic
  • Each intrinsic has #[target_feature(enable = "simd128")] which forces them all to be unsafe
  • Some gotchas for specific intrinsics are:
    • v128.const is exposed through a suite of const functions, one for each vector type (but not unsigned, just signed integers). Additionally the arguments are not actually required to be constant, so it's expected that the compiler will make the best choice about how to generate a runtime vector.
    • Instructions using lane indices, such as v8x16_shuffle and *_{extract,replace}_lane use const generics to represent constant arguments. This is different from x86_64 which uses the older #[rustc_args_required_const] attribute.
    • Shuffles are provided for v16x8, v32x4, and v64x2 as conveniences instead of only providing v8x16_shuffle. All of them are implemented in terms of the v8x16.shuffle instruction, however.
  • There is a singular v128 type, not a type for each size of vector that intrinsics operate with
  • The extract_lane intrinsics return the value type associated with the intrinsic name, they do not all return i32 unlike the actual WebAssembly instruction. This means that we do not have extract_lane_s and extract_lane_u intrinsics because the compiler will select the appropriate one depending on the context.
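To illustrate that last point, here's a scalar sketch of the extract-lane semantics (these are hypothetical helper functions modeling the instruction behavior on a byte array, not the real intrinsics; the real Rust intrinsics return i8/u8 directly and let the compiler pick the _s or _u instruction):

```rust
// Scalar models of i8x16.extract_lane_s / i8x16.extract_lane_u: the same
// 8-bit lane, sign-extended vs zero-extended to a 32-bit integer.
fn extract_lane_s(v: [u8; 16], lane: usize) -> i32 {
    // _s sign-extends the 8-bit lane.
    v[lane] as i8 as i32
}

fn extract_lane_u(v: [u8; 16], lane: usize) -> i32 {
    // _u zero-extends the 8-bit lane.
    v[lane] as i32
}

fn main() {
    let v = [0xFFu8; 16];
    assert_eq!(extract_lane_s(v, 0), -1); // sign-extended
    assert_eq!(extract_lane_u(v, 0), 255); // zero-extended
    println!("ok");
}
```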

It's important to note that clang has an implementation of these intrinsics in the wasm_simd128.h header. The current design of the Rust wasm32 module is different in that:

  • The prefix wasm_* isn't used.
  • Only one datatype, v128, is exposed instead of types for each size/kind of vector
  • Naming can be different depending on the intrinsic. For example clang has wasm_i16x8_load_8x8 and wasm_u16x8_load_8x8 while Rust has i16x8_load8x8_s and i16x8_load8x8_u.

Most of these differences are largely stylistic, but some are conveniences (like other forms of shuffles) which might be nice to expose in Rust as well. All the conveniences still compile down to one instruction; they only differ in how users express the operation in source code. I believe such conveniences could also live outside the standard library, however.

How SIMD will be used

If the SIMD proposal were to move to stage 4 today I think we're in a really good spot for stabilization. #74320 is a pretty serious bug we will want to fix before full stabilization but I don't believe the fix will be hard to land in LLVM (I've already talked with some folks on that side).

Other than that, SIMD-in-wasm differs from other platforms in that a binary using SIMD will refuse to run on engines without SIMD support. In that sense there is no runtime feature detection available to SIMD consumers (at least not natively).

After rust-lang/stdarch#874 lands programs will simply use #[target_feature(enable = "...")] or RUSTFLAGS and everything should work. The SIMD intrinsics will always be exposed from the standard library (but the standard library itself will not use them) and available to users. If programs don't use the intrinsics then SIMD won't get emitted, otherwise when used the binary will use v128.
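A sketch of the compile-time pattern this implies (a hypothetical `sum4` function; the wasm path uses the intrinsic names listed later in this issue and only compiles for that target, while other targets get a scalar fallback):

```rust
// Use wasm SIMD when compiled for wasm32 with the simd128 target feature;
// fall back to a plain scalar loop everywhere else.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn sum4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use core::arch::wasm32::*;
    // The intrinsics are unsafe because of #[target_feature] (see below).
    unsafe {
        let va = v128_load(a.as_ptr() as *const v128);
        let vb = v128_load(b.as_ptr() as *const v128);
        let mut out = [0i32; 4];
        v128_store(out.as_mut_ptr() as *mut v128, i32x4_add(va, vb));
        out
    }
}

#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn sum4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    assert_eq!(sum4([1, 2, 3, 4], [10, 20, 30, 40]), [11, 22, 33, 44]);
    println!("ok");
}
```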

Open Questions

A set of things we'll need to settle on before stabilizing (and this will likely expand over time) is:

  • Handle the difference between Clang and Rust. This could come in a number of forms, such as accepting the difference or trying to unify the two. Either way, unlike for x86, the standard itself does not provide a convention for how to expose these instructions in languages, nor do I think it will.
  • Audit and confirm the types of pointers in various *_load_* and *_store_* instructions. I'm primarily unsure of the pointer argument types for the instructions that load 64 bits (8x8, 16x4, ...).
  • Figure out if the usage of const generics is ok for v8x16_shuffle and lane management instructions.
  • Confirm that the deviation of not having i8x16_extract_lane_s is ok (e.g. having i8x16_extract_lane returning i8 is all we need), same for i16x8.
  • Consider relaxing the #[target_feature] "requires unsafe" rules for these WebAssembly intrinsics. Intrinsics like f32x4_splat have no fundamental reason to be unsafe. The only reason they're unsafe is that #[target_feature] is used on them to ensure that SIMD instructions are generated in LLVM.
  • Consider switching *_{any,all}_true to returning a bool
  • A general audit of intrinsic names and signatures to ensure they match the specification.
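For reference on the any/all question above, a scalar sketch of the *_{any,all}_true semantics under discussion (hypothetical helpers modeling the lanes as a byte array):

```rust
// Scalar models of v128.any_true and i8x16.all_true over 16 byte lanes.
// The actual wasm instructions return an i32 (0 or 1); the question is
// whether the Rust intrinsics should return bool instead.
fn v128_any_true(v: [u8; 16]) -> bool {
    v.iter().any(|&b| b != 0)
}

fn i8x16_all_true(v: [u8; 16]) -> bool {
    v.iter().all(|&b| b != 0)
}

fn main() {
    let mut v = [0u8; 16];
    v[2] = 1;
    assert!(v128_any_true(v));
    assert!(!i8x16_all_true(v));
    assert!(i8x16_all_true([1; 16]));
    println!("ok");
}
```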
@alexcrichton alexcrichton added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jul 15, 2020
@jonas-schievink jonas-schievink added the A-simd Area: SIMD (Single Instruction Multiple Data) label Jul 15, 2020
alexcrichton added a commit to alexcrichton/rust that referenced this issue Jul 28, 2020
This commit updates the src/stdarch submodule primarily to include
rust-lang/stdarch#874 which updated and revamped WebAssembly SIMD
intrinsics and renamed WebAssembly atomics intrinsics. This is all
unstable surface area of the standard library so the changes should be
ok here. The SIMD updates also enable SIMD intrinsics to be used by any
program at any time, yay!

cc rust-lang#74372, a tracking issue I've opened for the stabilization of SIMD
intrinsics
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 28, 2020

Update stdarch submodule

est31 commented Dec 25, 2020

Is there a way to enable wasm SIMD globally so that it can be adopted by the autovectorizer?

Emscripten seems to have such a feature: https://emscripten.org/docs/porting/simd.html

To enable SIMD, pass the -msimd128 flag at compile time. This will also turn on LLVM’s autovectorization passes, so no source modifications are necessary to benefit from SIMD.


jsheard commented Dec 31, 2020

Is there a way to enable wasm SIMD globally so that it can be adopted by the autovectorizer?

Enabling simd128 via target-feature seems to do the trick, this simple case gets vectorized: https://godbolt.org/z/hcaxxf
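The loop in question is of this general shape (a host-runnable sketch; whether it actually vectorizes to v128 instructions depends on enabling simd128 and the optimization level, as in the godbolt link above):

```rust
// A simple elementwise add: with `-C target-feature=+simd128` on a
// wasm32 target and optimizations enabled, LLVM's autovectorizer can
// lower this loop to v128 instructions without any source changes.
fn add_slices(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

fn main() {
    let a = [1, 2, 3, 4, 5, 6, 7, 8];
    let b = [8, 7, 6, 5, 4, 3, 2, 1];
    let mut out = [0; 8];
    add_slices(&a, &b, &mut out);
    assert_eq!(out, [9; 8]);
    println!("ok");
}
```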


CryZe commented Feb 8, 2021

Consider switching *_{any,all}_true to returning a bool

They really should; it doesn't seem like the compiler can reason about it as-is. Though maybe that is because the compiler doesn't inline any of the intrinsics at all, which seems like a decently big problem:

https://rust.godbolt.org/z/h6Y5PY


Amanieu commented Mar 21, 2021

A lot of the load/store intrinsics feel redundant: the memory access itself is better done in Rust code rather than by the intrinsic itself.

Examples:

  • v128_load/v128_store are literally just *m and *m = a. Also, they are inconsistent with the Clang version which allows unaligned access while the Rust version doesn't.
  • v128_loadN_splat is just iNxM_splat(*x).
  • v128_loadN_lane is just iNxM_replace_lane(a, *val).
  • v128_storeN_lane is just *m = iNxM_extract_lane(v).
  • v128_loadN_zero doesn't have a direct intrinsic equivalent, but probably should have.

I would keep v128_loadNxM_{s,u} though since these do not directly translate to Rust code.
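A scalar sketch of one of the equivalences above, e.g. that v128.load32_splat is just a read followed by a splat (hypothetical helpers modeling a v128 as four u32 lanes, not the real intrinsics):

```rust
// Model a v128 as four u32 lanes for illustration.
fn u32x4_splat(x: u32) -> [u32; 4] {
    [x; 4]
}

// v128.load32_splat(m) is semantically just u32x4_splat(*m). We use
// read_unaligned because the wasm instruction permits unaligned loads.
fn v128_load32_splat(m: *const u32) -> [u32; 4] {
    u32x4_splat(unsafe { m.read_unaligned() })
}

fn main() {
    let x = 7u32;
    assert_eq!(v128_load32_splat(&x), [7, 7, 7, 7]);
    println!("ok");
}
```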

alexcrichton (Member, Author)

I agree v128_{load,store} are redundant, especially since Rust has to support this anyway: you can take the address of any value at any time. I figured it was worth adding them for completeness, but you've got a good point that it may be best to use the ptr::{read,write}_unaligned functions instead of raw pointer reads/writes.

For v128_{load,store}N_lane as well as v128_loadN_zero I think we unfortunately need to expose the intrinsics at this time. LLVM doesn't seem to optimize the equivalent patterns into the relevant instruction, although I'm sure that at some point in the future it will gain the ability to do so. I glanced at the discussions adding some of those intrinsics a while back, and I think the rationale was that engines like V8 aren't great at pattern-matching multiple instructions to optimize the load/splat, for example, which ended up motivating the addition of the instructions (I'm not 100% sure on this point, though). In general the idea is that compilers aren't always good enough to fuse the operations, so there's a special instruction for it. ("Compilers" here means both the engine compiling wasm to native code and LLVM compiling source code to wasm.)

I've been adding most of the intrinsics so far but unfortunately I haven't had the opportunity to write a compiled-to-wasm thing which actually uses the instructions. In that sense I'm mostly shooting in the dark as to what the APIs should be. So far I've tried to maximize availability (ensuring there's a function-per-instruction, regardless of how silly it is to have) and stick close to the most standardized piece, the spec names/type signatures. I did this to stay in the spirit of the x86 intrinsics which are copied verbatim from the spec and we provide virtually no abstractions to make them nice to use (even in some cases where it would be easy to do so). Overall, though, I'm not sure if this is the best tradeoff for wasm. WebAssembly isn't the same as x86 in this regard, so we may get more bang for our buck by putting more thought and effort into what a usable API would be (at the cost of time for stabilization, of course).


alexcrichton commented Apr 28, 2021

tl;dr: I'd like to ask if a @rust-lang/libs team member would be willing to
propose an FCP on this issue for stabilization.


Ok, I think enough things have landed now that I'd like to propose that this
issue is considered for stabilization. I've read over the OP and it's still
quite accurate; the only difference is that the WebAssembly SIMD proposal is
now at phase 4, which is WebAssembly's "close to stabilization" phase.
Additionally, browsers will start shipping this feature ungated in the future,
with Chrome already scheduled to do so and an implementation also in Firefox.
Consequently the instruction set we're targeting is stable at this point, and
I think we're in a position where we can stabilize the Rust intrinsics
themselves.

To recap, this tracking issue is for the v128-related functions in the
core::arch::wasm32 module (also available in std, of course). The intrinsics
and v128 type in this module are intended to expose the functionality of the
WebAssembly simd proposal, which has both an English description as well as a
proposed formal specification.

The v128 type and the new instructions added to WebAssembly can be used very
similarly to existing CPU intrinsics that Rust has for x86 and x86_64.
Unlike x86_64, however, WebAssembly does not have a standard like the Intel
Intrinsics Guide or precedent in other languages of how to bind the SIMD support
in the language itself. Consequently a decision to stabilize these intrinsics in
Rust will be sort of blazing a new trail rather than following existing
precedent.

The design principles for the SIMD intrinsics in the wasm32 module that I'd
propose stabilizing are:

  • One type, v128, is exposed. This mirrors the underlying type added to
    WebAssembly itself. We could alternatively add types like i32x4 instead of
    just a singular v128 type, but this would require extra conversions to also
    be defined and given the precedent of x86_64 SIMD support in Rust the arch
    module's primary goal is "easiest for compilers to implement and the library
    team to stabilize" instead of "easiest for developers to use".

  • Instructions are exposed as unsafe functions with an appropriate
    #[target_feature] annotation. WebAssembly is unlike other targets in that
    the unsafe here is not actually necessary: unlike x86_64, which isn't
    guaranteed to raise a CPU fault when invalid intrinsics are executed,
    WebAssembly won't even begin to execute code unless the entire module
    validates. These intrinsics could be safe if not for the current design of
    the #[target_feature] attribute, and ideally they would be made safe in a
    future Rust release as a non-breaking change.

  • Instruction type signatures closely mirror the instruction's type signature
    but use Rust native types where possible. For example i16x8_all_true returns
    a bool instead of an i32 like the instruction does.

  • The v128.const instruction is exposed as a variety of constructors (see
    below), but none of which require constant arguments. The compiler (LLVM) is
    left to select the most appropriate instruction (or sequence of instructions)
    for creating a vector. This may be v128.const, but it may also not be.

  • Instruction function names and type signatures are inspired by their
    instruction name and instruction signature, but are not required to follow it.
    WebAssembly instruction names follow WebAssembly conventions (e.g. only having
    one i32 type instead of two i32 and u32 types) which aren't necessarily
    conventions for Rust as well. Furthermore WebAssembly exposes types like a
    Rust bool as an i32. This means that each instruction is considered in
    isolation of how to best expose it to Rust.

    Some general principles for naming intrinsics are:

    • Conveniences are exposed when the instruction is extremely low level. For
      example u64x2 is exposed, which typically lowers to v128.const.
    • Suffixes like _s and _u for instructions are baked into the function
      names to indicate the signedness of the vector argument.
    • Rust functions are provided for both signed and unsigned names of vectors
      (with corresponding argument types).

    Unlike x86_64 where there are thousands of intrinsics that can't be
    hand-verified and scrutinized, WebAssembly is exposing ~200 intrinsics which
    is expected to be reasonable enough to bikeshed intrinsics individually and
    find the best names. A table of how instructions are exposed is:

Wasm Instruction Rust Function(s)
v128.load v128_load
v128.load8x8_s i16x8_load_extend_i8x8
v128.load8x8_u i16x8_load_extend_u8x8
v128.load16x4_s i32x4_load_extend_i16x4
v128.load16x4_u i32x4_load_extend_u16x4
v128.load32x2_s i64x2_load_extend_i32x2
v128.load32x2_u i64x2_load_extend_u32x2
v128.load8_splat v128_load8_splat
v128.load16_splat v128_load16_splat
v128.load32_splat v128_load32_splat
v128.load64_splat v128_load64_splat
v128.load32_zero v128_load32_zero
v128.load64_zero v128_load64_zero
v128.store v128_store
v128.load8_lane v128_load8_lane
v128.load16_lane v128_load16_lane
v128.load32_lane v128_load32_lane
v128.load64_lane v128_load64_lane
v128.store8_lane v128_store8_lane
v128.store16_lane v128_store16_lane
v128.store32_lane v128_store32_lane
v128.store64_lane v128_store64_lane
v128.const i8x16, u8x16, i16x8, u16x8, i32x4, u32x4, i64x2, u64x2, f32x4, f64x2
i8x16.shuffle i8x16_shuffle, i16x8_shuffle, i32x4_shuffle, i64x2_shuffle
i8x16.extract_lane_s i8x16_extract_lane
i8x16.extract_lane_u u8x16_extract_lane
i16x8.extract_lane_s i16x8_extract_lane
i16x8.extract_lane_u u16x8_extract_lane
i32x4.extract_lane i32x4_extract_lane, u32x4_extract_lane
i64x2.extract_lane i64x2_extract_lane, u64x2_extract_lane
f32x4.extract_lane f32x4_extract_lane
f64x2.extract_lane f64x2_extract_lane
i8x16.replace_lane i8x16_replace_lane, u8x16_replace_lane
i16x8.replace_lane i16x8_replace_lane, u16x8_replace_lane
i32x4.replace_lane i32x4_replace_lane, u32x4_replace_lane
i64x2.replace_lane i64x2_replace_lane, u64x2_replace_lane
f32x4.replace_lane f32x4_replace_lane
f64x2.replace_lane f64x2_replace_lane
i8x16.swizzle i8x16_swizzle
i8x16.splat i8x16_splat, u8x16_splat
i16x8.splat i16x8_splat, u16x8_splat
i32x4.splat i32x4_splat, u32x4_splat
i64x2.splat i64x2_splat, u64x2_splat
f32x4.splat f32x4_splat
f64x2.splat f64x2_splat
i8x16.eq i8x16_eq
i8x16.ne i8x16_ne
i8x16.lt_s i8x16_lt
i8x16.lt_u u8x16_lt
i8x16.gt_s i8x16_gt
i8x16.gt_u u8x16_gt
i8x16.le_s i8x16_le
i8x16.le_u u8x16_le
i8x16.ge_s i8x16_ge
i8x16.ge_u u8x16_ge
i16x8.eq i16x8_eq
i16x8.ne i16x8_ne
i16x8.lt_s i16x8_lt
i16x8.lt_u u16x8_lt
i16x8.gt_s i16x8_gt
i16x8.gt_u u16x8_gt
i16x8.le_s i16x8_le
i16x8.le_u u16x8_le
i16x8.ge_s i16x8_ge
i16x8.ge_u u16x8_ge
i32x4.eq i32x4_eq
i32x4.ne i32x4_ne
i32x4.lt_s i32x4_lt
i32x4.lt_u u32x4_lt
i32x4.gt_s i32x4_gt
i32x4.gt_u u32x4_gt
i32x4.le_s i32x4_le
i32x4.le_u u32x4_le
i32x4.ge_s i32x4_ge
i32x4.ge_u u32x4_ge
i64x2.eq i64x2_eq
i64x2.ne i64x2_ne
i64x2.lt_s i64x2_lt
i64x2.gt_s i64x2_gt
i64x2.le_s i64x2_le
i64x2.ge_s i64x2_ge
f32x4.eq f32x4_eq
f32x4.ne f32x4_ne
f32x4.lt f32x4_lt
f32x4.gt f32x4_gt
f32x4.le f32x4_le
f32x4.ge f32x4_ge
f64x2.eq f64x2_eq
f64x2.ne f64x2_ne
f64x2.lt f64x2_lt
f64x2.gt f64x2_gt
f64x2.le f64x2_le
f64x2.ge f64x2_ge
v128.not v128_not
v128.and v128_and
v128.andnot v128_andnot
v128.or v128_or
v128.xor v128_xor
v128.bitselect v128_bitselect
v128.any_true v128_any_true
i8x16.abs i8x16_abs
i8x16.neg i8x16_neg
i8x16.popcnt i8x16_popcnt
i8x16.all_true i8x16_all_true
i8x16.bitmask i8x16_bitmask
i8x16.narrow_i16x8_s i8x16_narrow_i16x8
i8x16.narrow_i16x8_u u8x16_narrow_i16x8
i8x16.shl i8x16_shl
i8x16.shr_s i8x16_shr
i8x16.shr_u u8x16_shr
i8x16.add i8x16_add
i8x16.add_sat_s i8x16_add_sat
i8x16.add_sat_u u8x16_add_sat
i8x16.sub i8x16_sub
i8x16.sub_sat_s i8x16_sub_sat
i8x16.sub_sat_u u8x16_sub_sat
i8x16.min_s i8x16_min
i8x16.min_u u8x16_min
i8x16.max_s i8x16_max
i8x16.max_u u8x16_max
i8x16.avgr_u u8x16_avgr
i16x8.extadd_pairwise_i8x16_s i16x8_extadd_pairwise_i8x16
i16x8.extadd_pairwise_i8x16_u i16x8_extadd_pairwise_u8x16
i16x8.abs i16x8_abs
i16x8.neg i16x8_neg
i16x8.qmulr_sat_s i16x8_q15mulr_sat
i16x8.all_true i16x8_all_true
i16x8.bitmask i16x8_bitmask
i16x8.narrow_i32x4_s i16x8_narrow_i32x4
i16x8.narrow_i32x4_u u16x8_narrow_i32x4
i16x8.extend_low_i8x16_s i16x8_extend_low_i8x16
i16x8.extend_high_i8x16_s i16x8_extend_high_i8x16
i16x8.extend_low_i8x16_u i16x8_extend_low_u8x16
i16x8.extend_high_i8x16_u i16x8_extend_high_u8x16
i16x8.shl i16x8_shl
i16x8.shr_s i16x8_shr
i16x8.shr_u u16x8_shr
i16x8.add i16x8_add
i16x8.add_sat_s i16x8_add_sat
i16x8.add_sat_u u16x8_add_sat
i16x8.sub i16x8_sub
i16x8.sub_sat_s i16x8_sub_sat
i16x8.sub_sat_u u16x8_sub_sat
i16x8.mul i16x8_mul
i16x8.min_s i16x8_min
i16x8.min_u u16x8_min
i16x8.max_s i16x8_max
i16x8.max_u u16x8_max
i16x8.avgr_u u16x8_avgr
i16x8.extmul_low_i8x16_s i16x8_extmul_low_i8x16
i16x8.extmul_high_i8x16_s i16x8_extmul_high_i8x16
i16x8.extmul_low_i8x16_u i16x8_extmul_low_u8x16
i16x8.extmul_high_i8x16_u i16x8_extmul_high_u8x16
i32x4.extadd_pairwise_i16x8_s i32x4_extadd_pairwise_i16x8
i32x4.extadd_pairwise_i16x8_u i32x4_extadd_pairwise_u16x8
i32x4.abs i32x4_abs
i32x4.neg i32x4_neg
i32x4.all_true i32x4_all_true
i32x4.bitmask i32x4_bitmask
i32x4.extend_low_i16x8_s i32x4_extend_low_i16x8
i32x4.extend_high_i16x8_s i32x4_extend_high_i16x8
i32x4.extend_low_i16x8_u i32x4_extend_low_u16x8
i32x4.extend_high_i16x8_u i32x4_extend_high_u16x8
i32x4.shl i32x4_shl
i32x4.shr_s i32x4_shr
i32x4.shr_u u32x4_shr
i32x4.add i32x4_add
i32x4.sub i32x4_sub
i32x4.mul i32x4_mul
i32x4.min_s i32x4_min
i32x4.min_u u32x4_min
i32x4.max_s i32x4_max
i32x4.max_u u32x4_max
i32x4.dot_i16x8_s i32x4_dot_i16x8
i32x4.extmul_low_i16x8_s i32x4_extmul_low_i16x8
i32x4.extmul_high_i16x8_s i32x4_extmul_high_i16x8
i32x4.extmul_low_i16x8_u i32x4_extmul_low_u16x8
i32x4.extmul_high_i16x8_u i32x4_extmul_high_u16x8
i64x2.abs i64x2_abs
i64x2.neg i64x2_neg
i64x2.all_true i64x2_all_true
i64x2.bitmask i64x2_bitmask
i64x2.extend_low_i32x4_s i64x2_extend_low_i32x4
i64x2.extend_high_i32x4_s i64x2_extend_high_i32x4
i64x2.extend_low_i32x4_u i64x2_extend_low_u32x4
i64x2.extend_high_i32x4_u i64x2_extend_high_u32x4
i64x2.shl i64x2_shl
i64x2.shr_s i64x2_shr
i64x2.shr_u u64x2_shr
i64x2.add i64x2_add
i64x2.sub i64x2_sub
i64x2.mul i64x2_mul
i64x2.extmul_low_i32x4_s i64x2_extmul_low_i32x4
i64x2.extmul_high_i32x4_s i64x2_extmul_high_i32x4
i64x2.extmul_low_i32x4_u i64x2_extmul_low_u32x4
i64x2.extmul_high_i32x4_u i64x2_extmul_high_u32x4
f32x4.ceil f32x4_ceil
f32x4.floor f32x4_floor
f32x4.trunc f32x4_trunc
f32x4.nearest f32x4_nearest
f32x4.abs f32x4_abs
f32x4.neg f32x4_neg
f32x4.sqrt f32x4_sqrt
f32x4.add f32x4_add
f32x4.sub f32x4_sub
f32x4.mul f32x4_mul
f32x4.div f32x4_div
f32x4.min f32x4_min
f32x4.max f32x4_max
f32x4.pmin f32x4_pmin
f32x4.pmax f32x4_pmax
f64x2.ceil f64x2_ceil
f64x2.floor f64x2_floor
f64x2.trunc f64x2_trunc
f64x2.nearest f64x2_nearest
f64x2.abs f64x2_abs
f64x2.neg f64x2_neg
f64x2.sqrt f64x2_sqrt
f64x2.add f64x2_add
f64x2.sub f64x2_sub
f64x2.mul f64x2_mul
f64x2.div f64x2_div
f64x2.min f64x2_min
f64x2.max f64x2_max
f64x2.pmin f64x2_pmin
f64x2.pmax f64x2_pmax
i32x4.trunc_sat_f32x4_s i32x4_trunc_sat_f32x4
i32x4.trunc_sat_f32x4_u u32x4_trunc_sat_f32x4
f32x4.convert_i32x4_s f32x4_convert_i32x4
f32x4.convert_i32x4_u f32x4_convert_u32x4
i32x4.trunc_sat_f64x2_s_zero i32x4_trunc_sat_f64x2_zero
i32x4.trunc_sat_f64x2_u_zero u32x4_trunc_sat_f64x2_zero
f64x2.convert_low_i32x4_s f64x2_convert_low_i32x4
f64x2.convert_low_i32x4_u f64x2_convert_low_u32x4
f32x4.demote_f64x2_zero f32x4_demote_f64x2_zero
f64x2.promote_low_f32x4 f64x2_promote_low_f32x4
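As an example of why both i8x16_* and u8x16_* names appear for the same lane width in the table above, here is a scalar sketch of the two saturating-add instructions (hypothetical per-lane models, not the real intrinsics):

```rust
// Per-lane models of i8x16.add_sat_s and i8x16.add_sat_u: the same bits,
// interpreted as signed vs unsigned, saturate to different bounds.
fn i8x16_add_sat(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

fn u8x16_add_sat(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

fn main() {
    assert_eq!(i8x16_add_sat([127; 16], [1; 16])[0], 127); // clamps at i8::MAX
    assert_eq!(u8x16_add_sat([255; 16], [1; 16])[0], 255); // clamps at u8::MAX
    println!("ok");
}
```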

It's worth noting that Clang has a header file for WebAssembly
intrinsics in the same way it has one for x86_64. The Rust names do not
precisely match the names exposed in C. The first difference is that Rust
functions are not prefixed with wasm_ (since Rust has a dedicated wasm32
module, which C does not). Even accounting for this, Rust has other naming
differences:

  • wasm_i16x8_load8x8 vs i16x8_load_extend_i8x8 (and related)
  • wasm_u16x8_load8x8 vs i16x8_load_extend_u8x8 (and related)
  • wasm_i8x16_make vs i8x16 (and related)
  • wasm_i8x16_const vs i8x16 (and related)
  • C lacks a wasm_u8x16_splat function (only has i8x16)
  • C lacks a wasm_u8x16_replace_lane function (only has i8x16)
  • C lacks a wasm_u32x4_extract_lane function (only has i32x4)
  • wasm_u32x4_extend_low_u16x8 vs i32x4_extend_low_u16x8 (and related)
  • wasm_u16x8_extmul_high_u8x16 vs i16x8_extmul_high_u8x16 (and related)

The intention behind these design decisions is that all the functionality of the
proposal should be exposed in Rust, but at the same time we give ourselves
wiggle room where possible to add conveniences. The conveniences aren't super
principled other than evaluating intrinsics one-by-one. There are other possible
points naturally on the design spectrum for this module, for example providing
many types like u8x16 with first-class methods and such. Functions taking
memory arguments could arguably take safe references instead of raw
pointers as well. As @Amanieu pointed out above,
other intrinsics are unlikely to ever get used or can otherwise be trivially
composed from other intrinsics (relying on LLVM to optimize, which it doesn't
always do just yet). My hope is that by getting a few more eyeballs on this we
can figure out a good balance for Rust and what we'd like the intrinsics to be.

Implementation Status

It's worth touching on the current implementation status of this proposal as
well. At this time the stdarch repository has an implementation of all of the
above intrinsics. After #84654 lands, LLVM will support all instructions
except i64x2.abs (just a minor LLVM issue), and the final update to stdarch
to match LLVM will be in rust-lang/stdarch#1131. Note that actually updating
the stdarch submodule in rust-lang/rust, happening in #83278, is currently
blocked. This means that the current documentation for wasm does not match
this proposal, despite the proposal being fully implemented.

It's also worth mentioning that this is all a relatively new feature in LLVM.
LLVM 12 and prior do not support the WebAssembly simd proposal as-is. Just
before reaching phase 4 a number of opcodes were renumbered, so LLVM versions
12 and prior will produce invalid WebAssembly binaries when using the simd
feature. A number of backports have been applied to Rust's LLVM 12 fork, so
the Rust fork of LLVM fully supports WebAssembly SIMD, but this means that
until LLVM 13 is released it's unlikely that cross-language LTO or similar
interop will be supported between C and Rust.


CryZe commented Apr 28, 2021

i8x16, u8x16, i16x8, u16x8, i32x4, u32x4, i64x2, u64x2, f32x4, f64x2

I'm still somewhat skeptical of those being the names for the v128.const constructor functions as that means no such types could ever live in this module, and idk if we truly want to block off that possibility forever. I agree that for now just a v128 type and these free standing functions is fine, but maybe those constructor functions could be named with a _const suffix like the actual instruction (or e.g. _make like C seems to use).

alexcrichton (Member, Author)

There was a small bikeshed here about those names where _const was a bit of a misnomer since the arguments aren't required to be constant. Other prefixes like _make or _new seem fine to me!

I don't think, though, that this prevents us from having dedicated types for each lane width because functions go into the value namespace and structs go into the type namespace.


CryZe commented Apr 29, 2021

Seems like it prevents the type from being a tuple struct though: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40b1e2abd2ade54dd598db826ed594bf

But as long as using a non-tuple struct for those types doesn't cause any problems such as repr(simd) or so not working anymore, then I guess it's fine.
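A small demonstration of the namespace point (self-contained sketch; the item names are illustrative, not the real module contents):

```rust
// Functions live in the value namespace and non-tuple structs only in the
// type namespace, so these two `i8x16` items coexist. A tuple struct
// `struct i8x16(...)` would also claim the value namespace (its
// constructor) and collide with the function.
#[allow(non_camel_case_types)]
struct i8x16 {
    lanes: [i8; 16],
}

#[allow(non_snake_case)]
fn i8x16(fill: i8) -> i8x16 {
    i8x16 { lanes: [fill; 16] }
}

fn main() {
    // `i8x16` the type (in the annotation) and `i8x16` the function (in
    // the call) resolve independently.
    let v: i8x16 = i8x16(3);
    assert_eq!(v.lanes[0], 3);
    println!("ok");
}
```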

@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 5, 2021
joshtriplett (Member)

I think it's appropriate to start checking for consensus here, and we can use concerns to track blockers.

@rfcbot merge

stdarch needs an update to a version matching the proposal:
@rfcbot concern update-stdarch

I'd love to see #[target_feature] made safe on wasm targets, if that's feasible and not excessively difficult, to avoid having to make a transition in the future:
@rfcbot concern wasm-safe-target-feature


rfcbot commented May 5, 2021

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels May 5, 2021
BurntSushi (Member)

@alexcrichton Amazing work! I think this all pretty much LGTM but I do have a few questions:

Firstly, it looks like the target feature name is just simd128, but that seems... a bit generic? I guess I would expect it to include wasm in it or something more specific than just simd128.

Secondly, with respect to safe target feature, I see this in the docs today:

Unlike x86_64, however, WebAssembly does not currently have dynamic detection at runtime as to whether SIMD is supported (this is one of the motivators for the conditional sections proposal, but that is still pretty early days).

If it does get dynamic detection, does that change the discussion around a safe target feature?

Thirdly and lastly, have we considered using a naming scheme that matches the WASM spec as closely as possible? E.g., we invent names where necessary, but otherwise do a mechanical translation of things (e.g., . -> _). I only suggest this because I've always liked this property of our std::arch::x86_64 module: you just need to take the pre-existing name, search for it, and get the result you want.

alexcrichton (Member, Author)

Good questions and thanks for reading over this!

The name simd128 is currently chosen because it matches LLVM's feature name for this. Unlike CPU features for x86 WebAssembly primarily has proposal names to go by. The proposal itself is just called "simd", though, which wouldn't really help much here. I think though that it's already somewhat implied that target_feature feature names are not scoped to their own targets, e.g. sse2 doesn't really imply x86_64 except by historical association. That being said I don't think there's an issue renaming it to wasm-simd128 or something like that, but I figured that it wouldn't be necessary due to the lack of scoping for other platforms.

Currently WebAssembly doesn't have any form of dynamic detection for supported features, and even on the horizon I'm not sure there's really anything viable for adding this in a form that looks like x86. The closest equivalent for WebAssembly is conditional sections, but afaik that proposal is sort of dead in the water right now and doesn't have a future. There are possible alternatives of a roughly similar shape, but it's all about selection at compile time instead of runtime. Basically I think it's tough to answer precisely what would happen if wasm gets dynamic detection because it's unclear to me what it means to get dynamic detection. My guess, though, is that nothing will ever be unsafe. The whole point of #[target_feature] and unsafe is that we don't know what happens if a CPU executes code it doesn't understand, but the whole point of WebAssembly is that this never happens. For this reason I think it's safe to say that for WebAssembly specifically the unsafety of #[target_feature] is never necessary (even into the future).

For naming, that is indeed an alternative! I actually originally implemented that. I ended up coming around to Rust-specific names for a few reasons though:

  • If we match instructions exactly, then i8x16.shuffle is an interesting case. Only providing that is pretty unergonomic when we could easily provide i64x2.shuffle as well (and other variants). Basically some instructions are intended to have multiple kinds of high-level operations map down to them, so I think it makes sense to provide all those high-level operations. This means that we would already buy into multiple-functions-per-instruction. Naming here isn't too hard, though; we can still roughly match the instruction.
  • Next up there was i8x16.extract_lane_u. WebAssembly doesn't have a u8 type, so the _u and _s instructions are zero- or sign-extending the 8-bit lane into a 32-bit integer (as a WebAssembly i32). Binding these instructions could still be done precisely by their instruction name itself, but if we have _u and _s for 8/16-bit integers it's sort of odd that we don't also have it for u32/u64. It makes sense at the instruction-set level, but not necessarily at the Rust language level.
  • Overall, the WebAssembly instructions are designed (rightfully so) to be consistent with themselves and the rest of the WebAssembly specification, not with high-level languages. This means that instruction names are specifically chosen for their consistency with other instructions, not for the ergonomics or readability of the instruction on its own. For example i8x16.shuffle has nothing really to do with i8x16 as a type; it works with all other lane types. Instructions like i16x8.extadd_pairwise_i8x16_u are somewhat ambiguous as to where the _u applies (whereas in Rust i16x8_extadd_pairwise_u8x16 is clearer). Another good example of this is WebAssembly/simd#297 ("Names of load instructions"), where instructions were renamed to be more consistent with themselves (and from my reading that was mostly the only motivation involved in doing so).

I originally thought 1:1 with instructions would be the way to go. After using it a bit, though, and getting other feedback, I've personally come to the conclusion that the best option is to design the names specifically for Rust. Given the size of the instruction set I think this is feasible (unlike x86) and given the downsides of matching wasm exactly I think we have a lot to gain in terms of readability of code itself. What I don't necessarily have a great gauge on is how folks will typically write SIMD-accelerated wasm code. If they start from desired instructions and work backwards I think renaming things could impede this. If they start from the wasm32 module and figure out what's available I think renaming things will help since conveniences will be provided and names are sometimes more understandable.

That's at least my thinking on the names so far. I don't really mind implementing either way, and I don't personally lean super strongly either way. I'm just currently leaning more towards custom names from Rust. (it's worth pointing out the custom Rust names more closely match what C is doing currently in clang as well, which is to diverge from the spec and use nicer prefixes and have conveniences)

@BurntSushi
Copy link
Member

RE simd128: It's a good point about sse2, it isn't namespaced. I think the thing there is that sse2 is kinda weird in and of itself, where simd is just kind of a generic thing. I would personally like wasm-simd128 better, even though I know it's a little longer to type.

RE dynamic detection: I think if you're certain that the intrinsics won't become unsafe to call some day, then I think that assuages the root of my concern there.

RE naming:

and I don't personally lean super strongly either way.

I think I'm with you there, and personally, absent other data, I'm inclined to defer to your experience actually using the routines. I think we're in agreement about the trade offs here, and in particular, that having precisely matching names is a nice property. But perhaps the only reason I appreciate it so much for x86 is indeed because of its size, which doesn't apply here. And if the C folks are devising their own names too, then I guess we might as well too.

One nice thing to do (not a stabilization blocker of course) would be to add the WASM spec instruction name to the docs of each intrinsic and add a doc alias for it. I think that might actually get us the best of both worlds. Then folks could type in the WASM spec name and get back the Rust equivalent.

@workingjubilee
Copy link
Contributor

workingjubilee commented May 6, 2021

re: wasm-simd128
or wasimd128, webassembly-fixed-width-simd, wasm-v128...
I agree that simd128 seems like it invites a name collision without qualification. Not today, but perhaps in a decade. Since all compiler invocations are implicitly scoped by target, I would anticipate that the concern with simd128 and a possible name collision would be on the diagnostic side and when reading code, or in mixed invocations of a wasm + native project where a value accidentally gets set for one target that was not intended for the other. So it's not so much a strict technical issue, only clarity, and only out of an abundance of caution.

re: syntax, name-matching, etc:
Importantly, Rust's wasm intrinsics will be "first to field" for many users. Intrinsic functions offered by Intel or Arm exist to thinly wrap the more rough-hewn APIs of raw assembly, and they have an active and expansive userbase who would expect them to be named the same in other languages if they're supported at all. But here, we are in fact wrapping a "raw assembly" (kinda) API in new functions, and so I believe Rust is entitled to deviate where it likes, because Rust is less in the role of a compiler implementing Intel or Arm's intrinsics word for word, but rather is "Intel" or "Arm".

re: "Is this ready?"
Hm, it's more a compiler-level concern technically, but is this blocking? #80108

@CryZe
Copy link
Contributor

CryZe commented May 26, 2021

Should the bitmask instructions possibly return u8 / u16 / u32 based on the lane count? At the moment they all return i32, which is of course the type at the "machine level", but since the intrinsics all take appropriate Rust integer types, it seems like the bitmask instructions don't properly reflect that. I noticed this because casting the result of i8x16_bitmask to u16 actually emits an unnecessary & 0xFFFF mask.

Oh, also I've started implementing a bunch of WASM SIMD for various crates (tiny-skia, hashbrown, memchr, ...) and I've noticed that by default our naming convention uses iNxM for all the intrinsics where the sign doesn't matter. This has been really confusing. So for the bitmask, for example, you have to use i8x16_bitmask as u8x16_bitmask doesn't exist, even though the sign doesn't actually matter. I feel like there should either also be a u8x16_bitmask that is just an alias (maybe at least a docs alias), or for instructions where there's no real notion of a sign, we should prefer the unsigned naming. (Actually maybe this example isn't great, as the bitmask intrinsic cares about the "high bit" of each lane, which you could interpret as the sign bit depending on what you are doing.)

@alexcrichton
Copy link
Member Author

Returning a smaller type makes sense to me, I can work on implementing that. You're thinking of u16 for i8x16_bitmask and u8 for all other types presumably, right?

For the naming this was indeed one aspect I was worried about. I was worried that there was no u32x4_add since it was the same as i32x4_add functionally. Personally I would be ok adding u32x4_add as a pub use alias of i32x4_add to fix this, and doing that for all types where the signed/unsigned interpretation is the same. I'd be curious to hear the feedback of others though! In any case this is easy enough to implement backwards compatibly too.

Also thanks for implementing these optimizations! Do you have some PRs to show as examples of ergonomics and such?

alexcrichton added a commit to alexcrichton/rust that referenced this issue May 28, 2021
This commit updates the compiler's handling of the `#[target_feature]`
attribute when applied to functions on WebAssembly-based targets. The
compiler in general requires that any functions with `#[target_feature]`
are marked as `unsafe` as well, but this commit relaxes the restriction
for WebAssembly targets where the attribute can be applied to safe
functions as well.

The reason this is done is that the motivation for this feature of the
compiler is not applicable for WebAssembly targets. In general the
`#[target_feature]` attribute is used to enhance target CPU features
enabled beyond the basic level for the rest of the compilation. If done
improperly this means that your program could execute an instruction
that the CPU you happen to be running on does not understand. This is
considered undefined behavior where it is unknown what will happen (e.g.
it's not a deterministic `SIGILL`).

For WebAssembly, however, the target is different. It is not possible
for a running WebAssembly program to execute an instruction that the
engine does not understand. If this were the case then the program would
not have validated in the first place and would not run at all. Even if
this were allowed in some hypothetical future where engines have some
form of runtime feature detection (which they do not right now) any
implementation of such a feature would generate a trap if a module
attempts to execute an instruction the module does not understand. This
deterministic trap behavior would still not fall into the category of
undefined behavior because the trap is deterministic.

For these reasons the `#[target_feature]` attribute is now allowed on
safe functions, but only for WebAssembly targets. This notably enables
the wasm-SIMD intrinsics proposed for stabilization in rust-lang#74372 to be
marked as safe generally instead of today where they're all `unsafe` due
to the historical implementation of `#[target_feature]` in the compiler.
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label May 30, 2021
@rfcbot
Copy link

rfcbot commented May 30, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 30, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Jun 3, 2021
…sm, r=joshtriplett

rustc: Allow safe #[target_feature] on wasm

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 3, 2021
…, r=joshtriplett

rustc: Allow safe #[target_feature] on wasm

alexcrichton added a commit to alexcrichton/stdarch that referenced this issue Jun 10, 2021
This is a follow-up from rust-lang/rust#74372 which has finished FCP for
the stabilization of wasm intrinsics. This marks them all stable, as-is
and additionally marks the functions which create integer vectors as
`const`-stable as well. The only remaining unstable bits are that
`f32x4` and `f64x2` are `const`-unstable. Mostly just because I couldn't
figure out how to make them `const`-stable.
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jun 10, 2021
@alexcrichton
Copy link
Member Author

I've posted the final PR for stabilization to #86204

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 11, 2021
std: Stabilize wasm simd intrinsics

This commit performs two changes to stabilize Rust support for
WebAssembly simd intrinsics:

* The stdarch submodule is updated to pull in rust-lang/stdarch#1179.
* The `wasm_target_feature` feature gate requirement for the `simd128`
  feature has been removed, stabilizing the name `simd128`.

This should conclude the FCP started on rust-lang#74372 and...

Closes rust-lang#74372
@bors bors closed this as completed in e05bb26 Jun 11, 2021
@newpavlov
Copy link
Contributor

What will happen if code dependent on SIMD intrinsics is compiled without the necessary target feature being enabled? Will the compiler generate the respective SIMD instructions no questions asked (well, it will not inline them, but that's not important for the question)? Shouldn't it generate a compilation error? Otherwise the generated WASM may silently become unusable in runtimes without SIMD support (imagine an incorrect change somewhere deep in a project's dependency tree).

@alexcrichton
Copy link
Member Author

As the documentation indicates:

This means to generate a binary without SIMD you’ll need to avoid both options above plus calling into any intrinsics in this module.

where the "both options" are using -Ctarget-feature=+simd128 or #[target_feature(enable = "simd128")].

The compiler will always generate simd instructions if you call intrinsics in std::arch::wasm32, regardless of your compilation settings for the rest of the codegen unit. This is the same as for all other platform intrinsics, for example, in the x86 and x86_64 modules.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

This is the same as for all other platform intrinsics, for example, in the x86 and x86_64 modules.

Yes, but on other platforms those intrinsics are unsafe, so there is a higher barrier to making a mistake. Also, having the respective instruction in a generated binary does not cause the CPU to reject that binary, i.e. on x86 it's fine to have unchecked SIMD instructions in dead branches. In a certain sense, using a WASM SIMD intrinsic effectively changes the target. So I wonder if the current handling of them is too lax.

Also #[target_feature(enable = "simd128")] is useless in my understanding. Since there is no runtime detection (and by your words there probably will not be one), WASM target features currently should be enabled for a whole project. I don't see a use case for enabling it only for selected functions.

@est31
Copy link
Member

est31 commented Jul 29, 2021

Yes, but on other platforms those intrinsics are unsafe, so there is a higher barrier for making mistake.

Isn't this a violation of RFC 2045? It seems to me that RFC 2396's requirement, that calling functions with target features from functions without those target features be unsafe, has not been implemented?

Since there is no runtime detection (and by your words there probably will not be one)

@newpavlov the docs you linked mention runtime detection proposals.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

the docs you linked mention runtime detection proposals.

Yes, I know, and this is why I asked this question on reddit. But it looks like @alexcrichton has acted under the assumption that dynamic detection will not be added to WASM:

Currently WebAssembly doesn't have any form of dynamic detection for supported features, and even on the horizon I'm not sure there's really anything viable for adding this in a form that looks like x86. The closest equivalent for WebAssembly is conditional sections, but afaik that proposal is sort of dead in the water right now and doesn't have a future. There are possible alternatives of a roughly similar shape, but it's all about selection at compile time instead of runtime. Basically I think it's tough to answer precisely what would happen if wasm gets dynamic detection because it's unclear to me what it means to get dynamic detection.

BTW I want to discuss this point from the same comment a bit:

The whole point of #[target_feature] and unsafe is that we don't know what happens if a CPU executes code it doesn't understand, but the whole point of WebAssembly is this never happens.

In my understanding, the unsafety of target_feature is not caused only by concerns about CPU behavior (after all, outside of some very edge scenarios you usually just get an invalid opcode exception). It's probably not even the top reason; see this comment by @gnzlbg referenced in the target_feature RFC. Are we sure that those codegen concerns are not applicable to WASM?

@alexcrichton
Copy link
Member Author

There's some more discussion on #84988 for reference, but my thoughts on this topic are:

  • Wasm engines, forever and all of time, will reject modules they do not understand. This means that if you have a simd instruction in there and the wasm engine doesn't understand it, then a correct engine will never run any code and will simply reject the module.
  • If dynamic feature detection is added, then it will be added in such a way that the same module can be executed differently on two engines (for example, one with simd and one without). If you have correctly configured your module and dynamic detection, then the engine that doesn't support simd will never see the simd code. This means it's never executed. If you incorrectly configured the dynamic feature detection, then the engine that doesn't support simd will see simd instructions it does not understand. From the first point this means that the runtime will reject the module.
  • Some concerns on the original target_feature RFC were indeed connected to "assemblers and things can assume weird things". AFAIK those are generally theoretical concerns but still ones we take seriously. Again WebAssembly is different, because the concern here is that instructions for the wrong architecture leak into other parts of the program (e.g. an assembler realizes that you're using avx2 things so it assumes it can just keep using those, even though it's on a different path that's not supposed to execute when avx2 is not detected, or something like that). As always, wasm engines will either run the module or reject it entirely.

Basically there are gotchas and subtleties about generating WebAssembly modules that do use simd, don't use simd, or try to dynamically detect simd (assuming some sort of future proposal), but these are all build-time concerns. Whatever happens, the final module will have some defined semantics based on the instructions it's using (which are presumably all stable in the upstream wasm spec with clearly-defined semantics). This means that you'll either run the module on an engine understanding all instructions, in which case everything will behave exactly as expected, or you won't, in which case nothing will execute at all.

Overall there is no situation where an unknown wasm instruction is executed. There are situations where your build isn't what you expect, but that's not UB, that's a configuration issue.

@newpavlov
Copy link
Contributor

newpavlov commented Jul 29, 2021

Wasm engines, forever and all of time, will reject modules they do not understand.

IIUC with the features.supported proposal, engines will accept feature blocks which they do not understand, but instead of parsing such a block they will replace it as a whole with the unreachable instruction. It's effectively an analogue of the ud2 instruction from x86. The question is now: is it allowed for safe Rust code to trigger an architecture's "unreachable" instruction? AFAIK in the case of x86 the answer is no. Do we allow it for WASM?

Also I wonder how, in the presence of runtime feature detection, the compiler will handle functions which use SIMD intrinsics but do not provide a non-SIMD fallback. Will it simply wrap every intrinsic with a features.supported check and use unreachable in the fallback branches?

There are situations where your build isn't what you expect, but that's not UB, that's a configuration issue.

Yes, it's not UB which would cause memory corruption, but it's still a real pitfall. Is it possible to make using simd128 intrinsics without the globally enabled target feature a compilation error? It would completely solve my concerns (at the very least until some kind of dynamic detection is added, but I really hope that we will get something similar to target restriction contexts before that). Also I think it would significantly reduce the probability of unpleasant surprises caused by such misconfiguration, especially considering that its source may be deep in a dependency tree.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 30, 2021 via email

@est31
Copy link
Member

est31 commented Jul 30, 2021

Yeah, one of the reasons to make target_feature unsafe was that some instruction sets might override the meaning of instructions or do other UB when they encounter instructions they don't support. On wasm, there is a guarantee that it'll trap, so it's well defined.

coastalwhite pushed a commit to coastalwhite/rust that referenced this issue Aug 5, 2023