Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Use specific uN types for ImmLaneIdxM representations #256

Open
RReverser opened this issue Jun 13, 2020 · 10 comments
Open

Use specific uN types for ImmLaneIdxM representations #256

RReverser opened this issue Jun 13, 2020 · 10 comments

Comments

@RReverser
Copy link
Member

This is a follow-up to #242 and an extraction from the discussion in WebAssembly/threads#162 (comment).

Short version: could we define binary representations of ImmLaneIdx* via custom-length u* types? The WebAssembly grammar already supports arbitrary-sized integers, and normally post-validation step is used for things we'd like to extend or change in the future.

I'd argue that all usages of ImmLaneIdx* do not fall under this category, as they are in instructions that make sense only with index in the strict given range. So an invalid index should be considered malformed and caught at the parsing stage rather than via post-validation.

More specifically, I propose the following representations in BinarySIMD.md:

Production Type
ImmLaneIdx2 u1
ImmLaneIdx4 u2
ImmLaneIdx8 u3
ImmLaneIdx16 u4

Thoughts?

@tlively
Copy link
Member

tlively commented Jun 16, 2020

To be clear, you're just proposing a change to the boundary between an invalid and a malformed module, right? Not a change to the binary encoding?

Seems ok to me, but we should consider @binji's comment from the other thread:

I think you could handle it in the binary grammar, since you always know the lane type from the instruction opcode. But if we wanted to handle the syntax as we do with other instructions (grouping them by the kind of instruction they are, e.g. extract_lane, replace_lane) then we'd need to have a union at that point.

I would be in favor of leaving this to whomever (@dtig?) is writing and reviewing the formal spec to give them the flexibility to simplify things as they see fit.

@RReverser
Copy link
Member Author

To be clear, you're just proposing a change to the boundary between an invalid and a malformed module, right?

Yep.

@dtig
Copy link
Member

dtig commented Jun 18, 2020

I don't see a problem with this, also I think we already catch this as a validation failure in the V8 implementation even though it's not clearly specified here, as intuitively it seemed like the this should be a validation failure - though I can see how as it's not explicit, this can vary based on implementation. Regarding the snippet of the comment, we already need a union of these operations anyway because they don't cleanly fit into the semantics of other operations that are currently in the spec.

Though for the purposes of the overview readability I'd prefer that we retain the ImmLaneIdx* representation, and add more details about invalid indices, or maybe that's what @RReverser is proposing?

@binji
Copy link
Member

binji commented Jun 19, 2020

@dtig right, the change here would make the following a malformed module instead of an invalid module: (i32x4.extract_lane 5) or the equivalent binary \xfd\x1b\x05. AFAIK, VMs don't often make a distinction, but the spec does. Currently the binary format uses u8, meaning that any value in the range [0,255] is well-formed, but potentially invalid. This suggestion intends to make the well-formed ranges smaller, to better match the requirements of the instruction.

@ngzhian
Copy link
Member

ngzhian commented Sep 24, 2020

The current spec text [0] uses a single definition of "lane index" for simplicity, and it is defined as a u8.
Then later we do the actual bounds check in the validation [1].
This is simple because the definition of all the extract_lane and replace_lane can be grouped.

If we had separate laneidx2, laneidx4 etc, we first need to define each of these valid lane indices, then we will need to spell out individual extract_lane and replace_lane instructions in the syntax [0]. But then we won't need to mention the bounds check in validation at all.

I don't think either is more simple, just different presentation.

[0] https://www.ngzhian.com/simd/core/syntax/instructions.html#simd-instructions
[1] https://www.ngzhian.com/simd/core/valid/instructions.html#xref-syntax-instructions-syntax-shape-mathit-shape-mathsf-xref-syntax-instructions-syntax-instr-simd-mathsf-extract-lane-mathsf-xref-syntax-instructions-syntax-sx-mathit-sx-xref-syntax-instructions-syntax-laneidx-mathit-laneidx

@RReverser
Copy link
Member Author

I don't think either is more simple, just different presentation.

Right, it's just about moving checks earlier so that such mismatch could be reported by parsers rather than validators.

@ngzhian
Copy link
Member

ngzhian commented Sep 24, 2020

Yea agree, the point of my post was to highlight what spec changes will need to be made if we were to make this change - which isn't a lot. We can probably discuss this at the next sync #338

@abrown
Copy link
Contributor

abrown commented Oct 16, 2020

@alexcrichton, @yurydelendik: I haven't read this too closely but from the subgroup meeting I think this is forward and may affect https://github.com/bytecodealliance/wasm-tools.

@ngzhian
Copy link
Member

ngzhian commented Oct 16, 2020

From today's sync meeting, it was agreed that we can make this change:

  • engines mostly do parsing+validation in one go, so it wouldn't matter
  • tools will have to change since some (like wasmparser) only parse (i think)
  • spec changes will be done by me

If anyone foresees issues with this change, please raise it up (thanks Andrew for cc-ing).

@RReverser
Copy link
Member Author

Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants