Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make shared tables a validation failure #162

Open
RReverser opened this issue Jun 5, 2020 · 15 comments
Open

Make shared tables a validation failure #162

RReverser opened this issue Jun 5, 2020 · 15 comments

Comments

@RReverser
Copy link
Member

Current Spec changes in Overview list this change to the limits type, which is shared between memories and tables:

The limits type now has an additional field specifying whether the linear memory or table is shared:

limits ::= {min u32, max u32?, share}
share  ::= unshared | shared

However, all operators, JavaScript API changes and spec tests describe only what happens with the shared WebAssembly memory, and I can't find any mentions of shared tables behaviour.

I'd like to understand what are the expectations for the implementer here - is this just a spec bug and these limits types are supposed to be split with only memlimits having a shared variant, or is it something we plan to support in the future on tables too, but is expected to throw a validation error meanwhile.

@RReverser
Copy link
Member Author

Actually, I see https://webassembly.github.io/threads/binary/types.html#binary-limits actually has slightly different types/notations here. I guess Overview is simply out of date?

@rossberg
Copy link
Member

rossberg commented Jun 5, 2020

This can be factored in a number of (observably equivalent) ways, but what's in the spec draft currently assumes that the fusion of limits and shared flag is merely a detail of the binary format. Conceptually, they are rather separate things, which is what's reflected in the AST. Note that the binary grammar for table types enforces that the shared flag is 0.

@RReverser
Copy link
Member Author

@rossberg Thanks. That notation (lim,0:limits, lim,1:limits) was also a bit confusing - is it documented somewhere? I don't think I've seen this one in particular before, usually it's either literals or identifiers.

Another question: I've noticed that

Note that the binary grammar for table types enforces that the shared flag is 0.

is covered by assert_invalid tests. Shouldn't this be better be declared as a malformed module, so that it could be checked earlier at the parsing level?

@rossberg
Copy link
Member

rossberg commented Jun 5, 2020

It's a kind of pattern matching. IIRC I used it in a couple of other places in the main spec, but it may indeed not be properly introduced. It may be a bit too smart in this case.

Shouldn't this be better be declared as a malformed module

Yeah, either is a fine choice, but spec and tests should match up. The spec for this proposal still needs a lot of work.

@RReverser
Copy link
Member Author

It may be a bit too smart in this case.

Yeah I have a feeling that it would be easier to grasp if it was written down as two separate types, one generic and one memory-specific.

Yeah, either is a fine choice, but spec and tests should match up.

To clarify - assert_invalid should match only things disallowed by validation phase, not the syntax, right? (I'm still unsure about the boundary between two)

If so, I think this restriction should result in malformed:

image

@rossberg
Copy link
Member

rossberg commented Jun 8, 2020

I tend to adjusting the spec and make this a validation-time check. Seems more natural in this case.

@RReverser
Copy link
Member Author

Seems more natural in this case.

Why? Or, more generally, are the rules on what should be checked at parsing time vs validation time codified (written down) anywhere?

To me, it seems that validation should be reserved for rules that can't be checked as part of the lexing - that is cross-section checks like "data count and number of data entries should match", "indices should be valid and pointing to real items", "global.get in constant expressions should be pointing to constant globals" etc.

Anything else can be encoded as part of the grammar and checked earlier.

Most of the errors already fit this split quite well and the only two counter-examples that I'm aware of so far that don't fit such separation so far seem to be this one (shared flag shouldn't be allowed on tables + shared flag must be combined with maximum limit) and another one in SIMD for lane indices (specific range of allowed values for given instruction vs arbitrary byte).

It seems worthwhile to adjust these examples rather than moving more rules to the validation stage.

@rossberg
Copy link
Member

rossberg commented Jun 8, 2020

The question is whether something is part of the abstract syntax or not. In general, the abstract syntax tries to be faithful to what's expressible in the binary syntax, except where something can be treated as just a shorthand, or something is regarded an "encoding detail" of the binary format (like the DataCount section in the bulk instructions proposal).

But to some extend it's a choice. For example, we always had function types with multiple results in the abstract syntax, long before we actually allowed multi-values -- it was just invalid to have more than one. That seemed natural because the generalisation was on the trajectory.

In this case, it feels similarly natural because in the future we likely want to extend tables with sharability annotations as well (see our OOPSLA '19 paper).

@RReverser
Copy link
Member Author

But to some extend it's a choice.

Right, I guess that's the blurry line that sometimes makes me uneasy. It makes sense for what you said about shared tables, and it could even be extended to potentially allowing shared memories without maximum limit in the future.

But then a counter-example to such choice is SIMD lane indices, which I mentioned above, which could be said to be expressible in binary format if they're defined as bytes, or they could be said to be not expressible if we define them as u1, u2, u3 and u4 for ImmLaneIdx{2, 4, 8, 16} correspondingly.

What defines which choice is correct? Arguably they won't (can't) even be extended in the future, so indices outside the range should be malformed, yet the tests currently check that they're only invalid...

@rossberg
Copy link
Member

rossberg commented Jun 9, 2020

I haven't followed the SIMD proposal closely, but using uN might indeed make sense there. But it depends on whether there always is enough context in the binary grammar to trivially know which type is expected. If you'd need a union of u1..u4 somewhere then not distinguishing different lane types seems preferable.

@binji
Copy link
Member

binji commented Jun 9, 2020

I think you could handle it in the binary grammar, since you always know the lane type from the instruction opcode. But if we wanted to handle the syntax as we do with other instructions (grouping them by the kind of instruction they are, e.g. extract_lane, replace_lane) then we'd need to have a union at that point.

@RReverser
Copy link
Member Author

@rossberg To get back to the original issue - the resolution here is that someone will update the spec as per

I tend to adjusting the spec and make this a validation-time check. Seems more natural in this case.

correct?

@binji
Copy link
Member

binji commented Jun 15, 2020

Yes, I think that's the right change to make here.

@binji binji changed the title Meaning of shared flag on tables Make shared tables a validation failure Jun 15, 2020
@rossberg
Copy link
Member

Yep.

@RReverser
Copy link
Member Author

@binji I see you renamed the issue to "Make shared tables a validation failure", but I think the resolution was to also have the maximum on memory limits be a validation failure rather than malformed one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants