proposal: cmd/compile: relax wasm32 function import signature type constraints #66984

Open
johanbrandhorst opened this issue Apr 23, 2024 · 20 comments

@johanbrandhorst
Member

johanbrandhorst commented Apr 23, 2024

Background

#59149 removed the package restrictions on the use of go:wasmimport, but established strict constraints on the types that can be used as input and result parameters. The motivation for this was that supporting rich types between the host and the client would require sophisticated and expensive runtime type conversions because of the mismatch between the 64 bit architecture of the client and the 32 bit architecture of the host.

With the upcoming 32 bit wasm port, this problem goes away, as both client and host will use 32 bit pointers.

Proposal

Relax the constraints on types that can be used as input and result parameters with the go:wasmimport compiler directive, on ports using the wasm32 architecture only. This currently limits this proposal to wasip1/wasm32.

The following types would be allowed as input parameters:

  • bool
  • int, uint, int8, uint8, int16, uint16, int32, uint32, int64, uint64
  • float32, float64
  • string
  • struct where all fields are allowed types (removed following discussion)
  • [...]T array where T is an allowed type (removed following discussion)
  • uintptr, unsafe.Pointer, *T where T is an allowed type
  • *struct. All fields of the *struct must be allowed types, and the struct must embed structs.HostLayout (see structs: add HostLayout "directive" type #66408). Any struct fields must also embed structs.HostLayout (recursively).
  • *[...]T where T is an allowed type.

The following types would remain disallowed:

  • chan T
  • complex64, complex128
  • func
  • interface
  • map[T]U
  • []T
  • struct
  • [...]T array

Only simple scalar types (bool, (u|)int(|8|16|32|64), float(32|64), uintptr, unsafe.Pointer) would be allowed as the result parameter type.
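
The allow and deny lists above can be read as a small type checker. The following is a minimal sketch of that reading using reflection, not the compiler's implementation. HostLayout is a local stand-in for structs.HostLayout from #66408, and the names CheckParam, CheckResult, point and plain are hypothetical. It also assumes that nested structs and arrays are acceptable inside memory reached through a pointer, which the proposal text does not state explicitly.

```go
package main

import (
	"fmt"
	"reflect"
)

// HostLayout is a local stand-in for the structs.HostLayout marker from
// #66408 (an assumption, so that this sketch does not depend on it).
type HostLayout struct{}

var hostLayoutType = reflect.TypeOf(HostLayout{})

// isScalar reports whether k is one of the scalar kinds the proposal allows
// both as an input parameter and as a result.
func isScalar(k reflect.Kind) bool {
	switch k {
	case reflect.Bool,
		reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64,
		reflect.Float32, reflect.Float64, reflect.Uintptr, reflect.UnsafePointer:
		return true
	}
	return false
}

// checkParam reports whether t is allowed as an input parameter.
func checkParam(t reflect.Type, seen map[reflect.Type]bool) error {
	switch {
	case isScalar(t.Kind()) || t.Kind() == reflect.String:
		return nil
	case t.Kind() == reflect.Pointer:
		return checkPointee(t.Elem(), seen)
	}
	return fmt.Errorf("%v is not an allowed parameter type", t)
}

// checkPointee validates memory reachable through a pointer. Nested structs
// and arrays are accepted here, which is this sketch's reading of the rules.
func checkPointee(t reflect.Type, seen map[reflect.Type]bool) error {
	switch t.Kind() {
	case reflect.Array:
		return checkPointee(t.Elem(), seen)
	case reflect.Struct:
		if seen[t] {
			return nil // already being checked; avoids looping on recursive types
		}
		seen[t] = true
		marked := false
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if f.Type == hostLayoutType {
				marked = true
				continue
			}
			if err := checkPointee(f.Type, seen); err != nil {
				return fmt.Errorf("%v field %s: %w", t, f.Name, err)
			}
		}
		if !marked {
			return fmt.Errorf("%v has no HostLayout marker", t)
		}
		return nil
	}
	return checkParam(t, seen)
}

// CheckParam checks the type of a sample value as an input parameter.
func CheckParam(v any) error {
	t := reflect.TypeOf(v)
	if t == nil {
		return fmt.Errorf("nil has no type to check")
	}
	return checkParam(t, map[reflect.Type]bool{})
}

// CheckResult checks the type of a sample value as a result parameter.
func CheckResult(v any) error {
	t := reflect.TypeOf(v)
	if t == nil || !isScalar(t.Kind()) {
		return fmt.Errorf("%v is not an allowed result type", t)
	}
	return nil
}

// point carries the marker; plain does not.
type point struct {
	_    HostLayout
	x, y int64
}

type plain struct{ x int64 }

func main() {
	fmt.Println(CheckParam(&point{})) // pointer to a marked struct
	fmt.Println(CheckParam(&plain{})) // pointer to an unmarked struct
	fmt.Println(CheckParam([]byte{})) // slice
	fmt.Println(CheckResult("text"))  // string as a result
}
```

Only the first call in main is expected to succeed. The other three report an error naming the offending type.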

Discussion

Compatibility guarantees

The Go spec does not specify the struct layout and leaves it up to implementations to decide. As such, we cannot provide a guaranteed ABI without having to change the spec or force future layout changes to provide runtime conversion of data. This proposal suggests making it clear to users through documentation that there are no guarantees of compatibility across versions of the Go compiler.

Type conversion rules

The following conversion rules would be automatically applied by the compiler for the respective parameter type:

Go Type | Type passed to host (per Wasm spec) | Type read from host
bool | i32 | i32
int, uint, int8, uint8, int16, uint16, int32, uint32 | i32, i32, i32, i32, i32, i32, i32, i32 | i32, i32, i32, i32, i32, i32, i32, i32
int64, uint64 | i64, i64 | i64, i64
float32, float64 | f32, f64 | f32, f64
string | Assigned to two call parameters as a (i32, i32) tuple of (pointer, len). | N/A
uintptr, unsafe.Pointer, *T | i32, i32, i32 | i32, i32, N/A
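
To make the conversion rules concrete, here is a minimal sketch that lowers a Go function signature to the list of Wasm value types passed to the host on a wasm32 target. It is an illustration built on reflection, not what the compiler does internally. The names lower and LowerParams and the sample signature are hypothetical, and pointers are lowered without checking what they point to.

```go
package main

import (
	"fmt"
	"reflect"
)

// lower returns the Wasm value types used to pass one Go parameter of type t
// on a wasm32 target, following the conversion table in the proposal.
func lower(t reflect.Type) ([]string, error) {
	switch t.Kind() {
	case reflect.Bool,
		reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32,
		reflect.Uintptr, reflect.UnsafePointer, reflect.Pointer:
		return []string{"i32"}, nil
	case reflect.Int64, reflect.Uint64:
		return []string{"i64"}, nil
	case reflect.Float32:
		return []string{"f32"}, nil
	case reflect.Float64:
		return []string{"f64"}, nil
	case reflect.String:
		return []string{"i32", "i32"}, nil // (pointer, len)
	}
	return nil, fmt.Errorf("%v has no lowering in the table", t)
}

// LowerParams lowers every parameter of the function value fn, in order.
func LowerParams(fn any) ([]string, error) {
	ft := reflect.TypeOf(fn)
	if ft == nil || ft.Kind() != reflect.Func {
		return nil, fmt.Errorf("LowerParams: not a function")
	}
	var out []string
	for i := 0; i < ft.NumIn(); i++ {
		types, err := lower(ft.In(i))
		if err != nil {
			return nil, fmt.Errorf("parameter %d: %w", i, err)
		}
		out = append(out, types...)
	}
	return out, nil
}

func main() {
	// A hypothetical import signature, used only for illustration.
	sig := func(fd int32, path string, size uint64, out *uint32) {}
	fmt.Println(LowerParams(sig)) // prints "[i32 i32 i32 i64 i32] <nil>"
}
```

Note that the string parameter contributes two i32 values, so the lowered list is longer than the Go parameter list.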

Result parameters

Result parameters are more restricted since pointer values from the host cannot be managed safely by the GC, and Wasm practically does not allow more than 1 result parameter. Only basic scalar values and unsafe.Pointer are allowed as the result parameter type.

Supporting slices, maps

Both slices and maps are disallowed because of the uncertainty around the memory underlying these types and interactions with struct and array rules. Users who wish to pass a slice can manually pass (&slice[0], len(slice)) or use unsafe.Pointer. There is no clear way to support passing or returning map data from the host other than by using unsafe.Pointer and making assumptions about the underlying data.
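
The manual workaround for slices can be sketched as a small helper that splits a slice into a pointer to its first element and its length. The helper name sliceArgs is hypothetical, and the uint32 length assumes a wasm32 target. unsafe.SliceData is used instead of &s[0] so that empty and nil slices do not panic.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceArgs splits s into a pointer to its first element and its length,
// the two values a caller would pass to an imported function by hand.
func sliceArgs[T any](s []T) (*T, uint32) {
	return unsafe.SliceData(s), uint32(len(s))
}

func main() {
	buf := []byte("hello")
	p, n := sliceArgs(buf)
	fmt.Println(p == &buf[0], n)
}
```

The caller remains responsible for keeping the slice alive and unmodified for as long as the host may read from it.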

Related proposals

structs.HostLayout

#66408 proposes a way for users to request that a struct's layout be host compatible. Our proposal depends on the definitions put forward in that proposal for struct parameters.

go:wasmexport

The proposed relaxing of constraints would also apply to uses of go:wasmexport, as described in #65199.

Future work

WASI Preview 2 (AKA WASI 0.2)

WASI Preview 2 defines its API in terms of the Component Model, with a rich type system and an IDL, WIT. The Component Model also defines a Canonical ABI with a specification for lifting and lowering Component Model types into and out of linear memory. This proposal does not attempt to define the ABI for any hypothetical wasip2 target, and would leave such decisions for any future wasip2 proposal.

Supporting struct and [...]T by value

A previous version of this proposal included support for passing struct and [...]T types by value by expanding each field recursively into call parameters. This was removed in favor of a simpler initial implementation but could be re-added if users require it.

Contributors

@johanbrandhorst, @evanphx, @achille-roussel, @dgryski, @ydnar

CC @cherrymui @golang/wasm

@gopherbot gopherbot added this to the Proposal milestone Apr 23, 2024
@dr2chase
Contributor

dr2chase commented Apr 23, 2024

"This follows the C struct value semantics" is just a hair vague; are 8-byte quantities (float64, int64, uint64) stored at a 4-byte or 8-byte alignment? It was my understanding (and the purpose of #66408) to specify a 4-byte alignment for fields of those types when they occur in structs passed to wasm32 (tagged structs.HostLayout).

(edited to note error, the host alignment for 8-byte integers and floats is 8 bytes).

@ydnar

ydnar commented Apr 23, 2024

Ideally 8-byte values would always be 8-byte aligned in the wasm32 port.

@evanphx
Contributor

evanphx commented Apr 23, 2024

@dr2chase Looking at what clang does, it uses 8-byte alignment on 64-bit quantities, so we'd match that.

@dr2chase
Contributor

dr2chase commented Apr 23, 2024

You are right, I got it backwards. But that is what you are expecting for anything that has pointers-to-it passed to the wasm host platform, yes?

@cherrymui
Member

Thanks for the proposal! A few questions:

  • 8-byte alignment for 64-bit values, as mentioned above. cmd/compile: create GOARCH=wasm32 #63131 doesn't seem to have a definitive answer, and currently it seems the CL doesn't implement 64-bit alignment.
  • If we don't always align 64-bit values to 8 bytes (doing so would differ from all the 32-bit architectures we currently support and would probably require quite some work), we should align 64-bit values to 8 bytes when structs.HostLayout is specified. So structs: add HostLayout "directive" type #66408 is very related.
  • structs and arrays. What is the ABI specification exactly? The C ABI on, say ELF AMD64, is pretty complex for passing structs and arrays. Small fields may be packed into one word. Large structs may be passed indirectly (stored on stack, passing a pointer to the callee). Do we have a specification for this?
  • string. What does a string look like on Wasm/WASI side? I couldn't find its specification on WASI P1 doc. On Component Model doc https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md (which I guess is for WASI P2, not P1?), it specifies string is two i32, which is similar to Go's string, which is good. But also it allows three encodings, UTF-8, UTF-16, and "latin1+utf16" differentiated by a high bit. The second and third encoding are not compatible with Go strings. Do we require UTF-8 encoding? Or we don't allow passing Go strings directly?

Besides, for structs, arrays of structs, and pointer to structs, I would suggest we allow only structs with structs.HostLayout to be passed. The reason is that in the Go spec we don't require struct fields to be laid out in memory in source order, and it may well change in a future Go release. structs.HostLayout specifies a fixed layout. Structs without that marker can change. This gives a clear way to say which structs should have a fixed layout, which are okay to change.

Thanks.

@dr2chase
Contributor

Two other questions, first:

type w32thing struct {
    _ structs.HostLayout
    a uint8
    b uint16
}

Is this laid out a_bb or is it aaaabbbb? What sizes do I use for struct fields? I assume it is the smaller ones, but I wanted to verify this else it would be a problem.

Second, passing pointers to 8-byte primitive types to the host will be tricky unless those references come from fields in structures tagged with HostLayout -- otherwise, they may not be aligned. So

type wx struct {
    _ structs.HostLayout
    x int64
}

func f(x int64, w wx) {
    someWasmFunc(&x)      // might not work, x might not be 8-byte aligned
    someWasmFunc(&w.x)    // this will work because w is a wx and its x field is 8-byte aligned
    someOtherWasmFunc(&w) // if it used *wx for its parameter type instead of *int64
}

@johanbrandhorst
Member Author

johanbrandhorst commented Apr 25, 2024

Thanks for the quick feedback! I've tried to answer each question:

structs and arrays. What is the ABI specification exactly? The C ABI on, say ELF AMD64, is pretty complex for passing structs and arrays. Small fields may be packed into one word. Large structs may be passed indirectly (stored on stack, passing a pointer to the callee). Do we have a specification for this?

The specification falls out of the table of transformations (I think?). The current plan isn't to introduce any sort of magic around large structs or field packing. Struct fields are added as call parameters, from the first field to the last, according to the conversion rules for the type of the field. Examples:

type foo struct {
    a int
    b string
    c [2]float32
}

With a function signature of

//go:wasmimport some_module some_function
func wasmFunc(in foo) int

Would roughly translate to (in WAT format)

;; $a is of type `i32` holding the value of `a`
;; $b_addr is of type `i32` and is a pointer to the start of the bytes for the Go string `b`
;; $b_len is of type `i32` and is the length in bytes to read from `$b_addr` to get the whole string
;; $c_0 is of type `f32` and is the value of `c[0]`
;; $c_1 is of type `f32` and is the value of `c[1]`
call $some_function (local.get $a) (local.get $b_addr) (local.get $b_len) (local.get $c_0) (local.get $c_1)

Struct fields would be expanded into call parameters before subsequent fields at the same level.
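
The by-value expansion described here (which was later dropped from the proposal) can be sketched as a recursive walk over the struct's fields. This is an illustration using reflection under the wasm32 conversion rules, not compiler code, and the names flatten and Flatten are hypothetical. The foo type mirrors the example above.

```go
package main

import (
	"fmt"
	"reflect"
)

// flatten expands t into the Wasm value types of its call parameters,
// visiting struct fields and array elements depth-first, in order.
func flatten(t reflect.Type) ([]string, error) {
	switch t.Kind() {
	case reflect.Bool,
		reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32,
		reflect.Uintptr, reflect.UnsafePointer, reflect.Pointer:
		return []string{"i32"}, nil
	case reflect.Int64, reflect.Uint64:
		return []string{"i64"}, nil
	case reflect.Float32:
		return []string{"f32"}, nil
	case reflect.Float64:
		return []string{"f64"}, nil
	case reflect.String:
		return []string{"i32", "i32"}, nil // (address, length)
	case reflect.Struct:
		var out []string
		for i := 0; i < t.NumField(); i++ {
			sub, err := flatten(t.Field(i).Type)
			if err != nil {
				return nil, err
			}
			out = append(out, sub...)
		}
		return out, nil
	case reflect.Array:
		elem, err := flatten(t.Elem())
		if err != nil {
			return nil, err
		}
		var out []string
		for i := 0; i < t.Len(); i++ {
			out = append(out, elem...)
		}
		return out, nil
	}
	return nil, fmt.Errorf("%v cannot be flattened", t)
}

// Flatten flattens the type of a sample value.
func Flatten(v any) ([]string, error) {
	t := reflect.TypeOf(v)
	if t == nil {
		return nil, fmt.Errorf("nil has no type to flatten")
	}
	return flatten(t)
}

// foo mirrors the struct used in the example above.
type foo struct {
	a int
	b string
	c [2]float32
}

func main() {
	fmt.Println(Flatten(foo{})) // prints "[i32 i32 i32 f32 f32] <nil>"
}
```

The five resulting types line up with $a, $b_addr, $b_len, $c_0 and $c_1 in the call above.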

What does a string look like on Wasm/WASI side?

For wasip1, we will treat Go string parameters simply as a (*byte, int) tuple. There will be no encoding constraints, just as with regular Go strings. To the Wasm host, it will look identical to using struct { a *byte; b int } as a parameter. For wasip2, those constraints would have to be considered in a hypothetical future wasip2 proposal.
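
As a rough illustration of the (*byte, int) view of a string, the sketch below splits a Go string into that pair and then rebuilds it the way a host reading the same bytes might. The helper names stringArgs and hostView are hypothetical, and the uint32 length assumes wasm32. The sample string deliberately contains a byte that is not valid UTF-8 to show that no encoding is enforced.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringArgs splits s into a pointer to its bytes and its length in bytes.
func stringArgs(s string) (*byte, uint32) {
	return unsafe.StringData(s), uint32(len(s))
}

// hostView rebuilds a string from the pair, standing in for what a host
// would read out of linear memory.
func hostView(p *byte, n uint32) string {
	return unsafe.String(p, int(n))
}

func main() {
	s := "h\xffllo" // contains an invalid UTF-8 byte on purpose
	p, n := stringArgs(s)
	fmt.Println(n, hostView(p, n) == s)
}
```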

Making structs.HostLayout required for structs, arrays of structs and pointers to structs

This sounds like a great idea, and we should also extend it to pointers to 8 byte sized primitive types to guarantee alignment, as suggested by @dr2chase's last question. This would avoid any question around alignment issues for pointers. It hurts the ergonomics a little bit but that's a price worth paying, I think.

type w32thing struct {
    _ structs.HostLayout
    a uint8
    b uint16
}

Is this laid out a_bb or is it aaaabbbb? What sizes do I use for struct fields? I assume it is the smaller ones, but I wanted to verify this else it would be a problem.

I'm a little confused by the question to be honest. If this type was used as an input to a Wasm call, it would look like this:

;; $a is of type `i32`
;; $b is of type `i32`
call $some_function (local.get $a) (local.get $b)

I suppose that might mean the memory looks like this: a___bb__? We're not passing a pointer to the struct or the fields, so we'd need to copy the values into locals, which will be of type i32 (I think)? Admittedly my grasp of this exact part of the code is a bit weak so I appreciate corrections.

@cherrymui
Member

Thanks for the response!

Struct fields are added as call parameters, from the first field to the last, according to the conversion rules for the type of the field.

This sounds like a reasonable choice. Is this ABI specified anywhere in Wasm/WASI docs? Or the Wasm side has to define the function taking parameters element-wise?

For wasip1, we will treat Go string parameters simply as a (*byte, int) tuple. There will be no encoding constraints, just as with regular Go strings. To the Wasm host, it will look identical to using struct { a *byte; b int } as a parameter.

This sounds reasonable as well. Is it specified anywhere in Wasm/WASI docs?

Thanks.

@johanbrandhorst
Member Author

This sounds like a reasonable choice. Is this ABI specified anywhere in Wasm/WASI docs? Or the Wasm side has to define the function taking parameters element-wise?

I don't know about this being an official ABI so much as just a consequence of the Wasm spec around function calls and how we can apply Go semantics to it. We're limited to the i32, i64, f32 and f64 value types, and the call instruction takes a function index and arguments from the stack. In order to simulate pass-by-value for structs, we have to flatten each field to one of the allowed value types.

This sounds reasonable as well. Is it specified anywhere in Wasm/WASI docs?

Not sure there's a doc anywhere, but practically, definitions like path_create_directory, which take a string parameter, use this pattern: https://cs.opensource.google/go/go/+/refs/tags/go1.22.2:src/syscall/fs_wasip1.go;l=230.

ydnar added a commit to ydnar/tinygo that referenced this issue May 4, 2024
@dr2chase
Contributor

dr2chase commented May 6, 2024

I guess my question is whether a pointer-to-struct is ever passed from Go to the WASM platform, and therefore what expectations the WASM side has about the layout of the fields of that structure. structs.HostLayout is intended to meet those expectations, but (1) do we even need to do this? We thought we did. And (2) we need to know what the expectations are. I think it was just that 64-bit floats and ints get 64-bit alignment.

I don't think this is for specifying the layout that gets passed to a WASM call if the struct is passed by value.

@cherrymui
Member

I don't know about this being an official ABI so much as just a consequence of the Wasm spec around function calls and how we can apply Go semantics to it. We're limited to the i32, i64, f32 and f64 value types, and the call instruction takes a function index and arguments from the stack. In order to simulate pass-by-value for structs, we have to flatten each field to one of the allowed value types.

As the ABI doesn't have a way to pass struct by value, do we need to support it? If users on the other (non-Go) side have to define the function as taking arguments element-wise with primitive types and pointers, it would probably be better to define the same way on the Go side. Does any other language have a Wasm/WASI interface that allows passing struct by value?

(Same applies for arrays. Pointer to struct/array is fine.)

@johanbrandhorst
Member Author

I guess my question is whether a pointer-to-struct is ever passed from Go to the WASM platform, and therefore, what expectations the WASM side has about the layout of the fields of that structure.

I think the biggest concern around this is that all 64 bit values use 8 byte alignment, as you say. We definitely want this, so I think that on its own makes the case for structs.HostLayout. For other values, I think we want to just use "natural alignment" (4 byte for 4 byte values, etc). As far as we know, there is no strict enforcement of this in Wasm generally, but this is the approach taken by LLVM, so it probably makes sense for us to keep it the same.
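
To see what natural alignment would mean for a struct in memory, here is a minimal sketch that computes field offsets when every scalar is aligned to its own size and the struct is padded to its largest field alignment. It assumes field sizes of 1, 2, 4 or 8 bytes, and the names field, alignUp and layout are hypothetical. It models the rule described above rather than querying any real compiler.

```go
package main

import "fmt"

// field describes one scalar struct field by name and size in bytes.
// Sizes are assumed to be 1, 2, 4 or 8.
type field struct {
	name string
	size uint32
}

// alignUp rounds n up to the next multiple of align (a power of two).
func alignUp(n, align uint32) uint32 {
	return (n + align - 1) &^ (align - 1)
}

// layout returns each field's offset and the padded struct size when every
// field is aligned to its own size.
func layout(fields []field) (offsets []uint32, size uint32) {
	var off, maxAlign uint32 = 0, 1
	for _, f := range fields {
		off = alignUp(off, f.size)
		offsets = append(offsets, off)
		off += f.size
		if f.size > maxAlign {
			maxAlign = f.size
		}
	}
	return offsets, alignUp(off, maxAlign)
}

func main() {
	// The w32thing example from earlier in the thread: a uint8, b uint16.
	fmt.Println(layout([]field{{"a", 1}, {"b", 2}}))
	// A 32-bit value followed by a 64-bit value.
	fmt.Println(layout([]field{{"x", 4}, {"y", 8}}))
}
```

Under this rule the w32thing struct occupies 4 bytes with b at offset 2 (the "a_bb" shape), and a 64-bit field following a 32-bit field starts at offset 8.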

I also don't know that it's an important question for this proposal in particular, since the answer is pretty clear regarding what we should be passing in the call instruction when encountering a pointer (an i32). I'm happy to weigh in on #66408 if needed to have this discussion though.

As the ABI doesn't have a way to pass struct by value, do we need to support it?

It's fair to say that we can just not support structs and arrays by value: their use is likely to be limited (why not use a pointer?), and dropping them would significantly simplify the implementation. We can come back to it later if we need to. I'll update the proposal.

@aykevl

aykevl commented May 13, 2024

On the TinyGo side we're working on an implementation of this proposal, so here's my perspective on it from TinyGo:

  • Always passing structs by reference (pointer) seems like a good idea. Passing structs by value gets complicated quickly and if we agree on something in the future it's easy enough to add it in a new proposal. Structs in memory are relatively well defined in comparison.
  • Strings in TinyGo are already of the format specified in this proposal, so that's nice. We won't need to do anything special there.
  • TinyGo uses the host layout (it's based on LLVM) so TinyGo needs no structs.HostLayout marker. But it would be relatively simple to add a check like that to ensure compatibility, and if that is what the proposal ends up with I'll make sure we will have the same strict checking.
  • TinyGo has always had a 32-bit wasm implementation (int, uintptr and pointers are 32-bit). Therefore, it would make sense to allow these values at all times. That's a possible compatibility concern, but in essence we're already incompatible so I'm not sure how much of an issue this is. Thoughts?

Question: what fields would be allowed in these structs? I would assume a struct with a chan field would be disallowed, for example. This isn't part of the proposal yet though, so perhaps this can be added?
Something like this:

Structs may not be passed by value, but pointers to structs are allowed. Every field in a struct must be one of the allowed parameter types, or be a struct (recursively).

@cherrymui
Member

TinyGo has always had a 32-bit wasm implementation (int, uintptr and pointers are 32-bit). Therefore, it would make sense to allow these values at all times.

I think this is fine. And we should allow them in Go gc toolchain for the "wasm32" port.

Structs may not be passed by value, but pointers to structs are allowed. Every field in a struct must be one of the allowed parameter types, or be a struct (recursively).

Yeah, something along this line makes sense. And also for arrays. I'd say a struct field or a struct pointed by a field should also have the HostLayout marker (because the marker is not recursive).

@johanbrandhorst
Member Author

Thanks for your thoughts Ayke, it's always appreciated.

TinyGo has always had a 32-bit wasm implementation (int, uintptr and pointers are 32-bit). Therefore, it would make sense to allow these values at all times. That's a possible compatibility concern, but in essence we're already incompatible so I'm not sure how much of an issue this is. Thoughts?

As Cherry says, these values will be allowed since this proposal is restricted to the wasm32 architecture. The wasm architecture will not have these new relaxations applied. I'm not sure I understand the incompatibility?

Structs may not be passed by value, but pointers to structs are allowed. Every field in a struct must be one of the allowed parameter types, or be a struct (recursively).

I'd say a struct field or a struct pointed by a field should also have the HostLayout marker (because the marker is not recursive).

I've added some clarifying words to the proposal, please take a look!

@aykevl

aykevl commented May 17, 2024

The updated proposal looks good to me!
(If I'm very pedantic, it doesn't explicitly say that a struct in a struct is allowed, though it clearly should be. Right now it says *struct is allowed but struct isn't).

However, I have to say that @ydnar has pointed out that the Canonical ABI also allows structs, and it would be nice to have them supported in //go:wasmimport. That said, if I'm reading the specs correctly, the Canonical ABI and the C ABI are incompatible when it comes to structs: the C ABI passes a struct by value only when it contains a single field after flattening, while the Canonical ABI passes records (similar to structs) by value if they have 16 or fewer fields after flattening. So that means //go:wasmimport would have to choose between the C ABI and the Canonical ABI.

As Cherry says, these values will be allowed since this proposal is restricted to the wasm32 architecture. The wasm architecture will not have these new relaxations applied. I'm not sure I understand the incompatibility?

Never mind, TinyGo doesn't even support GOOS=js GOARCH=wasm tinygo ..., it just uses tinygo -target=wasm. So in essence tinygo -target=wasm ... is equivalent to GOOS=js GOARCH=wasm32 go .... Basically it has always been a GOARCH=wasm32 implementation and never supported what would be GOARCH=wasm (with 64-bit pointers).

I'd say a struct field or a struct pointed by a field should also have the HostLayout marker (because the marker is not recursive).

Seems like a good idea. It's easier to remove such a restriction in the future (if it turns out to be unnecessary) than it is to introduce it later. But I don't know Go internals well enough to say it is needed.

@johanbrandhorst
Member Author

To comment quickly on the Canonical ABI: it doesn't relate to this proposal directly as this proposal only targets the wasip1 port, and the Canonical ABI is a preview 2 document (as far as I know). A hypothetical wasip2 proposal would have to tackle type constraints for go:wasmimport (and go:wasmexport) as they relate to the Canonical ABI.

@cherrymui
Member

I'm okay with supporting passing structs by value if there is a widely used ABI that is not too complex (if it is as complex as the ELF C ABI on amd64, I'm not sure). If currently there is no widely agreed ABI for structs, we can wait. We can always add things later.

@ydnar

ydnar commented May 18, 2024

Hi, original author of the relaxed type constraints proposed here.

The TinyGo PR where this originated depends on LLVM to flatten structs and arrays. This works in practice most of the time, except when it doesn't: namely, the Component Model and WASI 0.2 make extensive use of tagged unions (variant types in WIT).

The code generator (wit-bindgen-go) implements the flattening rules as specified in the Canonical ABI, which then leans on LLVM to flatten the Go structs that represent variant types.

The CABI flattening rules are per-field, so if a variant has a case that includes a 64-bit wide field, then the flattened representation of the variant must use an i64 at that position.

Given that the compiler is ignorant of the CABI layout, this strategy cannot correctly represent these variant types when passed by value.
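
The per-field rule can be illustrated with a minimal sketch of variant flattening. It assumes the join rule as I read it in the Canonical ABI document (equal types stay as they are, an i32/f32 pair becomes i32, anything else widens to i64) and a leading i32 discriminant. The function names and the string encoding of core types are hypothetical choices for this sketch.

```go
package main

import "fmt"

// join picks a core Wasm type able to carry either a or b at one position.
func join(a, b string) string {
	switch {
	case a == b:
		return a
	case (a == "i32" && b == "f32") || (a == "f32" && b == "i32"):
		return "i32"
	}
	return "i64"
}

// flattenVariant merges the already-flattened payloads of every case,
// position by position, after a leading i32 discriminant.
func flattenVariant(cases [][]string) []string {
	flat := []string{"i32"} // discriminant
	for _, payload := range cases {
		for i, t := range payload {
			if pos := i + 1; pos < len(flat) {
				flat[pos] = join(flat[pos], t)
			} else {
				flat = append(flat, t)
			}
		}
	}
	return flat
}

func main() {
	// One case carries a 32-bit payload, the other a 64-bit payload.
	fmt.Println(flattenVariant([][]string{{"i32"}, {"i64"}}))
}
```

With one 32-bit case and one 64-bit case, the payload position widens to i64, which a compiler flattening only the Go struct for a single case would not know to do.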

@cherrymui: LLVM does correctly flatten structs and arrays consistent with the CABI spec (my sense is the former informed the latter). If we want to start with a more constrained set of types now and relax later, we can make that work.

@aykevl

aykevl commented May 20, 2024

LLVM does correctly flatten structs and arrays consistent with the CABI spec (my sense is the former informed the latter).

Not exactly. If you pass an LLVM struct like {i32, i32} by value, LLVM will happily flatten the struct and pass it as values. But if you do that in C, Clang will pass the struct by reference, not by value: it will reserve some space on the stack and pass a pointer instead.
See: https://godbolt.org/z/YjKj5o3c4

I believe this is why the Component Model lowers everything to bare i32/i64/f32/f64/pointer values in function signatures, which have no ambiguity in what ABI they should have on the WebAssembly level.
