Post-MVP host mappings for GC objects #496

ericvergnaud · 2023-12-16T13:17:44Z

Reading the discussions re mappings between GC objects and host objects, I see that they focus mostly on structs.

I'm wondering if it would make sense to treat arrays separately. The reasons for that are:

There could be a significant performance benefit in directly mapping WebAssembly arrays, thanks to 0-copy.
They could be used as an intermediate 0-copy solution for strings until the Strings proposal is reprioritized and delivered.
Unlike structs, the meaning of arrays doesn't vary significantly across programming languages. That could help reach consensus on the spec more rapidly for arrays than for structs, and deliver an MVP soon after.
It would free the struct mapping spec from array-related constraints.

As an example, an i32 unpacked array would map to:

Int32Array in JS
int[] in Java
int[] in C#
...

Similarly, an i32 array packed using Int8 would map to:

Int8Array in JS
byte[] in Java
sbyte[] in C#
...
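
To make the proposed mapping concrete, here is a minimal TypeScript sketch of the JS column. Only the i32 and packed-i8 rows come from the lists above; the other rows, the `StorageType` names and the `hostArrayFor` helper are assumptions added for illustration. It also assumes packed elements are read as signed, whereas Wasm leaves that choice to the reading instruction (`array.get_s` vs `array.get_u`).

```typescript
// Hypothetical names: StorageType and hostArrayFor are not part of any spec.
type StorageType = "i8" | "i16" | "i32" | "f32" | "f64";
type HostArray = Int8Array | Int16Array | Int32Array | Float32Array | Float64Array;

// Pick the JS typed array whose element width matches the Wasm storage type.
// Signed views are an assumption; an unsigned reading would use Uint8Array etc.
function hostArrayFor(storage: StorageType, length: number): HostArray {
    switch (storage) {
        case "i8":  return new Int8Array(length);
        case "i16": return new Int16Array(length);
        case "i32": return new Int32Array(length);
        case "f32": return new Float32Array(length);
        case "f64": return new Float64Array(length);
    }
}
```

This only allocates a host array of the matching shape; whether the data could be shared rather than copied is exactly the open question.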

Would it make sense to submit a draft Post-MVP PR focused on array mapping?

rossberg · 2023-12-16T14:36:24Z

You won't usually be able to map Wasm arrays to language arrays directly, as they are more low-level. Most languages have (different) extra features that Wasm arrays do not have, such as built-in hash ids, possibly extra fields or methods, other typing rules, compatibility with other language-specific types. Some also have fewer, like C or Rust. There is no magic high-level interop between an assembly-level language and a higher-level host language.

ericvergnaud · 2023-12-16T15:17:28Z

Thanks for the feedback, and I agree with your comment that

There is no magic high-level interop between an assembly-level language and a higher-level host language

My thinking re 0-copy is that the actual data pointed to by wasm arrays should not be different from the data pointed to by host language arrays, i.e. a contiguous array of bytes of size greater than or equal to n_elements * sizeof<array_element_type>. Is that incorrect?

Moreover, since the host is responsible for both the wasm array implementation and the host array implementation, it doesn't seem unfair to expect the host to deal with the mapping between them, and directly access the underlying data instead of copying it. Unlike with strings, we're not looking for cross-language interoperability (or should I say, wasm arrays interoperability is already specified by the GC spec).

As a concrete example, I wouldn't be shocked if a JS engine, when asked to create a wasm i32 array, actually instantiated an Int32Array and wrapped it into a V<HeapObject>. Or conversely, they could introduce an Int32ArrayView that wraps the V<HeapObject> and is seen from JS as a regular Int32Array.

So there wouldn't be a direct mapping of Wasm arrays to host language arrays, but the cost of calling ToJSValue/FromJSValue for arrays would be minimal and its complexity would be O(1) rather than O(n).

rossberg · 2023-12-16T15:56:36Z

The cost of ToJSValue and back is always O(1). But you're typically not getting a value within the host language's ordinary set of types.

I vaguely remember previous discussions about mapping Wasm arrays to typed arrays in JS. IIRC it wasn't obvious how to do that without performance penalties and semantic issues, for example, because multiple typed arrays can share the same array buffer in JS, which makes no sense from the Wasm perspective and breaks Wasm semantics. At best, Wasm arrays correspond to JS array buffers, but then there is the whole mess with detached buffers etc.

ericvergnaud · 2023-12-16T17:54:31Z

Let me rephrase this:

...the cost of calling ToJSValue/FromJSValue for arrays would be minimal and its complexity would be O(1) rather than O(n).

to:

...the cost of converting an array back and forth between WebAssembly and a usable value in the host language would be minimal and its complexity would be O(1) rather than O(n).

I'm not sure that 'Wasm semantics would be broken by JS typed arrays' (which are backed by array buffers). Rather, since JS allows the developer to treat the underlying data the way they wish, that potentially breaks the data, not the semantics imho. For example, if a developer creates an Int32Array from an ArrayBuffer that contains Uint8s, then writing i32s[0] = -1 will break the ui8s at 0,1,2,3. But it doesn't break the ability of the Uint8Array to know its length and provide read/write access to elements, which are still valid uint8s, all equal to 255. Do you have a specific example in mind where semantics would be broken?
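
The scenario above can be reproduced in a few lines of TypeScript. This is only an illustration of plain JS typed-array behaviour, not of any Wasm API; the `aliasingDemo` function name is made up for the example.

```typescript
// Two typed arrays of different element types viewing the same ArrayBuffer.
function aliasingDemo(): { bytes: number[]; length: number; sharesBuffer: boolean } {
    const buffer = new ArrayBuffer(4);
    const ui8s = new Uint8Array(buffer);
    const i32s = new Int32Array(buffer);

    ui8s.set([1, 2, 3, 4]);
    // -1 is 0xFFFFFFFF as a 32-bit integer, so all four bytes become 0xFF.
    i32s[0] = -1;

    return {
        bytes: Array.from(ui8s),                 // the Uint8Array observes the write
        length: ui8s.length,                     // still 4: the view itself is intact
        sharesBuffer: ui8s.buffer === i32s.buffer,
    };
}

console.log(aliasingDemo().bytes); // [ 255, 255, 255, 255 ]
```

The two views are distinct objects, yet a write through one is visible through the other.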

Forbidding detached buffers on JS arrays shared with WebAssembly sounds like a very acceptable limitation imho, especially given the benefits: performance, being able to read the array data without copying it and, if the array is mutable, being able to resize it and write to it using host language syntax.

jakobkummerow · 2023-12-16T17:56:52Z

Implementing Wasm i32 arrays as JS Int32Arrays under the hood would be possible, I think, but it would be quite a bit slower than our (V8's) current strategy of implementing Wasm arrays differently (with much less overhead). JS TypedArrays are surprisingly heavyweight in terms of both memory overhead and access performance. (Yes they're fast, but WasmGC arrays are faster.)

As long as Int32Arrays and Wasm i32 arrays are implemented as different in-memory object layouts (for the benefit of the latter!), there isn't going to be any zero-copy conversion between them. Making them interchangeable would (at best!) mean settling for the slower of the two designs -- and at worst make that even slower, if it then needs to distinguish more internal cases.

That said, for single-element access, the status quo is totally fine: when you export an array getter function (func $get_i32 (param $array (ref null $type_i32array)) (param $index i32) (result i32) (local.get 0) (local.get 1) (array.get $type_i32array)) from your Wasm module, and that gets called from a place that's sufficiently hot to get optimized, V8 will inline this Wasm getter function into optimized code for the calling JS function, and the resulting performance should be at least as good as an Int32Array access (because type check and bounds check are about the same, but for the Wasm array we never need to check whether the ArrayBuffer has been detached).
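
On the JS side, using such exported accessors could look like the sketch below. The `get_i32` export mirrors the getter above; `len_i32`, the `I32ArrayExports` interface and the mock object are assumptions for illustration (a real module would have to export a length function itself, e.g. one wrapping `array.len`). The mock stands in for an instantiated module so that the snippet runs on its own.

```typescript
// Hypothetical shape of a module's exports; the Wasm array stays opaque to JS.
interface I32ArrayExports {
    get_i32(array: unknown, index: number): number;
    len_i32(array: unknown): number;
}

// Element-by-element copy into a host Int32Array: O(n), one exported call per element.
function copyToInt32Array(exports: I32ArrayExports, wasmArray: unknown): Int32Array {
    const length = exports.len_i32(wasmArray);
    const out = new Int32Array(length);
    for (let i = 0; i < length; i++) {
        out[i] = exports.get_i32(wasmArray, i);
    }
    return out;
}

// Mock exports backed by a plain JS array, standing in for a real instance.
const mockI32Exports: I32ArrayExports = {
    get_i32: (array, index) => {
        const value = (array as number[])[index];
        if (value === undefined) throw new RangeError("index out of bounds");
        return value;
    },
    len_i32: (array) => (array as number[]).length,
};

const copied = copyToInt32Array(mockI32Exports, [7, 8, 9]);
```

With a real instance, `mockI32Exports` would be replaced by `instance.exports` cast to the interface.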

For convenience, it would be nice to support a richer syntax for JS/Wasm interaction, but for performance, it isn't necessary.

rossberg · 2023-12-16T18:10:02Z

I'm not sure that 'Wasm semantics would be broken by JS typed arrays' (which are backed by array buffers).

It's the fact that two typed arrays with different identity can still alias each other, which is observable through mutation, but not allowed by the Wasm array semantics.

if a developer creates an Int32Array from an ArrayBuffer that contains Uint8s, then writing i32s[0] = -1 will break the ui8s at 0,1,2,3.

I'm not sure what you mean by that. As far as Wasm is concerned, it's just bits being written, and every bit pattern is legal under either type.

ericvergnaud · 2023-12-16T22:44:09Z

@jakobkummerow thanks for the insights, and great to hear that (at least in javascript/v8) there would be less performance impact using single-element access than there would be in converting wasm arrays to host language ones.

That said, I tend to also value convenience. Afaiu, in the current state of the spec, every provider of a wasm module that wants to give access to a wasm array would have to create and export those functions for the consumers of that module to call. Would these consumers have to know which specific functions to call in each specific module? Whereas if these were made available as part of the GC spec (and if the host language implements them!), then that problem would go away.

Consider for example the following Typescript code:

class WasmArrayHandler implements ProxyHandler<object> {

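    // True only for canonical non-negative integer keys such as "0" or "42".
    // (parseInt never throws; it returns NaN, and it would accept "2abc" as 2.)
    static isValidArrayIndex(key: string) {
        return /^(0|[1-9]\d*)$/.test(key);
    }

    static isValidArrayKey(key: string) {
        return key === "length" || WasmArrayHandler.isValidArrayIndex(key);
    }

    // WebAssembly.get is hypothetical: no such function exists in the JS API today.
    get(target: object, key: string): any {
        return WasmArrayHandler.isValidArrayKey(key) ? WebAssembly.get(target, key) : undefined;
    }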
}

abstract class ArrayProxy {

    static of<T>(target: object): T[] {
        return new Proxy(target, new WasmArrayHandler()) as T[];
    }

}

const wasm_array = someFunctionReturningAWasmI32Array();
const proxied = ArrayProxy.of(wasm_array);

const value = proxied.length;
const item = proxied[2];
proxied[0] = 27;

For that code to work, it could indeed rely on the specific wasm module to export an array getter function.
But to work for any array in any wasm module, all it would need is a WebAssembly.get(target, key) function, which itself requires Get and Set to be implemented for opaque wasm arrays (and if they were, there would be no need for a proxy).
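
For comparison, here is a sketch of the first option: a proxy that works today, provided the module exports accessors. The `I32Accessors` interface, `proxyI32Array` and the mock are all hypothetical names; in practice the three functions would be exports of the specific wasm module, and the mock stands in for them so the snippet runs standalone.

```typescript
// Hypothetical accessors that a specific module would have to export.
interface I32Accessors {
    get(array: object, index: number): number;
    set(array: object, index: number, value: number): void;
    len(array: object): number;
}

// Returns the numeric index for canonical keys like "0" or "12", else undefined.
function asArrayIndex(key: string | symbol): number | undefined {
    if (typeof key !== "string" || !/^(0|[1-9]\d*)$/.test(key)) return undefined;
    return Number(key);
}

// Wraps an opaque array handle so that reads and writes go through the accessors.
function proxyI32Array(target: object, accessors: I32Accessors): number[] {
    const handler: ProxyHandler<object> = {
        get(t, key) {
            if (key === "length") return accessors.len(t);
            const index = asArrayIndex(key);
            return index === undefined ? undefined : accessors.get(t, index);
        },
        set(t, key, value) {
            const index = asArrayIndex(key);
            if (index === undefined) return false; // reject non-index writes
            accessors.set(t, index, value);
            return true;
        },
    };
    return new Proxy(target, handler) as unknown as number[];
}

// Mock handle and accessors standing in for a real Wasm array and exports.
const fakeWasmArray = { cells: new Int32Array([10, 20, 30]) };
const mockAccessors: I32Accessors = {
    get: (a, i) => {
        const cells = (a as typeof fakeWasmArray).cells;
        if (i >= cells.length) throw new RangeError("index out of bounds");
        return cells[i] as number;
    },
    set: (a, i, v) => {
        const cells = (a as typeof fakeWasmArray).cells;
        if (i >= cells.length) throw new RangeError("index out of bounds");
        cells[i] = v;
    },
    len: (a) => (a as typeof fakeWasmArray).cells.length,
};

const proxiedI32 = proxyI32Array(fakeWasmArray, mockAccessors);
proxiedI32[0] = 27; // routed through mockAccessors.set
```

The consumer still has to know which exports to pass in for each module, which is the per-module coupling a standardized accessor would remove.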

This may smell like syntactic sugar. It's not. It's about standardizing access from the host language to wasm array elements and properties, which is a much smaller problem than doing so for structs, and as such might deserve a dedicated discussion.

rossberg · 2023-12-17T11:32:15Z

Just to clarify, the intent is that the Wasm JS API will eventually be extended with classes and functions that give full direct access to Wasm GC objects. That is possible without turning Wasm arrays into JS typed arrays. The main reason we deferred the API was the many open questions: things like handling JS prototypes are notoriously sensitive, and doing them wrong might harm coherence, JS usability, or Wasm performance.

ericvergnaud · 2023-12-17T14:23:12Z

I think that "philosophically" you only want to exchange data, not behavior. Java Records rather than Class instances (I appreciate this distinction is not available in many languages).
That said I suspect the array is a much simpler sub-problem and thus could be addressed more rapidly, hence this discussion proposal. It could be a phase 1 of the overall spec.

rossberg added the Post-MVP Ideas for Post-MVP extensions label Dec 16, 2023
