Thought experiment: JS subset using GC MVP #113

bvibber opened this issue Aug 10, 2020 · 5 comments

@bvibber
bvibber commented Aug 10, 2020

Thought experiment: JS subset using GC MVP

"One complete but limping implementation of an important language is worth 20 long threads on subtyping"
-- @titzer, #111 (comment)

Well, you nerd-sniped me. :) Here's a rough sketch of what a JS compiler targeting WebAssembly with the GC MVP proposal might look like. I don't currently plan to implement it as-is, but if people would be interested in working together on one, let me know! The details of this plan are subject to revision as discussion continues, and it is meant to prompt conversation about possible changes to the GC proposal based on what an implementation would need. -- @Brion

Goals:

  • promote conversation about details of Wasm GC MVP proposal (and its alternatives) in practice for implementing a dynamic language
  • make a rough plan for a core subset of JavaScript features: primitives, objects, arrays, strings, closures
  • target ahead-of-time compilation with moderate performance
  • stay open to opportunities for JIT compilation... (requires host imports to generate and link new modules)

TL;DR version:

  • JS "any" value type can use anyref
  • numbers use a mix of i31ref and boxed structs
  • other JS values are structs with a common prefix (type discriminant and hash code)
  • objects can be represented with a shape reference and an array<anyref> of field values
    • fast-path field access will be handled through caching shape-chain lookups
    • very large objects will want an escape hatch to hash maps
  • Arrays, BigInt, strings and symbols seem fairly straightforward
  • TypedArrays are harder than expected; there's no fast way to load or store an f64 in an array<i8>
  • closures combine a low-level function reference with a struct of closure state
    • function signatures must all match, so a fixed number of args are passed directly and any additional ones are allocated as array<anyref>
  • implementing JS Map inside Wasm looks possible, using lazy-initialized hash codes in JSValue header
  • implementing JS WeakMap in pure Wasm looks impossible?
  • passing any JS value other than an i31ref outside of the module can lead to data corruption by foreign code -- a safe FFI would need some kind of wrapper that can hide internal data

Recommendations:

  • add the necessary support to create weak maps inside the Wasm sandbox (weak refs?)
  • weak refs are also needed to free shape structures when no longer used
  • add some kind of data-hiding for FFI boundaries (either private fields that can only be read by callers with a matching key reference, or support for creating a weak map to store private data in a separate struct from the public wrapper struct)
  • add a buffer type that's distinct from array<i8>, allowing memory-like loads and stores of various widths and types but without the overhead of creating an entire memory and all the virtual memory shenanigans that entails
  • add the ability to tack a runtime-sized array or buffer onto the end of a struct; this would remove an indirection when accessing data in strings, BigInts, and ArrayBuffers, and prevent a second allocation
    • note that Object fields and Array contents are growable, so still require a separate allocation

Value representation

A universal representation ("variant" or "any" type) is required when handling JS values of unknown or variable type.

Given the absence of a variant type in Wasm that can represent double-precision floating-point values, the current solution is to use anyref as the variant representation, with integers in i31 range stored as i31ref and larger numbers boxed into structs.

An important part of compiled JS code will be unpacking runtime type information to determine the type of an anyref value when it couldn't be statically verified. An important consideration is that the GC MVP proposal's structural subtyping considers two struct types equivalent if they have the same layout. So if the struct or array representation is the same between two different types, they must be distinguished by adding a discriminator field.

Types that encounter this problem include boxed types like boolean and i32, which have identical machine representations for their payloads.

Undefined

The undefined special value is stored as a null reference, giving "free" initialization of newly allocated arrays to contain our undefined value.

Compiled code can test for undefined with ref.is_null.

JSValue base struct

All other values (except i31ref-optimized numbers) will be represented via structs, all of which start with an i32 field holding a type discriminator enum and another holding a hash value.

JSValue:

  • type_id: i32 enum (Object, Array, TypedArray, null, string, symbol, boolean, deleted, i32, f64, BigInt)
  • hash: i32

The type_id allows quick discrimination over a large number of possible types and dispatch to various code paths with a switch (eg, via br_table) or more targeted checks via a compare and branch. The only rtt checks needed are for i31ref and the JSValue struct base; once the type id is known, the specific code path can cast to the correct full struct type.

This enum's values may be tailored to make it easy to check for specific subclasses of behavior via bitfields or range comparisons.
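
To make that dispatch concrete, here is a minimal sketch in TypeScript standing in for the Wasm-level ref.is_null / i31 checks and the br_table over type_id. The enum ordering, the helper name classify, and the use of null and plain numbers to stand in for ref.null and i31ref are illustrative assumptions, not part of the proposal.

enum TypeId { Object, Array, TypedArray, Null, String, Symbol, Boolean, Deleted, BoxI32, BoxF64, BigIntVal }

interface JSValueStruct { type_id: TypeId; hash: number; }

// `null` stands in for ref.null (JS undefined) and a plain number stands in for i31ref.
type AnyRef = null | number | JSValueStruct;

function classify(v: AnyRef): string {
  if (v === null) return "undefined";            // ref.is_null
  if (typeof v === "number") return "small int"; // i31 check + i31.get_s
  switch (v.type_id) {                           // br_table over the discriminator
    case TypeId.Object: return "object";
    case TypeId.String: return "string";
    case TypeId.BoxF64: return "number (boxed f64)";
    default:            return "other JSValue";
  }
}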

The stable hash value can be used to put any JS object into hash maps, such as implementing the JS Map class or in internal structures. Implementation of hash map internals is left as an exercise for later.

However, generating a hash code for every boxed number or large string that is never used in hash-key lookups would be wasteful. The hash value should be lazily initialized, either using 0 as an "uninitialized" sigil (which requires remapping any computed hash of 0 to another value) or by storing a second field with an initialization flag (which would be slower).

For objects where the hash is based on identity, the hash computation can start from some sort of instance-creation counter, or the equivalent -- the important thing is that once assigned, the hash must never change. For types that implement hash equality based on content (BigInt, string, boxed numbers), the hash is instead derived from the contents, so every instance with the same content gets the same value.
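
A minimal sketch of the lazy-hash scheme, using 0 as the "uninitialized" sigil; the counter-based identity hash and the multiplicative mixing constant are illustrative choices, not requirements:

interface JSValueStruct { type_id: number; hash: number; } // hash === 0 means "not yet computed"

let instanceCounter = 0;

// Identity-based hash: derived from a per-instance counter the first time it's needed.
function identityHash(v: JSValueStruct): number {
  if (v.hash === 0) {
    const h = (++instanceCounter * 2654435761) >>> 0; // multiplicative mix
    v.hash = h === 0 ? 1 : h;                         // remap 0 so it never looks uninitialized
  }
  return v.hash;
}

// Content-based variant, eg for strings: same contents always produce the same hash.
function stringHash(chars: string): number {
  let h = 0;
  for (let i = 0; i < chars.length; i++) h = (h * 31 + chars.charCodeAt(i)) >>> 0;
  return h === 0 ? 1 : h;
}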

Booleans, null, deleted

A boxed boolean struct contains its numeric value (0 or 1) as a field for easy use via dereference. Normally, globally unique singleton instances can be used for the two boolean values, avoiding the cost of allocating new ones on the heap at runtime.

BoxBoolean(JSValue):

  • value: i32 (0 or 1)

Similarly, the null value behaves as a convenient numeric 0 in some operations, so it uses what is effectively the same struct layout:

BoxNull(JSValue):

  • value: i32 (always 0)

And again it will usually be a globally unique instance.

A sigil value is also needed for deleted entries in arrays, as distinct from present undefined entries:

BoxDeleted(JSValue):

  • value: i32 (always 0)

This is swapped out for an undefined value when read, but needs to be skipped over in iteration etc. To make array lookups efficient, this comparison should be done against a single globally unique reference value if possible.

These will all have hash keys that should be distinct from their integer values, but the hashes need to be stably derived from type and contents if new instances ever get created, as all instances of these types must act identically, including as Map keys.
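
A small sketch of how the singleton boxes and the deleted sentinel might behave; the type_id values are hypothetical, and in Wasm these would be globals holding struct references compared by reference equality:

interface BoxStruct { type_id: number; hash: number; value: number; }

// Hypothetical type ids for the singleton-friendly types.
const T_BOOLEAN = 6, T_NULL = 3, T_DELETED = 7;

// Globally unique instances, allocated once at module start.
const TRUE_BOX: BoxStruct  = { type_id: T_BOOLEAN, hash: 0, value: 1 };
const FALSE_BOX: BoxStruct = { type_id: T_BOOLEAN, hash: 0, value: 0 };
const NULL_BOX: BoxStruct  = { type_id: T_NULL,    hash: 0, value: 0 };
const DELETED: BoxStruct   = { type_id: T_DELETED, hash: 0, value: 0 };

// Reading an array slot: the deleted sentinel is swapped for undefined (a null ref).
function normalizeSlot(slot: BoxStruct | null): BoxStruct | null {
  return slot === DELETED ? null : slot; // a cheap reference comparison
}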

Numbers

Integers known to be in the i31 range can be packed in an i31ref, however this must be either statically verified during compilation or dynamically checked at runtime. Larger values can be boxed:

Boxi32(JSValue):

  • value: i32

and floating-point values, or results that overflow 32-bit arithmetic, can be boxed as well:

Boxf64(JSValue):

  • value: f64

Heap boxing is not free, and can have a significant cost if there's a lot of churn from bit fiddling or floating-point math that exercises the boxing heavily.

Both numeric box types must use a content-based hash value, as values of different provenance must produce stable hashmap keys.

Calculating this hash value on every boxing operation would be a significant expense, especially when most numbers will never be used as hash keys, so it should be deferred and calculated on demand.
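
A sketch of the boxing fast path, with plain TypeScript numbers standing in for i31ref and a struct standing in for Boxf64; the T_F64 constant is a hypothetical discriminator value:

const I31_MIN = -(2 ** 30);   // i31ref holds a 31-bit signed integer
const I31_MAX = 2 ** 30 - 1;

const T_F64 = 9;
interface BoxF64 { type_id: number; hash: number; value: number; }

// A plain number stands in for i31ref; everything else gets a heap box.
type NumRef = number | BoxF64;

function boxNumber(n: number): NumRef {
  if (Number.isInteger(n) && n >= I31_MIN && n <= I31_MAX) {
    return n;                                   // pack as i31ref, no allocation
  }
  return { type_id: T_F64, hash: 0, value: n }; // heap allocation for floats and overflow
}

function unboxNumber(v: NumRef): number {
  return typeof v === "number" ? v : v.value;   // i31.get_s vs. struct.get
}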

BigInts

BigInts could be stored something like this:

BigInt(JSValue):

  • nbits: i32
  • lower: i64
  • rest: ref null array<i64>

Values that fit in 64 bits would not require a separate allocation, while larger values would store additional bits in 64-bit chunks in an array.

There may be better storage arrangements than this for arbitrary-width integers, but this is a reasonable starting point.

The JSValue hash key must be content-based, and as with strings and boxed numbers should be lazy-initialized.

Strings

String contents can be represented as an array<i16> containing UTF-16 code units. We get dynamic allocation, GC, and a queryable length for free! But it is a second allocation on top of the struct with the type code and hash value.

It's also important to note that there's no restriction against modifying a string stored this way, even though JavaScript strings are immutable. Bugs in the compiler, or hostile code on the other side of a module boundary, could modify a string and invalidate assumptions held about past string equality.

String(JSValue):

  • chars: ref array<i16>

The JSValue hash key field must be stable between multiple instances of identical strings, so it must be calculated from the string contents. This will be lazy-initialized when needed in a hashmap operation.

Symbols

A symbol has a name which is a string, but the name isn't unique while the symbol instance is. Fancy! The simplest thing is to have a JSValue struct that stores a reference to a String holding the name:

Symbol(JSValue):

  • name: ref String

Symbols must use an identity-based hash key, as they are unique, unlike strings.

Objects

The Object struct itself holds some JS internal fields like the prototype, a reference to a Shape describing the property names and configuration, and an array of anyrefs that carry the actual property values:

Object(JSValue):

  • proto: anyref
  • shape: ref Shape
  • fields: ref array<anyref>

(Don't forget the JSValue struct's type_id discriminator can tell us if we're one of the weird Object-based subtypes like Array.)

The fields array can be larger than the number of fields actually specified by the Shape, to reduce the need to copy existing data when adding new fields.

What we'd like to do is optimize most property gets and sets into array loads and stores, using a cached lookup for the field index. If a function is regularly given the same type of objects, we need only confirm that the Shape matches the last-used value, and we can safely reuse the previously looked-up index. If not, look it up through the Shape records and save it alongside the new shape reference for next time.

When we add a field in a given way at runtime, we must guarantee we will get the same new shape record every time. This is why Shapes carry both backward- and forward-facing references:

Shape:

  • field_count: i32
  • descriptors: ref array<ref Field>
  • parent: ref null Shape
  • successors: ref array<ref Shape> -- ideally should be array<weakref Shape> but that doesn't exist

field_count lists the total number of fields, which isn't necessarily one plus the offset of the last descriptor.

The descriptors contain Field struct refs, listed in more detail later, with the name/getter/setter/etc. These only list the fields changed in this version of the Shape. Later descriptors override earlier descriptors.

The parent chain can be followed to fill out all the fields back to the empty object or whatever the root of this shape chain is.

The successors list, on the other hand, records all the shapes that have been derived from a given shape, eg by adding a field or changing a property descriptor. When we need the shape resulting from a change, we iterate through these looking for an existing shape that makes that change. Again, for lookups we can cache the last-used parent & successor shape pair to avoid iterating through the entire list on subsequent code runs: having made the check once is enough to let us cache the new shape.

(That can be a fancier data structure than a simple array, if that helps to optimize lookups.)

Note that because weak refs don't exist in the GC MVP proposal, the current plan is to accept that there can be memory leaks in the shape successors graph and the inline caches. If an object is allocated in a certain way and then all instances of it are freed, the objects themselves will be collected, but the shape will not be freed unless the root of the shape graph is freed and all cache sites are fed with different objects. For objects built up from {} this will never happen.

The field info includes the name, which can be either a String or Symbol, getter/setter refs, the actual field array offset, and some boolean fields for property configuration. Changing properties is done by replacing the entire Field structure, which should be considered immutable.

Field:

  • name: ref JSValue
  • getter: ref null Function
  • setter: ref null Function
  • offset: i32
  • writable: i8
  • configurable: i8
  • enumerable: i8
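
Pulling the Shape, successor, and Field descriptions above together, here is a compressed TypeScript sketch of shape transitions and the slow-path offset lookup. Field names follow the structs above, but the matching logic is simplified (it compares only property names, ignoring getters/setters and configuration changes):

interface FieldDesc { name: string; offset: number; writable: boolean; }

interface Shape {
  field_count: number;
  descriptors: FieldDesc[];   // only the fields changed at this step
  parent: Shape | null;
  successors: Shape[];        // would ideally hold weak references
}

const EMPTY_SHAPE: Shape = { field_count: 0, descriptors: [], parent: null, successors: [] };

// Adding a property reuses an existing successor when one matches, so the same
// sequence of additions always yields the same Shape object.
function addField(shape: Shape, name: string): Shape {
  const existing = shape.successors.find(s => s.descriptors.some(d => d.name === name));
  if (existing) return existing;

  const next: Shape = {
    field_count: shape.field_count + 1,
    descriptors: [{ name, offset: shape.field_count, writable: true }],
    parent: shape,
    successors: [],
  };
  shape.successors.push(next);
  return next;
}

// Slow-path lookup: walk back through the parent chain; descriptors closer to the
// current shape override older ones, so the first hit walking backwards wins.
function lookupOffset(shape: Shape | null, name: string): number {
  for (let s = shape; s !== null; s = s.parent) {
    for (const d of s.descriptors) {
      if (d.name === name) return d.offset;
    }
  }
  return -1; // not found: fall back to the prototype chain
}

// {} -> {ts} -> {ts, track}: repeated construction reuses the same shapes.
const withTs = addField(EMPTY_SHAPE, "ts");
const withTrack = addField(withTs, "track");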

Arrays

JavaScript Arrays are a special case of Object that can implement fast "dense" arrays of properties using 32-bit integer indexes or their string equivalents, but can also be extended, used, or abused in various interesting ways that may be surprising at runtime.

The plan here is to implement Array as a subtype of Object that adds a std::vector-like backing array ref and a length (capacity is implicit in the backing array), and runs anything else through the Object properties.

Array:

  • subtype(Object)
  • buffer: ref array<anyref>
  • length: i32

The usual vector implementation applies -- when adding items overflows the capacity, allocate a new backing buffer and copy data in. Capacity doesn't have to be stored here, as it's queryable from the array.

Deleted items are stored in-place as a singleton reference value.
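
A sketch of the std::vector-style growth on push; unknown stands in for anyref, and the growth factor and minimum capacity are arbitrary choices:

// `unknown` stands in for anyref; null stands in for undefined/uninitialized slots.
interface JSArray {
  buffer: unknown[];  // backing array<anyref>; its length is the capacity (array.len)
  length: number;     // number of live elements
}

function arrayPush(arr: JSArray, value: unknown): void {
  if (arr.length === arr.buffer.length) {
    // Capacity exhausted: allocate a larger backing array and copy the old contents over.
    const grown = new Array<unknown>(Math.max(8, arr.buffer.length * 2)).fill(null);
    for (let i = 0; i < arr.length; i++) grown[i] = arr.buffer[i];
    arr.buffer = grown;
  }
  arr.buffer[arr.length++] = value;
}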

TypedArrays

JS TypedArrays are tricky in that views of different types can be overlaid on the same backing ArrayBuffer. However GC MVP's array types provide for no such aliasing; an array<i8> can only be read/written a byte at a time, and an array<i32> can only be read/written at 32-bit granularity, etc.

Thus while ArrayBuffer is straightforward to implement, most likely you'd end up with fast Int8Array and Uint8Array views and very slow Int16Array, Float32Array, and wider views, because they'd have to access the buffer a byte at a time and then combine the results (see the sketch after the struct definitions below).

ArrayBuffer(Object):

  • bytes: ref null array<i8>

The nullable pointer is required because array buffers can be 'neutered' via transfer to another context, as seen on the web with postMessage. This could be used to implement data transfer to an outside host with minimal copying.

TypedArray(Object):

  • buffer: ref ArrayBuffer
  • element_type: i32 enum { Int8, Uint8, Uint8clamped, Int16, Uint16, Int32, Uint32, Int64, Uint64, Float32, Float64 }
  • element_size: i32 (1, 2, 4, or 8)
  • offset: i32
  • length: i32
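
To illustrate why the wider views would be slow, here is a sketch of reading one Float64Array element out of byte-granularity storage: eight separate byte loads, shifts, and ors, followed by a bit reinterpretation (f64.reinterpret_i64 in Wasm). Uint8Array and DataView stand in for array<i8> and the reinterpretation, and little-endian element layout is an assumption:

function loadF64(bytes: Uint8Array, byteOffset: number, index: number): number {
  const base = byteOffset + index * 8;
  let lo = 0, hi = 0;
  for (let i = 0; i < 4; i++) lo |= bytes[base + i] << (8 * i);     // four single-byte loads
  for (let i = 0; i < 4; i++) hi |= bytes[base + 4 + i] << (8 * i); // four more
  const view = new DataView(new ArrayBuffer(8));
  view.setUint32(0, lo >>> 0, true);
  view.setUint32(4, hi >>> 0, true);
  return view.getFloat64(0, true); // reinterpret the assembled 64 bits as an f64
}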

Functions and closures

JS Function objects are Objects but also must wrap some closure state (captured variables) and an internal function reference. This should be pretty straightforward:

Function:

  • subtype(Object)
  • closure: anyref
  • func: ref func<...>

The main question is what this implies for the signature of the target function. Unless we have multiple variants of Function that store different numbers of args, we're probably going to have to do something simple like creating a common ABI signature for all JS functions:

(func
    (param $closure anyref)
    (param $thisarg anyref)
    (param $argc i32)
    (param $arg1 anyref)
    (param $arg2 anyref)
    (param $arg3 anyref)
    (param $arg4 anyref)
    (param $rest ref null array<anyref>)
    (result anyref)
)

The $argc parameter carries the number of arguments actually passed, and any argument slot not used by the caller is initialized to undefined per the ABI, so it can be used directly by compiled code. Any additional arguments are passed via an optional array. The exact number of directly-passed arguments may need to be tuned.

The additional args can then be copied or read directly for use by arguments and friends, which may require special compiler support when used.
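
Here is a sketch of the calling convention from the caller's side under that common signature; the helper name callJS and the use of null for undefined are illustrative:

type AnyRef = object | number | null;

// Mirrors the common ABI above: four direct argument slots plus an optional rest array.
type JSFunc = (
  closure: AnyRef, thisArg: AnyRef, argc: number,
  arg1: AnyRef, arg2: AnyRef, arg3: AnyRef, arg4: AnyRef,
  rest: AnyRef[] | null
) => AnyRef;

// Caller side: pad missing args with undefined (a null ref), spill extras into an array.
function callJS(fn: JSFunc, closure: AnyRef, thisArg: AnyRef, args: AnyRef[]): AnyRef {
  const slot = (i: number) => (i < args.length ? args[i] : null);
  const rest = args.length > 4 ? args.slice(4) : null;
  return fn(closure, thisArg, args.length, slot(0), slot(1), slot(2), slot(3), rest);
}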

A closure can use a function-specific struct type, so if values are guaranteed by static analysis to be a certain type (eg, i32 or f64) they can be stored that way, avoiding a heap boxing for floating point. However anything that can't be statically determined must be an anyref.

If doing one-pass AOT compilation, local variables can live in Wasm locals, with any declared arguments beyond the ones provided copied into place from the rest array. The full JS arguments object behavior can either be forbidden for simplicity, or special-cased in slow-path code for functions using it.

Note that if doing an optimizing compiler that has points where it can escape from one tier of compilation to another, suitable interfaces would be necessary to transfer arguments and variable state to the other version of the function. This is complex, requiring functions compiled specifically to exit or enter at certain points, and is outside the scope of this current proposal.

Identity-related features (Maps, WeakMaps)

JS Map allows any JS value as a key, including both object references and primitives such as strings, numbers, null...

I see two implementation strategies:

  • implement a hash map in Wasm using GC-based structs, using the hash value in the JSValue structs
  • -or-
  • require the host glue code to implement it, such as using the host JS implementation's Map as a backing

Both seem plausible; note you can't use the object reference itself as a low-level hash because it can't be converted to a number.
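
A toy sketch of the first strategy -- a bucketed hash map inside the module keyed on the JSValue hash field. It assumes the hash has already been lazily initialized, and it uses identity-only key equality for brevity (real key equality would dispatch on type_id: identity for objects and symbols, content comparison for strings, numbers, and BigInts):

interface JSValueStruct { type_id: number; hash: number; }

interface Entry { key: JSValueStruct; value: unknown; }

// Identity-only key comparison, standing in for the full type_id-based dispatch.
function sameKey(a: JSValueStruct, b: JSValueStruct): boolean {
  return a === b;
}

class JSMap {
  private buckets: Entry[][] = Array.from({ length: 16 }, () => []);

  private bucketFor(key: JSValueStruct): Entry[] {
    return this.buckets[(key.hash >>> 0) % this.buckets.length];
  }

  set(key: JSValueStruct, value: unknown): void {
    const bucket = this.bucketFor(key);
    const hit = bucket.find(e => sameKey(e.key, key));
    if (hit) hit.value = value;
    else bucket.push({ key, value });
  }

  get(key: JSValueStruct): unknown {
    return this.bucketFor(key).find(e => sameKey(e.key, key))?.value;
  }
}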

As far as I can tell there is no way to implement JS WeakMap using the GC primitives, leading us back to:

  • require the host glue code to implement it, such as using the host JS implementation's WeakMap as a backing

Without using host imports, I don't think a WeakMap can be implemented currently.

Proxies

JS Proxy objects are a special type of object that forwards a bunch of operations to another object, or (at the Proxy creator's option) performs alternate code. The slow-path operations are relatively straightforward -- they're just an extra layer of indirection -- and should not create any huge surprises, I think.

Security and module isolation

Since JS values are stored in anyref, they can be transmitted to and from other modules over import/export boundaries or through function references that have been sent in.

As far as I can tell there is nothing to prevent the other side of the module boundary from modifying the internal state of the struct and array objects, such as strings and JS objects and their internal data structures, such as object shape definitions.

Thus any form of security encapsulation inside the JS virtual machine implemented in this module must be considered NULL AND VOID in the face of transfer of references to an untrusted module.

Examples

Consider the following program:

class Complex {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    abs() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    add(other) {
        this.x += other.x;
        this.y += other.y;
    }

    mul(other) {
        let newX = this.x * other.x - this.y * other.y;
        let newY = this.x * other.y + this.y * other.x;
        this.x = newX;
        this.y = newY;
    }
}

// z(n+1) = z(n)^2 + c
function iterate_mandelbrot(c, maxIters) {
    let z = new Complex(0, 0);
    for (let i = 0; i < maxIters; i++) {
        if (z.abs() >= 2) {
            return i;
        }
        z.mul(z);
        z.add(c);
    }
    return maxIters;
}

This code, when called over a suitable iteration area of the Mandelbrot set, performs reasonably well in V8, comparably to code that uses pairs of variables and inlines all the computation manually.

Do not expect V8-level performance for this program in our hypothetical VM. :(

Several notes here. First, remember that numbers outside the i31 range must be boxed into heap objects. This will be true of the x and y properties of the Complex c and z objects, so accessing them requires (in the best case) another indirection beyond just looking into the field list. And updating the value similarly requires allocating a new boxed struct, as the old one could have been copied elsewhere and be in use. (We can't easily create new types at runtime, so it's not possible to store doubles directly without duplicate storage arrays.)

Second, no static annotations indicate the types of any of the arguments, so c and maxIters must be checked... additionally, because JavaScript allows calling functions with arbitrary this arguments, there's no guarantee that this will be a specific type in method bodies! Thus every method must have a guard check on the this type in front of any attempt to use the Complex-specific object shape, with a suitable slow-path backup looking up properties or checking for expected types.

It should, though, be possible to inline the Complex class's abs, mul, and add methods by using static analysis together with a runtime check that the Complex type has not changed unexpectedly. This might involve a check when unwrapping the Function object of a method property: the fast path, on confirming the method points to the expected function, would run the inlined code (which can now assume that z is a Complex with a particular shape), with a slow path that calls the actual function with the actual values.

The field indexes on c (which cannot be statically determined to be a Complex) could be cached inline, but the check for objectness must be performed first and the shape confirmed against the cache before use.
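
As a rough illustration of that cost, here is a sketch of what a compiled fast path for Complex.prototype.abs might boil down to; the cache variables, field offsets, and type_id constant are hypothetical, and the generic slow path is omitted:

interface BoxF64 { type_id: number; hash: number; value: number; }
interface JSObject { type_id: number; hash: number; shape: object; fields: (number | BoxF64 | null)[]; }
const T_F64 = 9; // hypothetical discriminator for boxed f64

// Unbox an i31-packed integer or a BoxF64; anything else would divert to a slow path.
function toF64(v: number | BoxF64 | null): number {
  if (typeof v === "number") return v;
  if (v !== null && v.type_id === T_F64) return v.value;
  throw new TypeError("not a number");
}

// Cached at this call site: the Shape seen last time plus the offsets of x and y.
let cachedShape: object | null = null;
const xOffset = 0, yOffset = 1;

function complexAbs(self: unknown): BoxF64 {
  const obj = self as JSObject | null;
  if (obj !== null && typeof obj === "object" && obj.shape === cachedShape) {
    const x = toF64(obj.fields[xOffset]);  // an extra indirection plus unboxing per read
    const y = toF64(obj.fields[yOffset]);
    return { type_id: T_F64, hash: 0, value: Math.sqrt(x * x + y * y) }; // fresh box per result
  }
  // Shape mismatch or non-object `this`: the generic property-lookup path
  // (omitted here) would run and refresh cachedShape.
  throw new Error("slow path omitted from this sketch");
}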

Consider the following code:

function create_packet(ts, track, data, is_keyframe) {
    let packet = {
        ts,
        track,
        data
    };
    if (is_keyframe) {
        packet.keyframe = true;
    }
    return packet;
}

This produces different object shapes depending on whether an argument is set, eek! A loop handling many packets could end up having to re-perform field index lookups when changing shapes during the stream processing. If shapes change a lot, this could lead to some thrashing.

We could be slightly cleverer about the shape comparisons: if there's not an exact match, look up the parent chain for the shape we were expecting. But this might have problems with overwritten definitions under the current Shape schema.

This may bear further thought and comparison with major JS engine implementations.

Performance considerations

Exception handling

I have not yet covered exceptions or exception handling, hoping they will be provided by another Wasm extension. Currently there's no way to "catch" traps, or create exceptions carrying a data payload in Wasm.

Thus propagation and handling of exceptions would require either waiting for that to become available, importing support functions from the host (JS-only, like in Emscripten?), or using a global exception-state variable that's checked after every potential exception site, with manual unwinding. This will have some overhead.
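
A sketch of the global exception-state approach, with manual unwinding after each call site; the names are illustrative:

// Global exception state, checked after every call that might throw.
let pendingException: unknown = null;

function jsThrow(value: unknown): null {
  pendingException = value;
  return null; // dummy result; the caller must check the flag and unwind
}

function callee(x: number): number | null {
  if (x < 0) return jsThrow("RangeError: negative input");
  return Math.sqrt(x);
}

function caller(x: number): number | null {
  const r = callee(x);
  if (pendingException !== null) return null; // manual unwinding: propagate upward
  return (r as number) + 1;
}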

Property loads

Thinking more about the property lookup process in, eg, let val = obj.prop: we can bias our checks to assume it's probably an object of matching shape.

  • load obj as an anyref from local or through a scope lookup
  • cast to JSObject:
    • if is, then:
      • unless the property key is statically known to be a non-integer-bearing string we must check for arrays (we can skip that in this instance because "prop" is not an integer or a string of one)
        • load the type_id from the JSValue/JSObject struct
          • if it is an Array or TypedArray:
            • check if the key is a valid array index:
              • this must check for i31ref, Boxi32, Boxf64, and String types and validate/normalize the contents. This is cheap for i31ref, getting more expensive as you go down the list.
              • if is a regular Array:
                • load an anyref value from the array
                • compare against the global BoxDeleted singleton ref (or cast to JSValue and check the type_id)
                  • if so, replace the value with null, which is treated as JS undefined
              • if is a TypedArray:
                • load the ArrayBuffer reference from the TypedArray
                • load the backing array reference from the ArrayBuffer
                • load an i32 or i64 or f32 or f64 from the array, depending on the type
                • box it into an anyref (i31ref, Boxi32, Boxf64, or BigInt)
              • return.
          • if not, continue on Object path...
      • load the current Shape reference from the Object
      • load a global with the cached shape reference for this lookup
      • compare the shape refs:
        • if it matches:
          • load the cached offset from a global
          • check if cached offset is valid:
            • if offset is valid:
              • load the fields array
              • load the given index from the array
              • return.
            • if it is not valid:
              • load the cached getter from a global
              • check if getter is valid:
                • if getter is valid:
                  • call the getter
                  • return.
                • if it is not valid:
                  • invalid state; throw a trap
        • if not matching the cached shape:
          • run a slow-path lookup and cache the shape/index/getter
          • return.
    • if it's not a JSObject, then:
      • cast to i31ref
        • if it is, then:
          • extract the i31 value and run the backup path used for boxed numbers
        • if it isn't, then:
          • cast to JSValue and call further paths for handling virtual or literal boxing of strings, other types, etc.

This is all pretty verbose, so either it has to be compiled inline (potentially with slight tweaks depending on which types are most expected at compile time) or it has to be done with a call into a function (which may potentially be inlined in the native codegen).

Doing caches with globals defined at compile time would probably perform best, but means you either need to inline the cache-manipulating code, or you have to store the cache in a GC object or linear memory somewhere so it can be passed into a non-inline function. (Eg, if you call the slow-path code, you want to cache the shape, the offset, and the getter/setter, so you either need multivalue returns or a struct containing the cache fields that you can pass to the slow path.)
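
For example, one property-access cache site might look roughly like this (sketched in TypeScript; in Wasm the cache would live in mutable globals or a struct, and the slow path would be the full Shape walk described above):

interface Shape { id: number; }
interface JSObject { shape: Shape; fields: unknown[]; }

// One cache per compiled property-access site, held in mutable globals.
let cachedShape: Shape | null = null;
let cachedOffset = -1; // -1 would mean "call the cached getter instead" (not shown)

// Fast path for `obj.prop` at a single site; slowPath does the full Shape walk,
// refreshes the cache, and returns the value.
function getProp(
  obj: JSObject,
  slowPath: (o: JSObject) => { shape: Shape; offset: number; value: unknown }
): unknown {
  if (obj.shape === cachedShape && cachedOffset >= 0) {
    return obj.fields[cachedOffset];  // two loads and a compare on the hot path
  }
  const r = slowPath(obj);
  cachedShape = r.shape;              // in Wasm: multivalue return or a cache struct
  cachedOffset = r.offset;
  return r.value;
}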

@bvibber

bvibber commented Aug 10, 2020

Updated with a quick look at what a JS property lookup looks like in this model. It's not pretty, but not super horrible, I think. There are probably some tunings that can be made to minimize rtt comparisons and field lookups and simplify the cached code.

@bvibber

bvibber commented Aug 10, 2020

The thing I'm most unsure about is the efficiency of type checks and branching out to various type-specific behavior...

From what I can tell, an attempt to cast with ref.cast will always require an rtt comparison, even if we already know which type it is by reading a discriminator field from the base JSValue struct. So ref.cast is just as expensive as br_on_cast, and can mean we're essentially doing two subtype comparisons under the hood.

If there's an "expected" type for a value at compile time, we may be able to optimize the checks by having a fast-path br_on_cast straight to the expected subtype, then ref.cast to JSValue and do the br_table if it failed.

For instance, let val = obj.prop would optimize for obj being an Object of any subtype and "prop" being a non-numeric string (no Array numeric index paths invoked, and even no need to cast to String because we can use a global holding the run-time copy of that compile-time string)...

On the other hand let val = obj[index] would assume index is variable: either an int31ref (probably), a String (common), a Symbol (less common), or something much rarer. If it's a numeric type or a string containing a positive 32-bit integer representation, then we have to check type_id for obj being an array type and cast to the suitable Array or TypedArray subtype.

(It's probably best to defer the numeric checks on strings until after checking for arrayness. Potentially it might be useful to memoize the numeric checks in the String struct too, though this might not be a big deal.)

Also: does br_table work efficiently in terms of branch prediction in a hot loop? (I'm guessing it should work reasonably well when fed consistent input but could be wrong.)

Note again the rtt checks only help when the types are structurally distinguishable in the MVP proposal: for instance you could not distinguish between Boxi32 and BoxBoolean from the rtt, and would have to check the type_id field if the difference matters in context. But in some contexts it doesn't matter, and you'd read the 0 from a BoxBoolean directly to use in math for instance just as you would a Boxi32...

@dcodeIO

dcodeIO commented Aug 10, 2020

Since this issue is exploring GC from a JS perspective, with AssemblyScript being similar, perhaps a view on a hypothetical ideal GC from an AssemblyScript perspective, plus a couple of thoughts on the implications, might be interesting. The following is oversimplified to keep it short, and likely unrealistic, but maybe there's something good to extract from it in context :)

Obviously, the closer the GC spec is to JS's object model, the more suitable it is for a JavaScript-like language to compile to it -- and, correspondingly, for any language to get ideal interop with JS. A nice-to-have-in-an-MVP list of high-level features may include:

  • ArrayBuffer, String, TypedArray<T>, a resizable Array<T>, Set<T>, Map<K,V> and a new form of TypedObject/Struct types, mirroring their JS methods to Wasm instructions. Easily targetable (for us) and ideal interop (for anyone) with a browser as the host. There are additional post-MVP candidates like Iterator<T> or Proxy<T> so eventually all the stuff in between Wasm and JS "just works".
  • Deriving sub TypedObjects with additional properties from another TypedObject, forming a "prototype"-like relation, obviously.
  • Defining an "implements"-like relationship of TypedObjects to other TypedObjects, with layout-differences abstracted away by the engine.
  • Marking fields (properties/methods?) of a TypedObject as private or readonly, only accessible/writable from within the same module but not from the outside to guarantee a baseline of integrity. An object with private-only fields would be opaque.
  • Comparing equality by reference, except for String, i.e. a string.eq instruction. Maybe string.lt etc. as well.
  • First-class representation of closures.

Most (hopefully more) of this should be possible to validate at link time; browsers (hopefully) already have most of the components necessary, and it's not that different from the current state of the proposed MVP. Yet, non-web engines would need some sort of a Wasm standard library providing the respective abstractions, perhaps created and provided as a joint effort by the WebAssembly community.

One interesting aspect is how much such a scheme would already benefit languages that can be transpiled to JS today, but not Wasm, and as we all know there are a lot of these. Given that some of them need dynamic typing, having an any type and the ability to declare a TypedObject of any members, or even a Map<any,any>, as an escape hatch (even though inefficient due to redundant type checks and potential boxing) would enable them to target Wasm in the first place; later on, a Wasm JIT spec on top of Wasm GC could get rid of the redundant type checks and boxing at runtime -- a strategy not yet explored, I think. This is based on the (as I think very interesting) observation made in another thread that for some languages it might just be better to keep compiling to JS, yet on top of that we'd keep the door open to go full circle eventually.

Also worth mentioning is that there's no good C++ story for existing code in there, but it probably won't need much of the above anyway, except good interop. However, an interface to the Wasm standard library mentioned above can then be used from C++ or similar, i.e. to convert GC strings to linear memory or vice-versa, or to use Wasm GC-specific functionality directly where it makes sense, or to magically replace C++ standard library usage with Wasm standard library usage (next-level Emscripten), or even to compile Wasm standard library code to native by shipping a dynamic library able to provide this or other smart ways of downleveling. Regarding the web, the simplest case would be a GC-aware wrapper around a C++ library, while in other cases a developer might consider switching out parts of the C++ standard library with the Wasm standard library where it makes sense, or start a new project right on top of it.

That's all :)

@bvibber

bvibber commented Aug 10, 2020

A note on the notion of private struct/object fields -- currently the only way to add new functions at runtime (useful for both optimization and dynamic code loading) is to import host functions that compile a new module and link it to share internal state.

If functions compiled in the trusted second module can't access private fields from types defined in the first module, you're gonna have a bad time.

So either there would need to be a way to add functions at runtime that kept module identity, or the privacy restriction would need to be enforced through an import... having access to the type definition would work if subtyping is not structural but based on declarations and inheritance patterns?

@dcodeIO

dcodeIO commented Aug 11, 2020

Hmm, yeah, adding new functions at runtime is something this doesn't cover. Perhaps dynamic code generation or loading might become an independent Wasm Reflection spec providing what's necessary, i.e. looks like this needs some sort of GC or JIT first anyway?
