WASI proposal for dynamic typing (WASI-dyntype) #552

Open
xujuntwt95329 opened this issue Sep 5, 2023 · 8 comments
Labels
feature-request Requests for new WASI APIs

Comments

@xujuntwt95329

This issue is to introduce the idea of proposing WASI-dyntype APIs.

Introduction

The type system of WebAssembly is entirely static in nature, precluding the direct compilation of dynamic programming languages into WebAssembly. Currently, the primary method of supporting dynamic languages involves compiling the language runtime into WebAssembly. This approach involves running virtual machines of individual languages within the WebAssembly environment (VM inside VM), which results in notable performance overhead and significant memory consumption.

Modern dynamic languages tend to incorporate more type annotations (such as TypeScript or type hints in Python). If these type annotations could be leveraged for static compilation, they would contribute to enhancing application performance. However, the aforementioned VM-inside-VM approach falls short of achieving this goal.

The objective of this proposal is to furnish a standardized set of public APIs designed for handling dynamically-typed objects. These APIs would facilitate the management and access of dynamic objects hosted by the external environment, thereby affording opportunities for statically compiling other objects with sufficient type information.

Strategy

  • All dynamic objects are represented as externref values; the actual objects are managed by the external environment.
  • Operating on these objects requires invoking the defined APIs.

Goal

  • Support creating/accessing dynamic objects managed by the external environment.
  • Support re-using existing built-in objects/methods from an external language engine.

Non-Goal

  • Define how dynamic objects are represented
    • reason: the implementer can select any acceptable solution or simply delegate to an existing language runtime.
  • Garbage collection between wasm and the external world
    • reason: diverse scenarios call for different garbage collection strategies; the implementer should decide how to collect the resources referenced by an externref according to their own GC strategy.
  • Efficient access to dynamic objects
    • reason: the efficiency of accessing dynamic objects depends on data layout design, caching strategy, and so on, which are out of this proposal's scope. The primary benefit of this proposal is the opportunity to separate the handling of dynamic and static types. This separation lets code with sufficient type information fully leverage the performance advantages of static compilation.
  • Support for dynamic functions
    • reason: our intention is to introduce only the component responsible for object management; dynamic functions would require the respective language interpreter, which is beyond the scope of this proposal.
  • Operations between dynamic objects
    • reason: different languages have different rules for operations, so the handling of operations between dynamic types is left to the discretion of compiler implementers.

API walk-through

object creation and property access

let obj : any = {};
obj.f1 = 1;
obj.f2 = 'Hello';

This can be generated as:

obj_ref = new_object();
set_property(obj_ref, "f1", new_number(1));
set_property(obj_ref, "f2", new_string("Hello"));
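On a JavaScript host, an externref can carry any JavaScript value, so one possible host implementation simply backs these calls with plain JS objects. A minimal sketch, assuming the function names from the walk-through (`new_object`, `new_number`, `new_string`, `set_property`) plus a hypothetical `get_property`; the exact signatures are assumptions, not part of the proposal text.

```typescript
// Hypothetical host-side implementation of part of the WASI-dyntype API.
// Every returned value would cross into wasm as an opaque externref.
type DynRef = unknown;

function new_object(): DynRef {
  return {};
}

function new_number(n: number): DynRef {
  return n;
}

function new_string(s: string): DynRef {
  return s;
}

function set_property(obj: DynRef, key: string, value: DynRef): void {
  (obj as Record<string, unknown>)[key] = value;
}

// Hypothetical getter: a missing property yields undefined, as in JS.
function get_property(obj: DynRef, key: string): DynRef {
  return (obj as Record<string, unknown>)[key];
}

// Mirrors the generated code above.
const obj_ref = new_object();
set_property(obj_ref, "f1", new_number(1));
set_property(obj_ref, "f2", new_string("Hello"));
```

A non-JS host (for example one built on QuickJS) would return its own engine handles instead; the compiled wasm code would not change.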

runtime type checking

let obj : any = 100;
let num : number = obj as number;

This can be generated as:

obj_ref = new_number(100);
if (is_number(obj_ref)) {
    num = to_number(obj_ref);
}
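A host-side sketch of the two calls used above, again assuming dynamic values are plain JS values behind the externref. The proposal does not say whether `to_number` should trap, throw, or coerce on a non-number; this sketch throws a `TypeError` (an assumption), which is why the generated code guards it with `is_number`.

```typescript
type DynRef = unknown;

function new_number(n: number): DynRef {
  return n;
}

// Runtime type check: true only for JS number values.
function is_number(ref: DynRef): boolean {
  return typeof ref === "number";
}

// Unboxes a dynamic number into a static (f64-like) value.
// Assumption: throws instead of coercing when the value is not a number.
function to_number(ref: DynRef): number {
  if (typeof ref !== "number") {
    throw new TypeError("to_number: value is not a number");
  }
  return ref;
}

// Mirrors the generated code above.
const obj_ref = new_number(100);
let num = 0;
if (is_number(obj_ref)) {
  num = to_number(obj_ref);
}
```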

subtyping

a instanceof b

This can be generated as:

instance_of(a_ref, b_ref);
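When the host embeds a JavaScript engine, `instance_of` can defer entirely to the language's own `instanceof`. A minimal sketch under that assumption; the class names in the usage part are hypothetical.

```typescript
type DynRef = unknown;

// Delegates to JavaScript's instanceof, so prototype chains and
// Symbol.hasInstance behave exactly as in JS, including the TypeError
// thrown when b_ref is neither callable nor has Symbol.hasInstance.
function instance_of(a_ref: DynRef, b_ref: DynRef): boolean {
  return a_ref instanceof (b_ref as Function);
}

// Usage with a hypothetical host-side class hierarchy.
class Animal {}
class Dog extends Animal {}
const dog_ref: DynRef = new Dog();
const result = instance_of(dog_ref, Animal); // true: Dog extends Animal
```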

exception

let a: any = ...;
throw a;

This can be generated as:

throw(a_ref);
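Because `throw` is a reserved word in TypeScript, the sketch below uses the hypothetical name `throw_value`. On a JS host the import can simply rethrow the dynamic value unchanged; how that exception then unwinds through, or is caught by, wasm frames depends on the engine's exception-handling support and is not modeled here.

```typescript
type DynRef = unknown;

// Hypothetical host import behind `throw(a_ref)`: rethrows the dynamic
// value as-is, so host code sees the original object, not a wrapper.
function throw_value(a_ref: DynRef): never {
  throw a_ref;
}

// Host-side usage: the caught value is the very object that was thrown.
const payload: DynRef = { code: 42 };
let caught: unknown;
try {
  throw_value(payload);
} catch (e) {
  caught = e;
}
```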

re-use existing built-in objects and methods

let m: any = new Map();
m.set('a', 1);

This can be generated as:

m_ref = new_object_with_class("Map");
invoke(m_ref, "set", "a", 1);  // "a", 1 are passed as variadic arguments
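On a host that embeds a JavaScript engine, `new_object_with_class` and `invoke` could resolve built-ins by name and call them with ordinary JS semantics. A sketch assuming constructors are looked up on `globalThis` and that the variadic arguments arrive as a rest parameter; both choices are illustrative, not part of the proposal.

```typescript
type DynRef = unknown;

// Look up a built-in constructor such as "Map" by name and instantiate it.
function new_object_with_class(className: string, ...args: unknown[]): DynRef {
  const ctor = (globalThis as unknown as Record<string, unknown>)[className];
  if (typeof ctor !== "function") {
    throw new TypeError(`${className} is not a constructor`);
  }
  return Reflect.construct(ctor, args);
}

// Call a method by name, binding the receiver as `this`.
function invoke(obj: DynRef, method: string, ...args: unknown[]): DynRef {
  const fn = (obj as Record<string, unknown>)[method];
  if (typeof fn !== "function") {
    throw new TypeError(`${method} is not a function`);
  }
  return Reflect.apply(fn, obj, args);
}

// Mirrors the generated code above.
const m_ref = new_object_with_class("Map");
invoke(m_ref, "set", "a", 1);
```

Binding the receiver through `Reflect.apply` matters here: `Map.prototype.set` throws if called without a `Map` as `this`.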

Detailed design discussion

  • Should we support all dynamic languages?

    WASI-dyntype should not be bound to a specific language. However, the MVP version should focus mostly on the requirements of JavaScript; further extensions may add support for the type systems of other dynamic languages.

  • Will invoking these APIs be faster than compiling language runtime into WebAssembly?

    No. If the language is purely dynamic (e.g. JavaScript), the VM-inside-VM approach is preferred. This proposal doesn't aim to improve the performance of dynamic object access; it just provides an escape hatch for purely dynamic types, so that other objects in the source code with sufficient type information have the opportunity to be statically compiled to WebAssembly.

Given this code as an example:

class A {
    x: number = 1;
}
let num: number = 100;
let a_inst = new A();
let obj : any = {};

This can be compiled to WebAssembly instructions such as:

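;; assumes (type $A (struct (field $x (mut f64)))) and an imported $new_object returning externref
(local $num f64)
(local $a_inst (ref $A))
(local $obj externref)
(local.set $num (f64.const 100))
;; field x of class A is initialized to 1
(local.set $a_inst (struct.new $A (f64.const 1)))
(local.set $obj (call $new_object))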

Accessing the dynamic object $obj will be slower because it involves invoking host APIs, but accessing $num and $a_inst will be much faster than in the VM-inside-VM approach.

@lukewagner
Member

I think this sort of feature belongs in Core WebAssembly, not WASI, since what you're describing deeply influences how the code is compiled and executed (including what types and instructions are available in wasm function bodies). The wasm-gc proposal just reached Stage 4 (coincidentally, 10 minutes ago 🎉 ) and I think forms the basis of what you want (engine-provided GC). I expect wasm-gc is more statically-typed and low-level than you're looking for, but having seen a great deal of discussion of this over the years, I believe baking in high-level dynamic language semantics is a path fraught with peril as there are so many mutually-conflicting requirements. Instead, I think an interesting question to ask is: given now-standard wasm-gc, are there any other GC primitives that would enable dynamic languages to be compiled more-efficiently on top of wasm-gc than is possible today? To pursue this question, I'd suggest scanning through open issues in the wasm-gc and design repos for existing discussions on this topic and filing new issues for remaining questions/ideas.

@xujuntwt95329
Author

@lukewagner Thanks for the reply and suggestions, and it's really exciting to hear that wasm-gc reaches Stage 4 🎉

> I think this sort of feature belongs in Core WebAssembly, not WASI, since what you're describing deeply influences how the code is compiled and executed (including what types and instructions are available in wasm function bodies).

These APIs do influence how the code is compiled, but it seems difficult to get them into Core WebAssembly, since supporting high-level semantics such as dynamic typing is not a goal of WebAssembly.

We are actually developing a compiler that compiles TypeScript to WebAssembly. We use wasm-gc to represent the statically typed code (e.g. number, boolean, class), but it's not possible to represent dynamic types such as any. We studied the principles and goals of wasm-gc and also tried to propose some opcodes, but we realized that wasm-gc doesn't aim to provide high-level or dynamic semantics.

So that's why we are here: we abstract the APIs for accessing dynamic objects managed by the external environment. They work as an escape hatch for dynamic typing, so we can still use wasm-gc to represent the statically typed objects in TypeScript and use these APIs to access dynamic objects, which avoids compiling a whole language runtime into WebAssembly.

[image not preserved]

Previously, most approaches to supporting dynamic languages on WebAssembly have been VM-inside-VM (e.g. compiling QuickJS or CPython to WebAssembly); they work on linear memory and can't benefit from wasm-gc. Since TypeScript has both static and dynamic parts, the proposed APIs give us the opportunity to compile the static part of TypeScript to wasm-gc, as shown in the picture above.

@sbc100
Member

sbc100 commented Sep 12, 2023

I think I agree with Luke (assuming I understand his suggestion) that the best way to achieve this kind of thing would be to look at extensions to wasm-gc in order to support the kind of dynamic behavior you need.

Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"? My understanding is that the goal of WebAssembly is to support all types of languages as efficiently as possible, but that we just started out targeting certain types of languages with the initial version.

@xujuntwt95329
Author

> I think I agree with Luke (assuming I understand his suggestion) that the best way to achieve this kind of thing would be to look at extensions to wasm-gc in order to support the kind of dynamic behavior you need.

> Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"? My understanding is that the goal of WebAssembly is to support all types of languages as efficiently as possible, but that we just started out targeting certain types of languages with the initial version.

Well, we think it would be better if this could be a wasm-gc extension, but according to some previous discussions, wasm-gc opcodes are currently "as low level as possible", so bringing such dynamic typing into wasm-gc seems incompatible with that principle.

> Where did you read that "supporting high level semantics such as dynamic typing is not a goal of WebAssembly"?

My previous description may not have been very accurate. My understanding is that WebAssembly will not provide opcodes to support dynamic types directly. Dynamically typed languages already work well by compiling their runtimes into WebAssembly, but this introduces some footprint overhead.

These APIs separate the processing of dynamic types, so we can compile the static part to wasm-gc directly, without another garbage collector inside the wasm module.

@sbc100
Member

sbc100 commented Sep 12, 2023

> Well, we think it would be better if this could be a wasm-gc extension, but according to some previous discussions, wasm-gc opcodes are currently "as low level as possible", so bringing such dynamic typing into wasm-gc seems incompatible with that principle.

Ah I see. My interpretation of that would be that, in order for an addition to wasm-gc in support of dynamic languages to gain traction, we would need to show that it would be much more efficient than building the same dynamic features on top of wasm-gc primitives.

Presumably it is technically possible to build a dynamic object model on top of the wasm-gc object model? (e.g. represent the dynamic fields in some kind of map data structure?).

@kripken
Member

kripken commented Sep 12, 2023

@sbc100

> Presumably it is technically possible to build a dynamic object model on top of the wasm-gc object model? (e.g. represent the dynamic fields in some kind of map data structure?).

I believe that's why @xujuntwt95329 proposed a new WasmGC instruction to allow dynamic field access as in the link in the previous discussion. It's hard to do without a new instruction, really (an array can't mix different types, and a struct has fixed access only, and there are no maps).

@xujuntwt95329 I sympathize with your position, since you went to the WasmGC people and got an unenthusiastic response, and then you came here and got basically the same thing.

With that said, I do agree with the concerns mentioned both here and there, even though I am very much in favor of good support for dynamic languages in wasm. My own position is still what I wrote at the end of one of my comments there:

> you can store data in linear memory and [..] use WasmGC objects only for references.

That is, data in linear memory will easily allow dynamic field access using the normal tricks, and reference access using a WasmGC array also allows dynamic access. You will have overhead (separate storage for data and references, and casts from the WasmGC array) but it might still be fast enough for dynamic objects (especially since your compiler doesn't implement all objects dynamically). I'd recommend experimenting with that first, as the other options appear to be more radical and would require some changing of minds.

(This will need weak reference support, as mentioned before, but that is at least already planned, and can be polyfilled today on the Web using JS.)
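A rough TypeScript model of that layout, for illustration only: a `Float64Array` stands in for linear memory holding numeric field data, and a plain array stands in for a WasmGC array holding references. The class name, the initial capacity, and the per-object slot map are assumptions of this sketch.

```typescript
// Where a dynamic field lives: a numeric slot in "linear memory"
// or a slot in the reference array.
type Slot = { kind: "num" | "ref"; index: number };

class DynObject {
  private slots = new Map<string, Slot>();
  private numData = new Float64Array(4); // stands in for linear memory
  private numCount = 0;
  private refs: (object | undefined)[] = []; // stands in for a WasmGC array of refs

  set(key: string, value: number | object): void {
    const kind: Slot["kind"] = typeof value === "number" ? "num" : "ref";
    let slot = this.slots.get(key);
    if (slot === undefined || slot.kind !== kind) {
      // Allocate a fresh slot; a previous slot of the other kind is abandoned.
      slot = { kind, index: kind === "num" ? this.numCount++ : this.refs.length };
      if (kind === "ref") this.refs.push(undefined);
      this.slots.set(key, slot);
    }
    if (typeof value === "number") {
      this.ensureCapacity(slot.index + 1);
      this.numData[slot.index] = value;
    } else {
      this.refs[slot.index] = value;
    }
  }

  get(key: string): number | object | undefined {
    const slot = this.slots.get(key);
    if (slot === undefined) return undefined;
    return slot.kind === "num" ? this.numData[slot.index] : this.refs[slot.index];
  }

  // Grow the numeric storage, the way a module might grow linear memory.
  private ensureCapacity(n: number): void {
    if (n <= this.numData.length) return;
    const grown = new Float64Array(Math.max(n, this.numData.length * 2));
    grown.set(this.numData);
    this.numData = grown;
  }
}
```

The split storage is the overhead mentioned above: every access first consults the slot map, then reads from one of two stores.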

@programmerjake
Contributor

I think you can probably end up with a pretty efficient JS implementation using wasm GC by using the hidden classes and inline caching techniques that V8 uses; that approach doesn't usually use an arbitrary hashmap to represent objects.
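To make the suggestion concrete, here is a toy TypeScript model of hidden classes (shapes with transitions) and a monomorphic inline cache. It is a simplified illustration of the general technique, not V8's actual implementation; all class names are hypothetical, and in a wasm-gc setting the `slots` array would correspond to a GC array of references.

```typescript
// A "hidden class": maps property names to slot offsets and records
// transitions so objects built in the same order share one shape.
class Shape {
  readonly offsets: Map<string, number>;
  private transitions = new Map<string, Shape>();

  constructor(offsets: Map<string, number> = new Map()) {
    this.offsets = offsets;
  }

  // Returns the shared successor shape that adds `key` at the next offset.
  withProperty(key: string): Shape {
    let next = this.transitions.get(key);
    if (next === undefined) {
      const offsets = new Map(this.offsets);
      offsets.set(key, offsets.size);
      next = new Shape(offsets);
      this.transitions.set(key, next);
    }
    return next;
  }
}

const ROOT_SHAPE = new Shape();

class JSObject {
  shape: Shape = ROOT_SHAPE;
  slots: unknown[] = [];

  set(key: string, value: unknown): void {
    const offset = this.shape.offsets.get(key);
    if (offset !== undefined) {
      this.slots[offset] = value;
      return;
    }
    this.shape = this.shape.withProperty(key);
    this.slots.push(value);
  }
}

// A monomorphic inline cache for one property-read site, e.g. `obj.y`.
class PropertyReadIC {
  private readonly key: string;
  private cachedShape: Shape | null = null;
  private cachedOffset = -1;
  hits = 0;
  misses = 0;

  constructor(key: string) {
    this.key = key;
  }

  read(obj: JSObject): unknown {
    if (obj.shape === this.cachedShape) {
      this.hits++;
      return obj.slots[this.cachedOffset]; // fast path: no lookup
    }
    this.misses++;
    const offset = obj.shape.offsets.get(this.key);
    if (offset === undefined) return undefined;
    this.cachedShape = obj.shape;
    this.cachedOffset = offset;
    return obj.slots[offset];
  }
}
```

Objects that receive the same properties in the same order end up with the same `Shape`, so a read site that sees one shape repeatedly skips the map lookup entirely.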

@xujuntwt95329
Author

xujuntwt95329 commented Sep 13, 2023

@kripken Thanks for your understanding 😀. The opcode we previously proposed to wasm-gc is much lower-level than these APIs; I believe that if we proposed these dynamic features to wasm-gc, they would be rejected even more directly.

I personally understand all the concerns from both sides. Because there are already solutions for supporting dynamically typed languages (implementing object management on top of WasmGC, as suggested by @sbc100 and @programmerjake, or the current solution of compiling a whole VM into WebAssembly), there doesn't seem to be a strong need to introduce a new concept at the standard level.

However, we are trying to avoid (or at least provide the opportunity to avoid) a runtime inside the wasm module, because many resource-constrained devices have very limited RAM and flash; removing the runtime from the wasm module may allow these devices to install more applications. That's why we want to use these APIs to separate the processing of dynamic types, which has these benefits:

  • These APIs can be implemented on either the host side or the wasm side.
    • If implemented on the host side, the generated wasm module can be very small, which suits embedded devices.
    • If implemented on the wasm side, everything is self-contained in the generated wasm module, so there is no dependency on the host environment.
  • These APIs decouple the implementation of dynamic objects. For example:
    • on small devices, they can be implemented based on QuickJS to reduce memory consumption
    • on servers, they may be implemented based on V8
    • in browsers and Node.js, they can be simple JavaScript APIs
    • they can also be implemented inside the wasm module through wasm-gc (as suggested by @sbc100 and @programmerjake) or linear memory

This gives us the flexibility to use different implementations in different environments without introducing too many new concepts into WebAssembly.
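The decoupling described above amounts to compiling code against one interface and choosing the backend per environment. A small TypeScript sketch of that idea; the interface, its method names, and both backends are hypothetical stand-ins (a real backend might wrap QuickJS, V8, or wasm-gc data structures).

```typescript
type DynRef = unknown;

// Hypothetical minimal surface that compiled code depends on.
interface DynTypeBackend {
  newObject(): DynRef;
  setProperty(obj: DynRef, key: string, value: DynRef): void;
  getProperty(obj: DynRef, key: string): DynRef;
}

// Backend 1: delegate to plain JavaScript objects (e.g. a browser/Node.js host).
class JsObjectBackend implements DynTypeBackend {
  newObject(): DynRef {
    return {};
  }
  setProperty(obj: DynRef, key: string, value: DynRef): void {
    (obj as Record<string, unknown>)[key] = value;
  }
  getProperty(obj: DynRef, key: string): DynRef {
    return (obj as Record<string, unknown>)[key];
  }
}

// Backend 2: a self-contained representation, standing in for an
// implementation that lives inside the wasm module.
class MapBackend implements DynTypeBackend {
  newObject(): DynRef {
    return new Map<string, DynRef>();
  }
  setProperty(obj: DynRef, key: string, value: DynRef): void {
    (obj as Map<string, DynRef>).set(key, value);
  }
  getProperty(obj: DynRef, key: string): DynRef {
    return (obj as Map<string, DynRef>).get(key);
  }
}

// "Compiled" code is written once against the interface.
function buildExample(rt: DynTypeBackend): DynRef {
  const obj = rt.newObject();
  rt.setProperty(obj, "f1", 1);
  rt.setProperty(obj, "f2", "Hello");
  return obj;
}
```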

@kripken I like your idea that "you can store data in linear memory and [..] use WasmGC objects only for references". Actually, I think this can work together with the proposed APIs: we store data in linear memory, use an externref to reference that memory space, and use the proposed APIs to access it.

sunfishcode added the feature-request label (Requests for new WASI APIs) on Mar 15, 2024
6 participants