Add WebAssembly Linker Backend (with WasmGC and Wasm ExceptionHandling) #4928

Open
tanishiking opened this issue Jan 9, 2024 · 13 comments

@tanishiking
Contributor

tanishiking commented Jan 9, 2024

Still a work in progress: I'm thinking about adding Wasm (GC) support based on Scala.js.

Overview

WebAssembly support in Scala.js was discussed in the presentation "Scala.js and WebAssembly, a tale of the dangers of the sea" by Sébastien Doeraene, which is available on YouTube.
The presentation highlighted that in 2019 certain pieces needed for WebAssembly support in Scala were missing: WasmGC was still at a very early stage (phase 1 or 0) at that time.

In late 2023, the WasmGC extension became enabled by default in Chrome (V8) [1] and Firefox [2].

The Exception Handling proposal is now available on many WebAssembly (Wasm) engines, including those with JavaScript engines as embedders [1]. Given this development, it is an opportune moment to reconsider WebAssembly support for Scala in 2024. Notably, various garbage-collected languages such as OCaml, Kotlin, Java, and Dart support WebAssembly utilizing WasmGC.

This proposal suggests adding a new linker backend designed to compile linked sjsir modules into WebAssembly using

Why Wasm?

Wasm was initially designed for near-native execution speed within web browsers. However, its use cases extend far beyond the browser, owing to its robust security features and portability. The introduction of WASI further expands its range of use cases.

  • Faster code execution in browser
  • Plugins
  • Cloud
  • Edge
  • IoT
  • Interop with other languages

For more details: Exploring WebAssembly outside the browser - Atamel.Dev

Other ways to compile from Scala to Wasm?

Why do we propose compiling WebAssembly from Scala.js (SJSIR) when there are various methods to compile Scala to WebAssembly?

  • Compile JVM to Wasm
    • CheerpJ, a browser-based JVM, compiles JVM to Wasm. While this is great for modernizing legacy JVM applications, the large size of modern JVMs might not be ideal for faster code execution and writing small executables like plugins or extensions. Also, JS-interop would not be easy.
  • Compile Java bytecode to Wasm
    • TeaVM AOT compiles Java bytecode to JavaScript and Wasm.
    • While the current implementation ships with its own full-blown GC and exception handling, which slows down execution [3], it's definitely a promising project.
    • Other differences between Scala.js vs TeaVM would be the same as 9 years ago?
    • sbt plugin is also available.
  • Compile JS engine to Wasm
    • Javy compiles JS to Wasm module: running the given JS code on the QuickJS embedded into the Wasm module.
    • Good workaround for building a WASI module (that works with high-level constructs such as GC, EH, and async) from Scala at this moment.
    • Slower performance due to the lack of JIT compilation, and the module size still tends to be larger.
  • Compile LLVM IR to Wasm
    • It's been explored in the ScalaNative project, which compiles Scala to LLVM IR (and then to a native binary); Emscripten or WASI-SDK then compiles the LLVM IR to WebAssembly.
    • However, it turned out that there's no way of expressing the WasmGC primitives in LLVM IR. This means we would need to ship full-blown GC code in the Wasm module, which would decrease performance (the problem TeaVM struggles with) and make it tricky to interoperate with JS objects in a GC context.
    • Can WasmGC adopt a similar toolchain model as WasmMVP, and in particular use LLVM? Unfortunately, no, since LLVM does not support WasmGC (some amount of support has been explored, but it is hard to see how full support could even work). Also, many GC languages do not use LLVM–there is a wide variety of compiler toolchains in that space. And so we need something else for WasmGC. (from https://v8.dev/blog/wasm-gc-porting)

  • Compile NIR to Wasm
    • After considering the above candidates, there were two choices on my mind: NIR (the ScalaNative intermediate language) to Wasm, or sjsir (the Scala.js intermediate language) to Wasm.
    • To be honest, I haven't explored the possibility of NIR to Wasm enough, but compiling from SJSIR to Wasm seems easier for a few reasons.
      • Easier JS interop: while Wasm is going beyond browser embeddings, easy JS interop is still a primary use case. While ScalaNative has good interop with native code at the LLVM layer, the Scala.js API is better designed for JS interop.
      • NIR might be too low-level for compiling to WasmGC (not sure): NIR is basically a high-level LLVM IR, while WasmGC is a kind of Java-bytecode-like high-level language. Though I haven't explored enough, it seems sjsir is high-level enough for compiling to WasmGC.
      • Following other GC languages' choice: J2CL, Kotlin/Wasm, and wasm_of_ocaml are all JS compilers customized to emit WasmGC.

How?

Add a new implementation of org.scalajs.linker.standard.LinkerBackEnd that compiles to WasmGC.

This design is based on observing how Kotlin/Wasm and J2CL compile high-level constructs to WebAssembly. Note that the design might change during implementation.

A few notes on WasmGC and Kotlin/Wasm:

Class definition

class Base(p: Int):
  def foo(): Int = 1

The class definition will be represented as a struct type in WasmGC.

(type $Base_t (sub $java.lang.Object (struct
    (field (ref $Base.vtable_t)) ;; vtable
    (field (ref null struct)) ;; itable
    (field (mut i32)) ;; typeInfo
    (field (mut i32)) ;; hashCode
    (field (mut i32)) ;; p
)))

  • Same as Kotlin/Wasm and J2CL, the class definition will have a vtable and an itable.
  • The itable is explained in more detail in the interface call section.
  • It will have typeInfo and hashCode fields, following Kotlin/Wasm, but they might not be needed; let's see.
  • And there will be fields for the class fields.

The vtable contains the function references to the methods.

;; Type definitions of vtable and methods
(type $Base.vtable_t (sub $java.lang.Object.vtable_t (struct
    (field (ref null $Base_foo_t)))))
(type $Base_foo_t (func (param (ref null $Base_t)) (result i32)))

;; the vtables will be defined as global struct.
(global $Base.vtable_g (ref $Base.vtable_t)
    ref.func $Base.foo_fun
    struct.new $Base.vtable_t)
(func $Base.foo_fun (type $Base_foo_t)
    (param $this (ref null $Base_t)) (result i32)
    i32.const 1
    return)

The constructor will set up the vtable and itable, and initialize the fields.

global.get $Base.vtable_g ;; vtable
ref.null struct ;; itable (it's gonna be null reference because it doesn't implement any interfaces)
i32.const 0 ;; typeinfo (how to calculate it? TODO)
i32.const 0 ;; hashCode (will be calculated and cached when we call hashCode)
        
local.get $p
struct.new $Base_t

Virtual call

class Base(p: Int):
  def foo(): Int = 1

class Derived(p: Int) extends Base(p):
  override def foo(): Int = 2

object Test:
  def box(): Unit =
    val d = new Derived(1)
    bar(d)
  def bar(f: Base): Int = f.foo()

The definition of Derived will be like

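;; Same as Base_t except that the super type is Base_t and the vtable is Derived.vtable_t.
;; A struct subtype must repeat all fields of its super type, including `p`.
(type $Derived_t (sub $Base_t (struct
    (field (ref $Derived.vtable_t)) ;; vtable
    (field (ref null struct)) ;; itable
    (field (mut i32)) ;; typeInfo
    (field (mut i32)) ;; hashCode
    (field (mut i32)) ;; p
)))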
(type $Derived.vtable_t (sub $Base.vtable_t (struct
    (field (ref null $Base_foo_t)))))
(type $Base_foo_t (func (param (ref null $Base_t)) (result i32)))

The bar method (that contains virtual call to foo) will be

(type $bar_t (func (param (ref null $Base_t)) (result i32)))
(func $bar_fun (type $bar_t)
    (param $0_f (ref null $Base_t)) (result i32)
    ;; push the receiver instance of type `Base` twice:
    ;; one is for getting the function reference from the vtable,
    ;; the other is the receiver argument for the foo method
    local.get $0_f  ;; type: Base
    local.get $0_f  ;; type: Base
    struct.get $Base_t 0  ;; push vtable of Base to the stack
    struct.get $Base.vtable_t 0 ;; push function reference to foo
    call_ref (type $Base_foo_t) ;; call the function reference using `call_ref`
    return)
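
The dispatch sequence above can also be read as ordinary code. The following is a minimal Scala sketch, not Scala.js code: `VTableSketch`, `Obj`, and `FooSlot` are hypothetical names, and the vtable is modeled as an array of functions whose slot index is fixed at compile time.

```scala
// Hypothetical model of the vtable layout; none of these names come from Scala.js.
object VTableSketch {
  // One object instance: a reference to its class's vtable plus its field `p`.
  final class Obj(val vtable: Array[Obj => Int], val p: Int)

  // Slot index of `foo`, fixed at compile time and shared by Base and Derived.
  val FooSlot = 0

  val baseVTable: Array[Obj => Int] = Array[Obj => Int](_ => 1)    // Base.foo
  val derivedVTable: Array[Obj => Int] = Array[Obj => Int](_ => 2) // Derived.foo overrides slot 0

  def newBase(p: Int): Obj = new Obj(baseVTable, p)
  def newDerived(p: Int): Obj = new Obj(derivedVTable, p)

  // Mirrors $bar_fun: load the vtable, load the slot, then call it with the receiver.
  def bar(f: Obj): Int = f.vtable(FooSlot)(f)
}
```

Calling `VTableSketch.bar` on an object built with `newDerived` goes through the same slot as for `newBase`, which is why the call site never needs to know the dynamic class.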

Why don't we use call_indirect as Rust does?

  • WebAssembly's table is basically one big virtual table per module (in Wasm 1.0), which is an untyped alternative to typed function references.
  • Even if we register functions in a table, the classes still need a pointer (table index) to the methods. So, what's the point of using call_indirect with WasmGC?

Interface call

trait Animal:
  def sound(): Unit

class Cat extends Animal:
  def sound(): Unit = {}

def baz(animal: Animal) = animal.sound()

The Cat class will have an itable

;; Cat's itable has a pointer to `Animal`'s itable
(type $Cat.classITable_t (struct
    (field (ref null $Animal.itable_t))))
(type $Animal.itable_t (struct (field (ref null $Animal_sound_t))))

(global $Cat.classITable_g (ref $Cat.classITable_t)
    ref.func $Cat_sound_fun ;; function ref to `Cat.sound` implementation
    struct.new $Animal.itable_t
    struct.new $Cat.classITable_t)

The interface call site (the baz method) will look like:

(func $baz_fun (type $baz_fun_t)
    ;; the interface is statically typed as `java.lang.Object`
    (param $0_b (ref null $java.lang.Object))
    ;; same as the virtual call: one for getting the itable, and one for the receiver
    local.get $0_b  ;; type: Animal
    local.get $0_b  ;; type: Animal

    struct.get $java.lang.Object 1 ;; get itable
    ref.cast $Cat.classITable_t ;; need to cast because the given static interface is Object
    struct.get $Cat.classITable_t 0 ;; get the Animal.itable
    struct.get $Animal.itable_t 0 ;; get the function reference to `Cat.sound_fun`
    call_ref (type $Animal_sound_t)
    ;; ...
    return)

The method to call is looked up by its signature at compile time, so at run time we just access the itables by index.
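
As a rough illustration of the two-level lookup, here is a small Scala sketch. All names (`ITableSketch`, `AnimalSlot`, `SoundSlot`) are hypothetical, and `sound` returns a String instead of Unit only so that the result is observable; it is a model of the idea, not the proposed implementation.

```scala
// Hypothetical model of the itable lookup; names are illustrative only.
object ITableSketch {
  type Method = Obj => String

  // Table for one interface: method references in declaration order.
  final class ITable(val methods: Array[Method])

  // Each object points to its class's table of interface tables.
  final class Obj(val classITable: Array[ITable])

  // Both indices are resolved at compile time from the method signature.
  val AnimalSlot = 0 // position of Animal's itable inside Cat's class itable
  val SoundSlot = 0  // position of `sound` inside Animal's itable

  val catSound: Method = _ => "meow"
  val catClassITable: Array[ITable] = Array(new ITable(Array[Method](catSound)))

  def newCat(): Obj = new Obj(catClassITable)

  // Mirrors $baz_fun: get the class itable, then Animal's itable, then the method.
  def baz(animal: Obj): String =
    animal.classITable(AnimalSlot).methods(SoundSlot)(animal)
}
```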


Concurrency

I haven't delved into this area much yet, but it seems the WebAssembly threads feature is already at phase 4 and is available in most popular runtimes, including wasmtime [4], thanks to wasi-threads.

[Screenshot: image from "WebAssembly Threads - HTTP 203" (YouTube)]

Focus on single-threaded execution at first, and support multi-threading later on.


Exception handling

This relies on Wasm's native exception handling, following the Kotlin/Wasm lowering strategy: https://github.com/JetBrains/kotlin/blob/4786c945d933c82c9560a9923f33effc59a80093/compiler/ir/backend.wasm/src/org/jetbrains/kotlin/backend/wasm/lower/TryCatchCanonicalization.kt#L24-L67

For try/catch:

// From this:
//    try {
//        ...exprs
//    } catch (e: Foo) {
//        ...exprs
//    } catch (e: Bar) {
//        ...exprs
//    }
// We get this:
//    try {
//        ...exprs
//    } catch (e: Throwable) {
//        when (e) {
//            is Foo -> ...exprs
//            is Bar -> ...exprs
//        }
//    }
// https://github.com/JetBrains/kotlin/blob/4786c945d933c82c9560a9923f33effc59a80093/compiler/ir/backend.wasm/src/org/jetbrains/kotlin/backend/wasm/lower/TryCatchCanonicalization.kt#L24-L67

We'll have only one exception tag in the module, whose type is java.lang.Throwable. A throw is always compiled to throw 0 with an operand of type (or a subtype of) java.lang.Throwable.

(tag $tag (param (ref null $java.lang.Throwable))) ;; whose tag idx is 0

Also, the catch clause in Wasm always catches java.lang.Throwable with catch 0, and then checks the exception's type. If none of the catch clauses (in Scala) matches the exception, it is rethrown.

try/catch

For example (where ExceptionA extends Exception and ExceptionB extends Exception):

try {
  throw Exception()
} catch (e: ExceptionA) {
} catch (e: ExceptionB) {
}

This will be compiled to

(local $0_merged_catch_param (ref null $java.lang.Throwable))
try
    call $java.lang.Exception.<init> ;; construct exception and push to the stack
    throw 0 ;; throw an exception
catch 0
    local.tee $0_merged_catch_param ;; thrown exception
    ref.test $ExceptionA_t ;; test if the thrown exception is a subtype of ExceptionA
    if ;; catch (e: ExceptionA) { ... }
        ;; ...
    else
        local.get $0_merged_catch_param
        ref.test $ExceptionB_t ;; test if the thrown exception is a subtype of ExceptionB
        if ;; catch (e: ExceptionB) { ... }
            ;; ...
        else ;; if none of the catch clauses catches the exception
            local.get $0_merged_catch_param
            throw 0 ;; rethrow it
        end
    end
end
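
The same canonicalization can be written at the source level. Below is a minimal Scala sketch of the lowered shape, assuming a single catch of Throwable followed by type tests; `CatchLoweringSketch` and `lowered` are hypothetical names, and the String results only make the chosen branch observable.

```scala
// Hypothetical source-level view of the lowering; not generated code.
object CatchLoweringSketch {
  class ExceptionA extends Exception
  class ExceptionB extends Exception

  // Lowered form of: try body() catch { case _: ExceptionA => ...; case _: ExceptionB => ... }
  def lowered(body: () => String): String =
    try body()
    catch {
      case e: Throwable => // corresponds to `catch 0`
        if (e.isInstanceOf[ExceptionA]) "caught A"      // ref.test ExceptionA
        else if (e.isInstanceOf[ExceptionB]) "caught B" // ref.test ExceptionB
        else throw e                                    // no clause matched: rethrow
    }
}
```

An exception that matches neither clause leaves `lowered` unchanged, which corresponds to the final `throw 0` in the Wasm code.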

finally

// With finally we transform this:
//    try {
//        ...exprs
//    } catch (e: Throwable) {
//        ...exprs
//    } finally {
//        ...<finally exprs>
//    }
// Into something like this (tmp variable is used only if we return some result):
//    val tmp = block { // this is where we return if we return from original try/catch with the result
//      try {
//        try {
//            return@block ...exprs
//        } catch (e: Throwable) {
//            return@block ...exprs
//        }
//     }
//     catch (e: Throwable) {
//       ...<finally exprs>
//       throw e // rethrow exception if it happened inside of the catch statement
//     }
//   }
//   ...<finally exprs>
//   tmp // result
// https://github.com/JetBrains/kotlin/blob/4786c945d933c82c9560a9923f33effc59a80093/compiler/ir/backend.wasm/src/org/jetbrains/kotlin/backend/wasm/lower/TryCatchCanonicalization.kt#L24-L67

  • The inner try/catch is the normal try/catch handling described above.
  • The outer try/catch handles the case where the exception isn't caught by any catch clause: it runs the finally clause and rethrows the exception.

For example:

try {
    throw Exception()
} catch (e: Exception) {
} finally { println("hello") }

This will be compiled to

block (result (ref null $Unit))
    try
        try
            ;; construct exception
            call $Exception.<init>
            throw 0
        catch 0
            local.tee $0_merged_catch_param
            ref.test $Exception
            if
                ;; catch(e: Exception) { ... }
            else
                local.get $0_merged_catch_param
                throw 0
            end
            br 2 ;; jump to outside of the block
        end
        unreachable
    catch 0
        ;; println("hello")
        ;; push the caught exception to stack
        throw 0
    end
    unreachable
end
drop

;; println("hello")
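
To make the control flow of the finally lowering easier to follow, here is a minimal Scala sketch of the transformed shape. `FinallyLoweringSketch` and its parameters are hypothetical; the point is only that the finally code appears twice in the output but runs exactly once on every path.

```scala
// Hypothetical source-level view of the finally lowering; not generated code.
object FinallyLoweringSketch {
  // Lowered form of: try body() catch { case e: Throwable => handler(e) } finally fin()
  def lowered[A](body: () => A, handler: Throwable => A, fin: () => Unit): A = {
    val tmp =
      try {
        // inner try/catch: the ordinary catch handling
        try body()
        catch { case e: Throwable => handler(e) }
      } catch {
        // outer catch: only reached if the handler itself threw or rethrew
        case e: Throwable =>
          fin()
          throw e
      }
    fin() // normal exit path
    tmp
  }
}
```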

JS interop

TBD

Q&A

  • Is WASI support in scope?
    • Yes, but I haven't yet explored how to support WASI. Let's focus on Wasm for JS embeddings first, and then WASI.

Any advice or questions are welcome, especially from someone who knows more about Scala.js / SJSIR internals.

Related

Footnotes

  1. WebAssembly Garbage Collection (WasmGC) now enabled by default in Chrome  |  Blog  |  Chrome for Developers

  2. Firefox 120.0, See All New Features, Updates and Fixes.

  3. TeaVM is suffering from this problem (see "No, WASM is slow. As a developer of TeaVM I can claim this. First of all, JS eng..." on Hacker News), and the author thinks that Wasm GC and EH will improve the situation (see "Chrome is now released with Wasm GC enabled by default - TeaVM").

  4. Feature Extensions - WebAssembly

@lrytz
Contributor

lrytz commented Jan 10, 2024

This is an awesome writeup, thank you @tanishiking! Learning about WebAssembly was long overdue for me and this helped tremendously. I posted some questions / comments on the discourse thread.

@tanishiking
Contributor Author

In the thread, I've been wondering whether compiling to WasmGC via Scala.js is really the best immediate solution (although it's hard to say what's "good" unless we define what our goals are for Wasm support).
I'm becoming more and more convinced that Scala.js is still the best option for WasmGC support. I hope to start working on support for WasmGC in the near future.

@tanishiking
Contributor Author

tanishiking commented Jan 25, 2024

I'm considering developing a Wasm IR (like JS IR in Scala.js) with a structure closely resembling Wasm (WAT S-expr format). This IR would compile to either WebAssembly binary or WAT, similar to what has been done in Kotlin/Wasm.
Considering the simple code structure of WebAssembly, I believe it shouldn't be that difficult.

Alternatively, we could use the Binaryen C API to emit WebAssembly binary or text formats (S-expr). We can generate the Binaryen IR from the JVM and let Binaryen generate the Wasm binary/text, so we don't need to implement those conversions ourselves (this would limit the Scala.js Wasm linker backend to the JVM platform, though).
I found a Binaryen JNA mapper in Kotlin, and I also created one in Java.
However, I'm struggling to make it work correctly. It behaves differently than when I used it directly via C 🤔 (There might be something wrong with the mapping, but I'm not sure how to debug it).

@lrytz
Contributor

lrytz commented Jan 25, 2024

Why are you considering a new IR?

Would it be built from SJSIR, or directly from Scala compiler ASTs?

@sjrd
Member

sjrd commented Jan 25, 2024

I think @tanishiking just means an AST data structure for Wasm. Then

  1. Compile the Scala.js IR to that AST
  2. Pretty-print the AST to Wat (the text format of Wasm) or serialize it to Wasm

@tanishiking It would be more in the spirit of this repo to define our own AST in Scala. As you mentioned, Wasm as such is not that complicated; the AST should be simpler than javascript.Trees. Doing so has two main advantages:

  • it also works on JS, as you identified, and
  • it does not cause any distribution/shipping issue: in a JVM world, having to depend on an external native binary is a distribution hassle, which we entirely avoid if we have our own AST. It also makes the whole thing much more reliable/robust IMO.
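
To give a feel for how small such an AST can start out, here is a minimal Scala sketch with a WAT pretty-printer. It is an assumption-laden toy, not the design of the prototype: `WasmAstSketch`, the four instruction cases, and the printing layout are all made up for illustration.

```scala
// Hypothetical toy AST; the real prototype's data structures may look different.
object WasmAstSketch {
  sealed trait Instr
  final case class I32Const(value: Int) extends Instr
  final case class LocalGet(name: String) extends Instr
  case object I32Add extends Instr
  case object Return extends Instr

  final case class Param(name: String, tpe: String)
  final case class Func(name: String, params: List[Param], result: Option[String], body: List[Instr])

  // Pretty-print one instruction in WAT syntax.
  def showInstr(i: Instr): String = i match {
    case I32Const(v) => s"i32.const $v"
    case LocalGet(n) => s"local.get $$$n"
    case I32Add      => "i32.add"
    case Return      => "return"
  }

  // Pretty-print a function; a binary serializer would walk the same tree.
  def showFunc(f: Func): String = {
    val params = f.params.map(p => s" (param $$${p.name} ${p.tpe})").mkString
    val result = f.result.map(r => s" (result $r)").getOrElse("")
    val body = f.body.map(i => "\n    " + showInstr(i)).mkString
    s"(func $$${f.name}$params$result$body)"
  }
}
```

Because the tree is plain Scala data, the same code would compile under both the JVM and Scala.js, which is the first advantage listed above.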

@andreaTP
Contributor

Hi everyone 👋 long time no see!

Stepping by just to let you know that there are at least a couple of Java projects that might spare you some cycles @tanishiking:

  • qbicc: is emitting wasm files
  • Chicory: we are just reading wasm but I would be happy to work together and standardize over a single Java library for reading/writing Wasm modules for the Java ecosystem if this matches your goals

@tanishiking
Contributor Author

tanishiking commented Jan 25, 2024

@lrytz Ah yeah, that's right: I meant an in-memory data structure that serializes to binary/text, which is more like an AST. Thank you for the clarification @sjrd!


@andreaTP Thank you for your input. I'll definitely check out those projects!

> I would be happy to work together and standardize over a single Java library for reading/writing Wasm modules for the Java ecosystem if this matches your goals

That sounds great; I was personally looking for such a project 😃 However, for this specific project, I believe developing the functionality (writing the Wasm binary) in Scala would be a better fit. If we implement it in Scala, we can compile the linker backend itself with Scala.js (which wouldn't be possible if it were in Java), as @sjrd mentioned.
Additionally, designing it within our own repository might be more convenient for us at the moment. Creating a standardized Wasm module writer/reader library requires careful design consideration, and it might be easier for us to design it for our specific needs initially.

However, I'm willing to give insights back to those projects in the future!

@tanishiking
Contributor Author

I'm still far from a level where I can show it to others (the generated WAT code is invalid, and binary generation is not implemented yet), but I'm working on a prototype in this repository: https://github.com/tanishiking/scala-wasm

I aim to create something that can at least handle operations on primitive types, function calls, virtual dispatch, and top-level method exports, while ignoring complex data structures like String or Array and larger tasks such as transforming the Java standard library. Once I achieve that, I plan to tidy it up and share it as a PR against scala-js for feedback.

@sjrd
Member

sjrd commented Apr 26, 2024

We had a meeting with @tanishiking and @gzm0 a few days ago about this, and we would like to share a summary.

State of the current implementation of Scala-Wasm

It is complete wrt. the semantics of Scala.js (i.e., the test suite passes), excluding:

  • @JSExport, which we don't expect to be able to implement in the foreseeable future,
  • updates to @JSExportTopLevel vars, which we should be able to implement, although it is a bit tedious,
  • the Checked Behaviors (they are all effectively Unchecked for now), which is also definitely implementable, and
  • multi-modules, for which we have not really evaluated what we should do about them.

Use cases

There are two big directions that a Scala-to-Wasm user can take: to use a JS host (typically for the Web) or another Wasm host.

For a JS host, we definitely want the semantics of Scala.js. The benefits will certainly be reduced code size, and if we're lucky, performance improvements. For that use case, we need not change anything to the language semantics nor to the IR.

For non-JS hosts, the benefit for users is to target an entirely new ecosystem. In that scenario, we will want to introduce interop features for at least the Component Model of Wasm, and instead remove the JS interop semantics. In that case, we definitely need IR changes. More critically, we will need a link-time way to reliably (i.e., from a reachability point of view) use one path or another. Either we do this with an entirely different ecosystem and IR (SWIR?), or we amend the Scala.js IR. If amending the Scala.js IR is not too disruptive, this would help adoption of Scala-to-Wasm, as the majority of the Scala.js ecosystem of libraries could be reused as is. We would avoid the slow bootstrap problem entirely.

We discussed one possible link-time dispatch mechanism that would fit in the Scala.js IR: a "link-time if-else". Its conditions would be restricted to a few elementary operations, based on a config dictionary of String->Int. For example it might contain "host"->0 for a JS host and "host"->1 for a WASI host. Or a "isWasm"->0/1, which could be used for code paths known to be better for Wasm than for JS (e.g., manipulating Longs more aggressively, or bit-casting between ints and floats, etc.)
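
A minimal Scala sketch of how such a restricted condition could be evaluated against the config dictionary is shown below. Everything here is hypothetical (the `Cond` cases, `eval`, and treating a missing key as "no match"); it only illustrates that the condition language can stay small enough for the linker to decide reachability.

```scala
// Hypothetical model of a "link-time if-else"; not part of the Scala.js IR.
object LinkTimeIfSketch {
  // The condition language is deliberately tiny.
  sealed trait Cond
  final case class Eq(key: String, value: Int) extends Cond
  final case class And(left: Cond, right: Cond) extends Cond
  final case class Not(cond: Cond) extends Cond

  // Evaluate a condition against the linker config (String -> Int).
  // Assumption: a key that is absent from the config never matches.
  def eval(cond: Cond, config: Map[String, Int]): Boolean = cond match {
    case Eq(k, v)  => config.get(k).contains(v)
    case And(l, r) => eval(l, config) && eval(r, config)
    case Not(c)    => !eval(c, config)
  }

  // Only the selected branch is evaluated, as only it would be kept by the linker.
  def linkTimeIf[A](cond: Cond, config: Map[String, Int])(thenp: => A)(elsep: => A): A =
    if (eval(cond, config)) thenp else elsep
}
```

For instance, with the config `Map("host" -> 1, "isWasm" -> 1)`, the condition `Eq("host", 0)` is false, so only the non-JS branch would remain reachable.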

Merging into Scala.js core

The main driver at this stage for wanting to merge into Scala.js core is the ability to influence the behavior of the Optimizer. Currently Scala-Wasm cannot deal with the output of the Optimizer, because it produces IR that does not IR-check. We've known this for a long time, of course. The main culprit is that the optimizer internally refines the type of some values, but does not reify that information into the produced IR. That's fine for JS but for Wasm we need a type-preserving transformation. IMO what we need is to insert special "Cast" transients that are like AsInstanceOf from a type point of view, but are always unchecked. More broadly, some things may be best optimized in one way for JS and in another for Wasm. In order to effectively test changes to the optimizer that are done in favor of Wasm, it would be much better to have Wasm in the same repository.

In terms of user experience, if we merge Wasm into Scala.js core, using Wasm would be one scalaJSLinkerConfig flag away. We also discussed an alternative experience where they configure a different scalaJSLinkerImpl instance, but that was quickly discarded as sub-optimal. We will need good reporting for the limitations of @JSExport.

One possible disadvantage of merging is that it would introduce more friction for experiments, notably to target non-JS hosts. However, we discussed that if that proves problematic, experiments could happen in a fork of the Scala.js repo until they are ready to be merged, or separately shipped.

Finally, we also mentioned possible de-merging in the future. We will need to clearly document that support for Wasm in the core is susceptible to be moved out at some point. If that happens, we should have learned what hooks we would need to effectively do that in the meantime.

Considering all these things, we decided to merge Scala-Wasm into Scala.js core at the moment. We will need to clearly document that support for Wasm in the core is susceptible to be moved out at some point.

Release cycle and versioning

With Wasm in the core, versioning will not change. We reserve the right to extract Wasm out of the core in a minor version. The release cycle should not be adversely impacted. If anything, it might provide new incentive to uphold our 2-month schedule between releases.

Benchmarks

We decided that we do not need to have benchmarks before merging.

Next steps

Given all the above, the next steps are:

  • Clean up the codebase of Scala-Wasm to adhere to Scala.js code quality standards.
  • Prepare decent communication to accompany the PR.

@gzm0
Contributor

gzm0 commented Apr 26, 2024

TY for the writeup.

Comments from my side:

Library Management

We have also discussed the possibility of (selectively) using different artifacts for JS / wasm instead of a linking time mechanism and the difficulty / strain this would put on build / artifact resolution tooling.

Config

One thing that isn't entirely clear to me in terms of linker interface is how to handle extensions of output files.

For JS, we currently have OutputPatterns which the backend blindly observes (e.g. it will happily output CommonJS to main.mjs or even to main.png).

IIUC, for WASM we'd want *.wasm and *.wat (will we always emit both?). How are we going to map this onto the current config scheme?

@sjrd
Member

sjrd commented Apr 26, 2024

> IIUC, for WASM we'd want *.wasm and *.wat (will we always emit both?). How are we going to map this onto the current config scheme?

*.wat is only for debugging purposes. The source of truth remains the *.wasm file. The current implementation always emits the wasm file but only emits the wat file under withPrettyPrint(true).

Regarding OutputPatterns, IMO we'll add wasmFile and wasmFileURI to OutputPatterns, similarly to jsFile/jsFileURI. And if we do keep emitting wat files we would also have watFile{,URI}.
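
As a rough sketch of what that could mean, the following Scala models the proposed fields with a stand-in case class. This is not the real `OutputPatterns` class: `Patterns`, `wasmOutputs`, and the default printf-style patterns are assumptions made for illustration only.

```scala
// Hypothetical stand-in for the proposed fields; this is NOT the real
// org.scalajs.linker.interface.OutputPatterns class.
object OutputPatternsSketch {
  final case class Patterns(
      jsFile: String = "%s.js",
      wasmFile: String = "%s.wasm", // proposed addition
      watFile: String = "%s.wat"    // only relevant if wat files keep being emitted
  )

  // File names a Wasm backend would write for one module under these assumptions:
  // the .wasm file always, the .wat file only when pretty-printing is enabled.
  def wasmOutputs(patterns: Patterns, moduleID: String, prettyPrint: Boolean): List[String] = {
    val wasm = patterns.wasmFile.format(moduleID)
    if (prettyPrint) List(wasm, patterns.watFile.format(moduleID)) else List(wasm)
  }
}
```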

@gzm0
Contributor

gzm0 commented Apr 28, 2024

FWIW: It seems the WASM spec is quite clear about the file extensions:

https://webassembly.github.io/spec/core/binary/conventions.html
https://webassembly.github.io/spec/core/text/conventions.html

IIRC, the reason we've introduced OutputPatterns is because of the need of having different file extensions. Maybe for WASM we shouldn't make this configurable (at least in a first iteration).

@sjrd
Member

sjrd commented May 23, 2024

PR: #4988
