9oelM/wasm-notes

wasm-notes

Notes

Basics

  • WebAssembly is NOT C++.
  • WebAssembly is a stack-based virtual machine. It is a processor that does not physically exist, but its instruction set is designed so that it compiles easily to real architectures.
  • You write code in some language and compile it to WebAssembly (for C/C++, through a toolchain called emscripten). The runtime that loads the wasm module then compiles it to the instruction set of the target machine (x86, ARM, ...).
  • Because the virtual machine is designed to compile to real processors, it can run on any runtime. Your code runs at close to bare-metal speed, but securely, inside a sandbox.
  • Even AutoCAD now runs in the browser! Unity too. The browser is capable of running all of this. The UI toolkit Qt also supports WebAssembly.
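
To make the "stack-based virtual machine" point concrete, here is a minimal sketch: a hand-assembled wasm module containing a single `add` function, loaded through the standard `WebAssembly` JavaScript API. The bytes and the names `wasmBytes` and `add` are written for this illustration only; a real project would get its `.wasm` file from a compiler rather than by hand.

```typescript
// Hand-assembled module, equivalent to this text-format function:
//   (func (export "add") (param i32 i32) (result i32)
//     local.get 0
//     local.get 1
//     i32.add)
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function, using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, 7 bytes long, no extra locals
  0x20, 0x00, // local.get 0 (push the first argument onto the stack)
  0x20, 0x01, // local.get 1 (push the second argument onto the stack)
  0x6a,       // i32.add     (pop two values, push their sum)
  0x0b,       // end
]);

// Synchronous compilation is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add as unknown as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

Note that there are no registers in the function body: each instruction only pushes to or pops from an implicit operand stack, which is what makes the format easy to translate to real machine code.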

How did they do that?

  • emscripten. It's a drop-in replacement for the C/C++ compilers. Instead of compiling to machine code, it gives you WebAssembly. Whatever code you wrote to run on a system should, almost magically, run on the web too. emscripten does a LOT. Originally it was the compiler for asm.js (an earlier project that compiled C code into a fast subset of javascript). emscripten even emulates OpenGL on top of WebGL and a real file system on top of a virtual one. You can run code that was never made for the web!
  • When WebAssembly came out, emscripten just added a new output format and kept all of its emulation work. The two happened to fit together very well, with no problems. Perhaps that's why C++ is so tightly involved with WebAssembly.

Ecosystems

  • The javascript ecosystem is not big for every topic, while another language's ecosystem may be.
  • So if you don't find what you need in javascript, you either write a javascript port yourself or resort to using another language.
  • "Squoosh". An image compression app that runs entirely in the browser. No server. Its developers found that the javascript ecosystem for image codecs and encoders was not so big, so they looked at C/C++. So? WebAssembly. They found an encoder module written in C++ and used it in place of the browser's encoder, which improved the results.
  • So now, with WebAssembly, ecosystems are no longer limited to one language. You can take something that was never written for the web and use it on the web, through emscripten and WASM.

How do you convert C code to Javascript? How do you configure it?

  1. Compile the library
  2. Define the functions that you want to call from javascript (bridge functions)
  3. Run emcc (the emscripten C compiler)
  4. You then get a .js glue file and a .wasm module (the bridge functions themselves live in a .cpp file that you write). Note: because emscripten does a lot of work under the hood, always check the output file size.

Takeaway 1

If you have a gap in the web platform (javascript) that has been already filled many times in another language, WASM might be your tool!

Performance

  • Javascript and WASM have roughly the same peak performance as of now.
  • But it is easier to get WASM to run fast: the compiler knows which optimizations to apply, whereas when you write javascript you may not know how to keep your code on the fast path.
  • WASM is looking into things like multiple threads and SIMD, which javascript is unlikely to get direct access to. Once these land, we can expect WASM to outperform javascript.

Compilation of javascript vs WASM on web

JS: JS file => Ignition (V8's interpreter) => TurboFan (optimizing compiler that generates machine code)

WASM: WASM file => Liftoff (V8's baseline WASM compiler) => TurboFan (optimizing compiler)

js and wasm on v8

See the difference?

  1. Ignition is an interpreter, while Liftoff is a compiler (it generates machine code right away). On average, machine code is faster than interpreted code.
  2. One more thing: optimized JS machine code may have to fall back to the interpreter (called de-optimization), because it is generated under assumptions that do not always hold. That is not the case for WASM (its code is never de-opted).
  3. So WASM delivers faster and more predictable performance. This is important because javascript sometimes runs at very different speeds in different browsers!

AssemblyScript?

  • AssemblyScript is a Typescript-to-WASM compiler. You cannot just throw arbitrary Typescript code at it because, for example, WASM does not have a DOM API.
  • It uses plain Typescript syntax with a different type library! You don't have to learn a new language to write WASM.
  • For now, WASM does not have a built-in GC. You have to free memory yourself.
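
As a rough sketch of what "Typescript syntax with a different type library" looks like, the function below is annotated with `i32`, one of AssemblyScript's built-in integer types. The `type i32 = number` alias and the function name `sumTo` are assumptions added for this illustration: the alias only exists so that the snippet also type-checks as plain TypeScript, and it would be dropped when compiling with AssemblyScript.

```typescript
// Assumption for illustration: AssemblyScript provides i32 as a built-in type.
// This alias only lets the same source type-check under plain TypeScript.
type i32 = number;

// In an AssemblyScript module this function would also be marked `export`
// so that the host (javascript) can call it.
function sumTo(n: i32): i32 {
  let total: i32 = 0;
  for (let i: i32 = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

console.log(sumTo(10)); // 55
```

Because every value has a fixed wasm type, the compiler does not have to guess types at run time the way a javascript engine does.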

Things to note

  • Putting everything into WASM is not a good idea for now.
  • JS and WASM are not opponents. They complement each other. Find the place where WASM fits in!

Future of WASM

These are current proposals.

  1. Threads for parallel computation. Why? Many existing C/C++ libraries are multi-threaded, and performance generally scales with the number of threads. Is there a match on the web? Yes, Web Workers! The proposal is mostly stable, although a few things still have to be formalized. Threads ship in Chrome 74 by default!
  2. Reference types. WASM can pass around arbitrary JS values using the 'anyref' value type (later renamed 'externref'), so a wasm module can hold on to JS objects and hand them back to JS.
  3. WebIDL Bindings proposal. WebIDL is used to define the interfaces that are implemented on the web.
  4. GC, exception handling, ...

See more at:

How AutoCAD turned their desktop app into a web app with wasm

  • Of course, AutoCAD benefits from being on the web because that makes it portable.
  • AC supports the .dwg file type on desktop.
  • Before: Flash, HTML5/Javascript... none of which was really scalable.
  • 2017: AC released a viewer, built with emscripten, that can display dwg files.
  • 2018: AutoCAD web app launched. Zero download, zero install.
  • They took their C++ codebase and translated it with emscripten; the result gets compiled by the browser at runtime. UI: React & Typescript running together on the main thread. wasm modules: on a web worker (separate thread). Interacting with the web app is just as smooth as with the desktop app.
  • Implications: C++ developers never have to learn javascript to fix bugs.

WASM is better than a web worker

wasm memory

  • Web workers: can be quite heavy, and only communicate through postMessage. Limited.
  • wasm threads: use a SharedArrayBuffer to communicate between workers at high speed. Existing C++ codebases that use multi-threading or blocking features can be ported directly to the web and take full advantage of it.
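
The following is a minimal sketch of the shared-memory side of this, using only the standard `WebAssembly.Memory` and `Atomics` APIs. It stays on a single thread to remain self-contained; the names `sharedMemory` and `counters` are hypothetical. It assumes an environment where shared memory is enabled (in browsers this generally requires a cross-origin isolated page).

```typescript
// One 64 KiB page of linear memory, marked as shared between threads.
// A shared memory must declare a maximum size.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });

// The buffer behind a shared memory is a SharedArrayBuffer, so every thread
// that holds it reads and writes the same bytes; nothing is copied.
const counters = new Int32Array(sharedMemory.buffer);

// Atomics make concurrent updates safe. Here both updates come from the
// same thread, but a worker holding the same buffer could do the same.
Atomics.store(counters, 0, 0);
Atomics.add(counters, 0, 5);
Atomics.add(counters, 0, 7);

const total = Atomics.load(counters, 0);
console.log(total); // 12
```

Compare this with postMessage, where each message has to be serialized and delivered through the event loop before the other side can see it.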

Wasm entirely freed from going through javascript

wasm dom

  • wasm: today it has to go through javascript to get a reference to the DOM. After: it won't have to.
  • Importing wasm files as js modules will be supported as well.

SIMD in WebAssembly

Fresh out of the oven, v8 released its support for wasm SIMD in late Jan, 2020.

So what is SIMD? It:

  • stands for Single Instruction, Multiple Data
  • performs the same operation on multiple data elements at once
  • can benefit audio/video codecs, image processors, real-time motion tracking in video, ... (which have many repetitive and costly ops like matrix products)
  • essentially, operates on whole vectors at a time (for example, adding two vectors element by element, or the multiplications inside a dot product)
  • directly uses the computer hardware, the CPU. It uses sets of instructions widely supported by the majority of CPUs on the market (e.g. Intel's Streaming SIMD Extensions (SSE) for x86 architectures, or NEON on ARM)
  • works on 128-bit-wide vectors in wasm as of now
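
To illustrate what "the same operation on multiple data elements" means, here is a conceptual sketch in plain TypeScript. It does not use real SIMD instructions; it only mimics how a 128-bit vector holds four 32-bit floats ("lanes") that are added in one step. The function names `addScalar` and `addF32x4` are made up for this example.

```typescript
// Scalar version: one addition per loop iteration.
function addScalar(a: Float32Array, b: Float32Array): Float32Array {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
  return out;
}

// Lane-wise version: handles 4 floats (4 x 32 bits = 128 bits) per step.
// With real wasm SIMD, the four additions inside the loop body would be
// a single vector instruction instead of four scalar ones.
function addF32x4(a: Float32Array, b: Float32Array): Float32Array {
  const out = new Float32Array(a.length);
  let i = 0;
  for (; i + 4 <= a.length; i += 4) {
    out[i] = a[i] + b[i];
    out[i + 1] = a[i + 1] + b[i + 1];
    out[i + 2] = a[i + 2] + b[i + 2];
    out[i + 3] = a[i + 3] + b[i + 3];
  }
  // Leftover elements that do not fill a whole 128-bit vector.
  for (; i < a.length; i++) out[i] = a[i] + b[i];
  return out;
}

const left = new Float32Array([1, 2, 3, 4, 5, 6]);
const right = new Float32Array([10, 20, 30, 40, 50, 60]);
console.log(Array.from(addF32x4(left, right))); // [11, 22, 33, 44, 55, 66]
```

Both functions return the same result; the point is only that the second one needs a quarter of the loop iterations when each step maps to one hardware instruction.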

simd (pic from wasmer's article)

Why not SIMD in js but wasm?

Emscripten already can turn your code into something that uses SIMD

  • with SIMD option enabled, it can automatically detect and put SIMD in the appropriate places.

See more at:

WebAssembly and what makes it fast

Notes are taken/copied from this series

You would need this knowledge for understanding later parts on why wasm is faster.

How JS runs on browser

  • we write js, but the machine needs binary, machine-readable code
  • to 'translate' js, we can use:
    • interpreter: translation happens line-by-line, on the fly
      • good: fast to start running
      • bad: wasted translation cost when you are running the same code over and over (ex. loops)
    • compiler: doesn't translate on the fly. It has to translate everything beforehand.
      • good:
        • can optimize repetitive code like loops
        • has time to do additional optimizations
      • bad: slower to start running
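
The trade-off above can be sketched with a toy expression language. This is only an illustration, not how a js engine is built: `interpret` re-walks the syntax tree on every evaluation, while `compile` walks it once and returns a closure that can be reused. All names here (`Expr`, `interpret`, `compile`, `program`) are made up for the example.

```typescript
// A tiny expression language: numbers, the variable x, addition, multiplication.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "x" }
  | { kind: "add"; left: Expr; right: Expr }
  | { kind: "mul"; left: Expr; right: Expr };

// Interpreter: inspects the tree again on every single evaluation.
function interpret(e: Expr, x: number): number {
  switch (e.kind) {
    case "num": return e.value;
    case "x": return x;
    case "add": return interpret(e.left, x) + interpret(e.right, x);
    case "mul": return interpret(e.left, x) * interpret(e.right, x);
  }
}

// "Compiler": inspects the tree once, up front, and returns a closure.
function compile(e: Expr): (x: number) => number {
  switch (e.kind) {
    case "num": { const v = e.value; return () => v; }
    case "x": return (x) => x;
    case "add": { const l = compile(e.left); const r = compile(e.right); return (x) => l(x) + r(x); }
    case "mul": { const l = compile(e.left); const r = compile(e.right); return (x) => l(x) * r(x); }
  }
}

// The expression 2 * x + 1.
const program: Expr = {
  kind: "add",
  left: { kind: "mul", left: { kind: "num", value: 2 }, right: { kind: "x" } },
  right: { kind: "num", value: 1 },
};

const compiled = compile(program); // translation cost is paid once, before the loop
let interpretedSum = 0;
let compiledSum = 0;
for (let i = 0; i < 1000; i++) {
  interpretedSum += interpret(program, i); // translation cost is paid 1000 times
  compiledSum += compiled(i);
}
console.log(interpretedSum, compiledSum); // 1000000 1000000
```

The interpreter can answer the first call immediately, whereas the compiled version has to wait for `compile` to finish, which is the "fast to start" versus "fast in loops" trade-off from the list.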

Just-in-time compiler takes both

  • a 'monitor' watches the code as it runs, recording how many times each piece runs and which types are used.
    • 1st time: it runs everything through the interpreter (because it doesn't know anything yet)
      • If a segment of code is run a few times, it is called warm.
      • If it’s run a lot, then it’s called hot.
  • the baseline compiler kicks in after the interpreter, once code becomes warm.
    • If a function is used a lot, the monitor will use the compiled version instead of interpreting it again
    • when code becomes hot, an optimizing compiler does further optimizations, based on assumptions drawn from what the monitor has observed
    • if an assumption turns out to be wrong at run time, the JIT throws away the optimized code and goes back to either the interpreter or the baseline-compiled version. (= deoptimization)
    • for example, if you loop over an array whose items have all been numbers so far, the compiler may assume that every item is a number and drop the handling of other types to be faster (types are dynamic in js). If some item then turns out not to be a number, the assumption was wrong.
  • doing a lot of optimization and deoptimization may take a long time, so browsers limit how many times they retry.
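
The warm/hot/deoptimize cycle can be mimicked with a small simulation. This is a sketch under loud assumptions: the thresholds, the class `MonitoredSum`, and its tier names are invented for illustration, and a real engine generates machine code rather than switching between two TypeScript methods.

```typescript
type Tier = "interpreter" | "baseline" | "optimized";

// Hypothetical numbers, chosen only to keep the example short.
const WARM_THRESHOLD = 3; // calls before the code counts as warm
const HOT_THRESHOLD = 6;  // calls before the code counts as hot
const MAX_DEOPTS = 2;     // after this many bailouts, stop re-optimizing

class MonitoredSum {
  calls = 0;
  deopts = 0;
  tier: Tier = "interpreter";

  // Generic path: handles any element type, checking each one.
  private generic(items: unknown[]): number {
    let total = 0;
    for (const item of items) total += typeof item === "number" ? item : Number(item);
    return total;
  }

  // Speculative path: assumes "all items are numbers".
  // Returns null when that assumption fails (the bailout).
  private speculative(items: unknown[]): number | null {
    let total = 0;
    for (const item of items) {
      if (typeof item !== "number") return null;
      total += item;
    }
    return total;
  }

  run(items: unknown[]): number {
    this.calls++;
    if (this.tier === "optimized") {
      const fast = this.speculative(items);
      if (fast !== null) return fast;
      // Wrong assumption: throw away the optimized version (deoptimization).
      this.deopts++;
      this.tier = "baseline";
      return this.generic(items);
    }
    const result = this.generic(items);
    // The "monitor": promote the code as it gets warm, then hot.
    if (this.tier === "interpreter" && this.calls >= WARM_THRESHOLD) {
      this.tier = "baseline";
    } else if (this.tier === "baseline" && this.calls >= HOT_THRESHOLD && this.deopts < MAX_DEOPTS) {
      this.tier = "optimized";
    }
    return result;
  }
}

const sum = new MonitoredSum();
for (let i = 0; i < 6; i++) sum.run([1, 2, 3]); // warm after 3 calls, hot after 6
const tierWhenHot = sum.tier;                   // "optimized"
const mixedResult = sum.run([1, "2", 3]);       // a string breaks the assumption
console.log(tierWhenHot, mixedResult, sum.tier, sum.deopts); // optimized 6 baseline 1
```

The result is still correct after the bailout; only the time spent optimizing was wasted, which is why engines cap the number of retries.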

Why Javascript is slow

  • It has improved a lot compared to the early days (when no optimization happened)
  • Brief overview of JS engine's job nowadays (may differ a bit from engine to engine):
    • parse: source code => interpreter-readable code
    • compile + optimize: the time that is spent in the baseline compiler and optimizing compiler.
    • re-optimize: JIT works out failed assumptions
    • execute
    • gc
  • big improvements possible due to JIT
  • but still slow compared to wasm

Why wasm is faster (and js is slower)

  • Overview of how wasm runs:
    • fetch: faster than js because it's smaller (compressed, binary)
    • parsing:
      • wasm does not go through any steps to become an IR (intermediate representation) because it already is
      • js needs to go through source code => AST => IR (bytecode)
    • compiling + optimizing:
      • The compiler doesn't need to know what types are being used before it starts compiling optimized code.
      • Compiler does not need to look at different versions of the same code based on those different types it observes.
      • More optimizations have already been done ahead of time in LLVM for wasm
    • reoptimizing: JIT doesn’t need to make assumptions about types based on data it gathers during runtime. No reoptimization for wasm.
    • executing:
      • js:
        • needs to be in a JIT-friendly way to run faster (and that's still slow)
        • optimizations in different browsers' JITs are different anyways, so performance may differ
      • wasm:
        • it gives machine-friendly instructions because it's designed as a compiler target
    • gc:
      • you don't have control over gc in js, meaning less control on performance
      • wasm: no gc at all, meaning consistent performance (but a GC proposal is in progress)

Resources

Watch lists

Reading lists

How-tos

Projects

Languages
