
Embedded devices: i8/i16 and memory pages less than 64KiB #899

Closed
Ekleog opened this issue Oct 24, 2018 · 31 comments

Comments

@Ekleog

Ekleog commented Oct 24, 2018

Hello,

I have just read this post, which appears to confirm that WebAssembly still intends to have a future on embedded devices. So here is my feedback from trying to make WebAssembly run on such a device:

  • Mandatory support for i32/i64 is not nice when the hardware only offers i8 / i16 arithmetic. It's possible to work around this by re-implementing the primitives, but it'd be great if i8 / i16 types just existed: assuming the application compiled to WebAssembly uses them, the polyfills would simply go unused most of the time. Ideally, there'd be a wasm16 target that'd support only up to i16 arithmetic in addition to having a 16-bit address space, but that's maybe a bit too much to hope for.
  • 64KiB per memory page is a lot. The devices I'm running on just don't have that much memory. So currently I compile with a patch to LLVM that reduces the memory page size to 1 byte, which makes things work but is no longer officially wasm. If some way existed for applications to claim that they use less than a full page, this problem would vanish.

Hope this feedback can help!
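The i16 workaround mentioned in the first bullet can be sketched roughly as follows: a minimal, illustrative Python model (not from the thread) of an i32.add polyfill built from 16-bit limb operations plus a carry, the kind of primitive a hypothetical 16-bit target would have to synthesize for every i32 operation.

```python
# Illustrative sketch: emulating i32.add on hardware that only has 16-bit
# arithmetic, by splitting each i32 into two u16 limbs and propagating a carry.

def i32_add_via_u16(a: int, b: int) -> int:
    """Add two 32-bit values using only 16-bit limb operations plus a carry."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    lo_sum = a_lo + b_lo                    # fits in 17 bits
    lo = lo_sum & 0xFFFF
    carry = lo_sum >> 16                    # 0 or 1
    hi = (a_hi + b_hi + carry) & 0xFFFF     # wrap at 32 bits overall
    return (hi << 16) | lo
```

If i16 existed as a first-class type, code that only needs 16-bit arithmetic would never pay for this limb juggling.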

@lars-t-hansen

We've definitely discussed configurable page sizes in the past, and I can think of no reason why that still wouldn't work. The engines would need to choose a bounds-checking strategy adapted to the chosen page size (fewer fancy trap tricks if the chosen page size is not some multiple of the system's page size), but not relying on traps may be the reality for embedded anyway.
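The "no fancy trap tricks" strategy boils down to an explicit compare on every memory access instead of relying on guard pages. A minimal sketch, with invented names, of what an embedded interpreter might do (real engines emit this check in generated code rather than running Python):

```python
# Illustrative sketch: explicit bounds checking against whatever memory the
# embedded host could actually allocate, with no dependence on 64 KiB pages
# or OS guard-page tricks.

def checked_load_u8(memory: bytearray, addr: int, offset: int) -> int:
    ea = addr + offset                      # effective address
    if ea >= len(memory):                   # explicit check on every access
        raise MemoryError("out-of-bounds memory access")  # models a wasm trap
    return memory[ea]

mem = bytearray(4096)  # e.g. a 4 KiB linear memory, far below one wasm page
```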

A 16-bit profile of wasm is a bigger ask :) No reason it couldn't be done, but the burden of doing the detailed proposal (and the implementations) would probably fall to somebody in the embedded community.

@Ekleog
Author

Ekleog commented Nov 30, 2018

Another issue that I'll dump here despite not really hoping for it to be fixed, due to backward compatibility: drop and select are the only opcodes (that I've noticed so far) that require knowing the size of the object on top of the stack. This means that, just for these two opcodes, the whole stack needs to be typed, meaning additional memory usage where memory is scarce. Well, that, or analyzing the code ahead of time, but that is quite costly on resource-limited components.

I must say I was shocked when hitting these, as they appear to be the only exception. All other opcodes have proper i32.add etc. versions that explicitly state the size of the operand (or, for get_local, the information is in the local's type, which is quite cheap), but drop and select require the whole stack to be typed. This makes me wonder why i32.add and i64.add are even separate opcodes.

Disclaimer: I'm not sure about br_if and the like yet, as so far I've only seen code generated for them that takes an i32. If that's not guaranteed, this message's request would also extend to hoping for i32.br_if / i64.br_if.

I'm very conscious that this will be hard-to-impossible to retrofit into the spec: anyway, fully spec-compliant interpreters would have to support these opcodes, even if they were deprecated. On the other hand, embedded interpreters can likely get away with implementing only part of the spec, and in particular not these two opcodes, if in practice compilers don't generate them but generate their sized variants.

So… I don't know?

@binji
Member

binji commented Nov 30, 2018

You can see the full list of instructions with their stack signatures here.

I'm not sure about br_if and the like yet, as I've currently only seen code generated for them as i32-taking.

Not sure if this is what you mean, but br, br_if and br_table can all optionally forward a value. The type of that value is the type of the label that they're branching to, for example:

```wat
block (result f32)
  ...
  f32.const 1
  br 0  ;; branch to end
end
;; top of stack is an f32 here
```

I must say I was shocked when hitting those, as they appear to be the only exception.

Yes, those are the only parametric instructions.

Though even if you removed those instructions, you wouldn't be able to validate without having a typed stack. I suppose you could assume that you've pre-validated the WebAssembly module, but at that point you could AOT compile the module offline instead.

I'd think if you were going to interpret on device, you'd want to do something like what the wabt interpreter does and decode and validate the module, but convert it to a bytecode that is friendlier for interpreters. Interpreting directly from the bytecode can be done, but it's clumsy as it requires a few auxiliary data structures.
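One way to read this suggestion: the validation pass already tracks operand types, so while it runs it can rewrite the generic drop into a width-annotated internal opcode, letting the runtime stack stay untyped. A minimal sketch with invented names (drop.n, WIDTH) and a toy instruction representation, not the actual wabt design:

```python
# Illustrative sketch: during a validation/lowering pass that already tracks
# operand types, rewrite the generic `drop` into an internal, width-carrying
# opcode so the runtime stack can be a raw slot stack.

WIDTH = {"i32": 1, "i64": 2, "f32": 1, "f64": 2}  # width in 32-bit slots

def lower(ops):
    """ops: list of (opcode, type-pushed-or-None). Returns lowered opcodes."""
    type_stack, out = [], []
    for op, ty in ops:
        if op == "drop":
            t = type_stack.pop()               # type info is free here
            out.append(("drop.n", WIDTH[t]))   # sized internal form
        else:
            if ty is not None:
                type_stack.append(ty)
            out.append((op, ty))
    return out
```

After this pass, the interpreter's value stack never needs type tags: drop.n just pops the given number of slots.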

@rossberg
Member

@Ekleog, I doubt that you can compile Wasm without typing the stack, regardless of these operations. An engine is expected to perform validation in the first place, and some other instructions, like branches, require knowledge of the current operand stack, since they need to clear it.

The difference between instructions like i32.add and drop is that addition is defined only for a fixed number of types, and it is operationally different for each type. Drop and select otoh work for any type and they perform the same operation regardless of type. This is essentially the difference between overloading and generics.

Note that, with some of the future extensions to Wasm, "any type" will be an infinite set, consider e.g. reference types. From that perspective alone, the same syntax cannot work -- even less so for branches, which can even have multiple operands with the upcoming multi-value proposal.

It is clearer not to view the type names occurring in some instructions as type annotations, but as a sort of module name selecting specific operational behaviour.

@Ekleog
Author

Ekleog commented Nov 30, 2018

@binji About br_if, I was more thinking of the value popped by br_if to test whether it is 0.

I'd think if you were going to interpret on device, you'd want to do something like what the wabt interpreter does and decode and validate the module, but convert it to a bytecode that is friendlier for interpreters. Interpreting directly from the bytecode can be done, but it's clumsy as it requires a few auxiliary data structures.

I'm interpreting on device indeed, because I can't compile anyway (this is for running WASM on Java Card, which doesn't have any kind of compilation support).

The alternative to just running without validation is to pre-compile to some other bytecode beforehand, which I'd likely have done had I understood the issues I mention here earlier. The time slot I had to work on this will soon end and the project will likely never see the light of day, but I must say that interpreting directly from the bytecode, with neither beforehand validation nor a typed stack, worked pretty well in practice. (I did keep a side-stack recording the size of the stack at function entry, but it turned out it wasn't actually required, as the compiler popped the relevant things, so it only acted as a debug helper when I had some imbalance in my builtins' stack operations.)

@rossberg

Drop and select otoh work for any type and they perform the same operation regardless of type.

This assumes that the stack stores objects and is not a stack of u16 (as is the case in my implementation, given that this is the biggest native integer type on Java Card -- I guess regular implementations would compile down to a stack of u8 that can then be cast into any type as need be). When the stack is a stack of u16, dropping an i32 and dropping an i64 definitely do not operate the same way.

Note that, with some of the future extensions to Wasm, "any type" will be an infinite set, consider e.g. reference types.

Honestly, with some future extensions to WASM, there will be a GC. So it isn't really reasonable to assume that future extensions to WASM will work on embedded devices similar to the ones I'm working on :) But just passing the size (in bytes) of the operand to drop / select would handle the issue from my side.

@rossberg
Member

@Ekleog, I think to handle branches you'd have to record the stack size at function entry and at every block/loop/if entered, otherwise they cannot work correctly in general. Maybe the compiler you used didn't generate such code? Have you tried running the Wasm test suite?

For some context it should be mentioned that Wasm was explicitly designed for jitting, not for interpretation. So some things are more complicated for interpreters. How did you find branch targets, btw?

@Ekleog
Author

Ekleog commented Nov 30, 2018

The Rust compiler didn't generate such code in the examples I've been using, indeed. I haven't tried the test suite because I know my coverage is more than partial (typically, none of the i64 operations are implemented apart from load and store, because those were the only ones I needed for my own tests, and implementing 64-bit operations out of 16-bit operations isn't really fun), and I haven't been able to get to a stage where running it would make sense. Also, I have that patch that changes the size of a “page” to 1 byte, but it would be easy enough to adapt the test suite for that.

For branch targets, I handled it by calling a function at each new block and, when breaking, returning [break label] times from these functions. So the stack size could be recorded there without much issue. I haven't checked how much memory this consumed, but it appeared to fit in the memory I gave it for my tests.

@ericprud

@Ekleog, would you be content to have the WebAssembly Core spec proceed with the current datatype and page size requirements while you draft a small document defining a profile for embedded devices?

@Ekleog
Author

Ekleog commented May 17, 2019

@ericprud To be honest, I don't have much time for working on WASM-on-tiny-embedded any longer, so I don't know if I'll ever actually end up drafting it -- most work would be understanding how to define a profile for the WASM spec I think. So feel free to proceed in the way you deem best until someone (or maybe me some day) picks this issue up :)

@FatihBAKIR

Hello, I'm working on using wasm on embedded devices and ran into the same page size issue. Is there any way I can help at least with that?

@Serentty

@FatihBAKIR I actually have a similar use case. I'd love to use WebAssembly on 8-bit CPUs where 64 KiB is the entire address space, and where I don't want to have to waste four whole bytes on addresses.

@FatihBAKIR

@Serentty, we're in the same situation, but before we can deal with the lack of smaller types, I wish the page size was configurable.

I don't understand the motivation behind defining the memory API in terms of pages anyway. Why doesn't it just speak in number of bytes? If my implementation could benefit from pages, I could round up to page size anyway.

For instance, wasm-ld uses 2 pages by default. None of the targets we use have 128 KB of RAM, let alone allocate that just for a wasm app that averages sensor readings.

Is there a document that explains this?

@vshymanskyy

vshymanskyy commented Jan 6, 2020

I would follow this. We faced the linear-memory page size issue while developing the fastest WebAssembly interpreter.
Many embedded platforms can only afford to allocate 2–64 KiB of linear memory. Currently, we can introduce some workarounds (i.e. allocate as much memory as we can, and trap on OOB access), but I'm wondering if this issue is going to be addressed by the WebAssembly standard.
I'm aware that large page support is considered for the 🦄 future, but how about small pages? ;)

@vshymanskyy

vshymanskyy commented Jan 24, 2020

We have implemented the memoryLimit option in Wasm3, and it helps running wasm modules in very limited environments. Of course, if the wasm module is running its own allocator, it knows nothing about the memoryLimit and tends to perform OOB accesses (mostly during heap initialization).

Here's the list of hardware that is capable of running Wasm3.
It also contains device specs so I think it may be useful here:
https://github.com/wasm3/wasm3/blob/master/docs/Hardware.md

@vshymanskyy

vshymanskyy commented Jan 24, 2020

Regarding i8/i16: from the wasm3 perspective, it looks like we can live happily without them. They would blow up the opcode space, and would probably affect every other aspect of WebAssembly.
I'd vote for i8 and i16 SIMD instructions.

@Serentty

It will blow up the opcode space, and will probably affect every other aspect of WebAssembly.

This is a great opportunity to ask a question I have had for quite a while but felt wasn't worth an issue. Why does WebAssembly have separate opcodes for instructions for all of the different numeric types at all? After all, since it validates that the types on the operand stack line up, and knows their types because of the typed memory load instructions, this means it could just infer their types anyway, right?

@vshymanskyy

Personally I don't have an answer to this. Maybe it's just for readability of the text format, or to make the validation process even stricter.

@soundandform

@vshymanskyy @Serentty This is a solid question. If the separate opcodes are there to make validation stricter, they really just make everything more onerous without actually improving safety. I agree, it's probably about readability. Another thought: if you're doing straightforward interpretation of Wasm code (without validation), it somewhat simplifies the lookup of the operation to perform.

@Serentty

Sure, I see it makes decoding a bit simpler, but on the other hand using up four times as much opcode space as necessary seems a bit wasteful. Then again, since two-byte opcodes are under consideration anyway, maybe that was deemed a worthwhile trade-off.

@tlively
Member

tlively commented Jan 30, 2020

An operation is split into multiple instructions when its semantics differ depending on the types of its operands. For example, an i32.add is semantically different from an i64.add, so they are specified as separate instructions and therefore have separate opcodes. In contrast, the semantics of a local.get instruction can be described completely independently of the type it produces, so there is no need to have multiple local.get instructions. You're all right that we could have saved opcode space by having i32.add and i64.add be the same instruction, but that would have made the formal spec more complicated.
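The semantic difference is easy to see concretely: both additions wrap, but at different widths, so the same operand values can produce different results. A small Python illustration (modular arithmetic stands in for the two instructions):

```python
# The point in one example: i32.add and i64.add are operationally different,
# because each wraps at its own width.

def i32_add(a, b):
    return (a + b) & 0xFFFFFFFF            # wrap modulo 2**32

def i64_add(a, b):
    return (a + b) & 0xFFFFFFFFFFFFFFFF    # wrap modulo 2**64

assert i32_add(0xFFFFFFFF, 1) == 0              # wraps at 32 bits
assert i64_add(0xFFFFFFFF, 1) == 0x100000000    # does not wrap yet
```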

@Serentty

Interesting. So the goal was to be able to describe one instruction per opcode? I'm not sure where I stand on this. On one hand simplicity is good, but on the other hand being able to fit nearly four times the instructions into a single byte sounds really nice. Anyway, I suppose there isn't much point deliberating over that now that the specification is finalized.

@rossberg
Member

rossberg commented Feb 3, 2020

What @tlively said. Another way to say it is that we did not want overloading in Wasm. That was a very early design decision, probably documented somewhere.

@sffc

sffc commented Feb 4, 2020

On page sizes:

We are trying to use WASM to build "microfunctions", small stateless functions that can be written once (e.g. in Rust) and then ported via WASM to run in a variety of runtimes. A WASM Memory may be built once and then used again and again for multiple microfunction invocations. The buffers backing the WASM Memories would be owned and destroyed by the host environment.

A 64 KiB page size is much larger than our microfunctions typically need. When a Memory is stored and reused across multiple function invocations, we can have situations where 10-20 buffers are active at a given time, which is a large, unnecessary memory cost.

A configurable page size, or at least one that is smaller than 64 KiB, would be really helpful.

Related: rustwasm/wee_alloc#88, WebAssembly/multi-memory#8

@axic
Contributor

axic commented Feb 6, 2020

Just to chime in on the page size discussion: this was explored to some extent within a blockchain context (short-lived executions): ewasm/design#161

Probably it is too late to change the 1.0 spec, but perhaps a champion could create a change proposal to be considered for later adoption.

@binji
Member

binji commented Feb 6, 2020

Probably it is too late to change the 1.0 spec

Yes, but as @lars-t-hansen mentions above, we can add a proposal to extend the format to allow smaller pages. It could work like the 64-bit memory proposal. We could use a bit in the limit flags to specify that this memory has a smaller page size; then subsequent memory.size and memory.grow instructions would use those sizes instead.

A great next step would be to present this to the community group. If the group agrees, then we can choose a champion and start moving forward with it as a proposal.

@axic
Contributor

axic commented Feb 6, 2020

We could use a bit in the limit flags to specify that this memory has a smaller page size

Hm, any ideas how that could be added in a backwards-compatible manner? (Should I move this question over to the 64-bit proposal thread instead?)

@binji
Member

binji commented Feb 6, 2020

Basically, we'd make it so that by default, a memory would still have 64KiB pages. But you can mark the memory (either when defining or importing it) to signify a smaller page size. This would be part of the memory type, so if you import that memory, you have to match the page size. Then, when a wasm VM is generating code, it will use the page size associated with its memory.
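A rough model of what that could look like: a hypothetical page_size field carried in the memory type, with memory.size and memory.grow measured in that unit. None of this is spec text; the names and defaults are invented for illustration.

```python
# Illustrative sketch (not spec text): a memory type carrying a hypothetical
# page size; size/grow are measured in that unit, and an import would only
# match if the page sizes agree.

class Memory:
    def __init__(self, initial_pages, max_pages, page_size=65536):
        self.page_size = page_size          # part of the memory *type*
        self.max_pages = max_pages
        self.data = bytearray(initial_pages * page_size)

    def size(self):                         # models memory.size, in pages
        return len(self.data) // self.page_size

    def grow(self, delta):                  # models memory.grow: old size or -1
        if self.size() + delta > self.max_pages:
            return -1
        old = self.size()
        self.data.extend(bytes(delta * self.page_size))
        return old

m = Memory(initial_pages=2, max_pages=4, page_size=256)  # a 512-byte memory
```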

@axic
Contributor

axic commented Feb 6, 2020

Sorry I meant backwards compatible in the binary encoding: https://webassembly.github.io/spec/core/binary/modules.html#memory-section

Are there any ways to do that without bumping the version? As I was under the impression there was reluctance to introduce version bumping changes.

@binji
Member

binji commented Feb 6, 2020

Yes, you can use one of the bits in the limits encoding. Currently only 0x00 and 0x01 are allowed. This is how shared memory is defined in the threads proposal, and how 64-bit memory could be defined in that proposal.
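As a purely hypothetical encoding sketch of that idea: the limits flag byte currently allows only 0x00 (min) and 0x01 (min+max), and the threads proposal claims another bit for shared memory; a small-page extension could claim a further bit followed by, say, a log2 page size. The 0x08 value and the payload format below are invented here, not part of any proposal.

```python
# Hypothetical encoding sketch: a limits decoder where an invented flag bit
# (CUSTOM_PAGE = 0x08) signals that a log2 page size follows the limits.

HAS_MAX, SHARED, CUSTOM_PAGE = 0x01, 0x02, 0x08

def decode_limits(data, pos=0):
    flags = data[pos]; pos += 1
    minimum, pos = read_leb_u32(data, pos)
    maximum = None
    if flags & HAS_MAX:
        maximum, pos = read_leb_u32(data, pos)
    page_size = 65536                       # default: one wasm page
    if flags & CUSTOM_PAGE:                 # invented extension bit
        log2, pos = read_leb_u32(data, pos)
        page_size = 1 << log2
    return {"min": minimum, "max": maximum, "page_size": page_size}, pos

def read_leb_u32(data, pos):
    """Standard unsigned LEB128 decoding, as used throughout the binary format."""
    result = shift = 0
    while True:
        byte = data[pos]; pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

An existing decoder that rejects unknown flag bits stays backwards compatible: old modules only ever use 0x00/0x01 and keep the 64KiB default.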

@axic
Contributor

axic commented Feb 6, 2020

Thanks, understood! And sorry for the back and forth.

@rossberg
Member

rossberg commented Aug 4, 2022

Closing this. Please create a proposal if you want to see this feature in Wasm.

@rossberg rossberg closed this as completed Aug 4, 2022