
Embedded devices: i8/i16 and memory pages less than 64KiB #899

Closed
Ekleog opened this issue Oct 24, 2018 · 31 comments

Comments

@Ekleog

Ekleog commented Oct 24, 2018

Hello,

I have just read this post, which appears to confirm that WebAssembly still intends to have a future on embedded devices. So here is my feedback from trying to make WebAssembly run on such a device:

  • Mandatory support for i32/i64 is not nice when the hardware only offers i8 / i16 arithmetic. It's possible to work around this by re-implementing the primitives, but it'd be great if i8 / i16 types just existed: assuming the application compiled to WebAssembly uses them, the polyfills would simply go unused most of the time. Ideally, there'd be a wasm16 target that'd support only up to i16 arithmetic in addition to having a 16-bit address space, but that's maybe a bit too much to hope for.
  • 64KiB per memory page is a lot. The devices I'm running on just don't have that much memory. So currently I compile with a patch to LLVM that reduces the memory page size to 1 byte, which makes things work but is no longer officially wasm. If some way existed for applications to claim that they use less than a full page, this problem would vanish.

Hope this feedback can help!
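The i16 workaround mentioned in the first bullet can be sketched roughly as follows: a minimal, illustrative Python model (not from the thread) of an i32.add polyfill built from 16-bit limb operations plus a carry, the kind of primitive a hypothetical 16-bit target would have to synthesize for every i32 operation.

```python
# Illustrative sketch: emulating i32.add on hardware that only has 16-bit
# arithmetic, by splitting each i32 into two u16 limbs and propagating a carry.

def i32_add_via_u16(a: int, b: int) -> int:
    """Add two 32-bit values using only 16-bit limb operations plus a carry."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    lo_sum = a_lo + b_lo                    # fits in 17 bits
    lo = lo_sum & 0xFFFF
    carry = lo_sum >> 16                    # 0 or 1
    hi = (a_hi + b_hi + carry) & 0xFFFF     # wrap at 32 bits overall
    return (hi << 16) | lo
```

If i16 existed as a first-class type, code that only needs 16-bit arithmetic would never pay for this limb juggling.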

@lars-t-hansen

We've definitely discussed configurable page sizes in the past, and I can think of no reason why that still wouldn't work. The engines would need to choose a bounds-checking strategy adapted to the chosen page size (fewer fancy trap tricks if the chosen page size is not some multiple of the system's page size), but not relying on traps may be the reality for embedded anyway.
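The "no fancy trap tricks" strategy boils down to an explicit compare on every memory access instead of relying on guard pages. A minimal sketch, with invented names, of what an embedded interpreter might do (real engines emit this check in generated code rather than running Python):

```python
# Illustrative sketch: explicit bounds checking against whatever memory the
# embedded host could actually allocate, with no dependence on 64 KiB pages
# or OS guard-page tricks.

def checked_load_u8(memory: bytearray, addr: int, offset: int) -> int:
    ea = addr + offset                      # effective address
    if ea >= len(memory):                   # explicit check on every access
        raise MemoryError("out-of-bounds memory access")  # models a wasm trap
    return memory[ea]

mem = bytearray(4096)  # e.g. a 4 KiB linear memory, far below one wasm page
```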

A 16-bit profile of wasm is a bigger ask :) No reason it couldn't be done, but the burden of doing the detailed proposal (and the implementations) would probably fall to somebody in the embedded community.

@Ekleog
Author

Ekleog commented Nov 30, 2018

Another issue that I'll dump here despite not really hoping for it to be fixed, due to backward compatibility: drop and select are the only opcodes (that I've noticed so far) that require knowing the size of the object on top of the stack. This means that, just for these two opcodes, the whole stack needs to be typed, meaning additional memory usage where memory is scarce. Well, that, or analyzing the code ahead of time, but that is quite costly on resource-limited components.

I must say I was shocked when hitting these, as they appear to be the only exception. All other opcodes have proper i32.add etc. versions that explicitly state the size of the operand (or, for get_local, the information is in the local's type, which is quite cheap), but drop and select require the whole stack to be typed. This makes me wonder why i32.add and i64.add are even separate opcodes.

Disclaimer: I'm not sure about br_if and the like yet, as so far I've only seen code generated for them that takes an i32. If that's not guaranteed, this message's request would also extend to hoping for i32.br_if / i64.br_if.

I'm very conscious that this will be hard-to-impossible to retrofit into the spec: anyway, fully spec-compliant interpreters would have to support these opcodes, even if they were deprecated. On the other hand, embedded interpreters can likely get away with implementing only part of the spec, and in particular not these two opcodes, if in practice compilers don't generate them but generate their sized variants.

So… I don't know?

@binji
Member

binji commented Nov 30, 2018

You can see the full list of instructions with their stack signatures here.

I'm not sure about br_if and the like yet, as I've currently only seen code generated for them as i32-taking.

Not sure if this is what you mean, but br, br_if and br_table can all optionally forward a value. The type of that value is the type of the label that they're branching to, for example:

```wat
block (result f32)
  ...
  f32.const 1
  br 0  ;; branch to end
end
;; top of stack is an f32 here
```

I must say I was shocked when hitting those, as they appear to be the only exception.

Yes, those are the only parametric instructions.

Though even if you removed those instructions, you wouldn't be able to validate without having a typed stack. I suppose you could assume that you've pre-validated the WebAssembly module, but at that point you could AOT compile the module offline instead.

I'd think if you were going to interpret on device, you'd want to do something like what the wabt interpreter does and decode and validate the module, but convert it to a bytecode that is friendlier for interpreters. Interpreting directly from the bytecode can be done, but it's clumsy as it requires a few auxiliary data structures.
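One way to read this suggestion: the validation pass already tracks operand types, so while it runs it can rewrite the generic drop into a width-annotated internal opcode, letting the runtime stack stay untyped. A minimal sketch with invented names (drop.n, WIDTH) and a toy instruction representation, not the actual wabt design:

```python
# Illustrative sketch: during a validation/lowering pass that already tracks
# operand types, rewrite the generic `drop` into an internal, width-carrying
# opcode so the runtime stack can be a raw slot stack.

WIDTH = {"i32": 1, "i64": 2, "f32": 1, "f64": 2}  # width in 32-bit slots

def lower(ops):
    """ops: list of (opcode, type-pushed-or-None). Returns lowered opcodes."""
    type_stack, out = [], []
    for op, ty in ops:
        if op == "drop":
            t = type_stack.pop()               # type info is free here
            out.append(("drop.n", WIDTH[t]))   # sized internal form
        else:
            if ty is not None:
                type_stack.append(ty)
            out.append((op, ty))
    return out
```

After this pass, the interpreter's value stack never needs type tags: drop.n just pops the given number of slots.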

@rossberg
Member

@Ekleog, I doubt that you can compile Wasm without typing the stack, regardless of these operations. An engine is expected to perform validation in the first place, and some other instructions, like branches, require knowledge of the current operand stack, since they need to clear it.

The difference between instructions like i32.add and drop is that addition is defined only for a fixed number of types, and it is operationally different for each type. Drop and select otoh work for any type and they perform the same operation regardless of type. This is essentially the difference between overloading and generics.

Note that, with some of the future extensions to Wasm, "any type" will be an infinite set, consider e.g. reference types. From that perspective alone, the same syntax cannot work -- even less so for branches, which can even have multiple operands with the upcoming multi-value proposal.

It is clearer not to view the type names occurring in some instructions as type annotations, but as a sort of module name selecting specific operational behaviour.

@Ekleog
Author

Ekleog commented Nov 30, 2018

@binji About br_if, I was more thinking of the value popped by br_if to test whether it is 0.

I'd think if you were going to interpret on device, you'd want to do something like what the wabt interpreter does and decode and validate the module, but convert it to a bytecode that is friendlier for interpreters. Interpreting directly from the bytecode can be done, but it's clumsy as it requires a few auxiliary data structures.

I'm interpreting on device indeed, because I can't compile anyway (this is for running WASM on Java Card, which doesn't have any kind of compilation support).

The alternative to just running without validation is to pre-compile to some other bytecode beforehand, which I'd likely have done had I understood the issues I mention here earlier. The time slot I had to work on this will soon end and the project will likely never see the light of day, but I must say that interpreting directly from the bytecode, with neither beforehand validation nor a typed stack, worked pretty well in practice. (I did keep a side-stack recording the size of the stack at function entry, but it turned out it wasn't actually required, as the compiler popped the relevant things, so it only acted as a debug helper when I had some imbalance in my builtins' stack operations.)

@rossberg

Drop and select otoh work for any type and they perform the same operation regardless of type.

This assumes that the stack stores objects and is not a stack of u16 (as is the case in my implementation, given that this is the biggest native integer type on Java Card -- I guess regular implementations would compile down to a stack of u8 that can then be cast into any type as need be). When the stack is a stack of u16, dropping an i32 and dropping an i64 definitely do not operate the same way.

Note that, with some of the future extensions to Wasm, "any type" will be an infinite set, consider e.g. reference types.

Honestly, with some future extensions to WASM, there will be a GC. So it isn't really reasonable to assume that future extensions to WASM will work on embedded devices similar to the ones I'm working on :) But just passing the size (in bytes) of the operand to drop / select would handle the issue from my side.

@rossberg
Member

@Ekleog, I think to handle branches you'd have to record the stack size at function entry and at every block/loop/if entered, otherwise they cannot work correctly in general. Maybe the compiler you used didn't generate such code? Have you tried running the Wasm test suite?

For some context it should be mentioned that Wasm was explicitly designed for jitting, not for interpretation. So some things are more complicated for interpreters. How did you find branch targets, btw?

@Ekleog
Author

Ekleog commented Nov 30, 2018

The Rust compiler didn't generate such code in the examples I've been using, indeed. I haven't tried the test suite because I know my coverage is more than partial (typically, none of the i64 operations are implemented apart from load and store, because those were the only ones I needed for my own tests, and implementing 64-bit operations out of 16-bit operations isn't really fun), and I haven't been able to get to a stage where running it would make sense. Also, I have that patch that changes the size of a “page” to 1 byte, but it would be easy enough to adapt the test suite for that.

For branch targets, I handled it by calling a function at each new block and, when breaking, returning [break label] times from these functions. So the stack size could be recorded there without much issue. I haven't checked how much memory this consumed, but it appeared to fit in the memory I gave it for my tests.

@ericprud

@Ekleog, would you be content to have the WebAssembly Core spec proceed with the current datatype and page size requirements while you draft a small document defining a profile for embedded devices?

@Ekleog
Author

Ekleog commented May 17, 2019

@ericprud To be honest, I don't have much time for working on WASM-on-tiny-embedded any longer, so I don't know if I'll ever actually end up drafting it -- most work would be understanding how to define a profile for the WASM spec I think. So feel free to proceed in the way you deem best until someone (or maybe me some day) picks this issue up :)

@FatihBAKIR

Hello, I'm working on using wasm on embedded devices and ran into the same page size issue. Is there any way I can help at least with that?

@Serentty

@FatihBAKIR I actually have a similar use case. I'd love to use WebAssembly on 8-bit CPUs where 64 KiB is the entire address space, and where I don't want to have to waste four whole bytes on addresses.

@FatihBAKIR

@Serentty, we're in the same situation, but before we can deal with the lack of smaller types, I wish the page size was configurable.

I don't understand the motivation behind defining the memory API in terms of pages anyway. Why doesn't it just speak in number of bytes? If my implementation could benefit from pages, I could round up to page size anyway.

For instance, wasm-ld uses 2 pages by default. None of the targets we use have 128 KB of RAM, let alone allocate that just for a wasm app that averages sensor readings.

Is there a document that explains this?

@vshymanskyy

vshymanskyy commented Jan 6, 2020

I would follow this. We faced the linear-memory page size issue while developing the fastest WebAssembly interpreter.
Many embedded platforms can only afford to allocate 2–64 KiB of linear memory. Currently, we can introduce some workarounds (i.e. allocate as much memory as we can, and trap on OOB access), but I'm wondering if this issue is going to be addressed by the WebAssembly standard.
I'm aware that large page support is considered for the 🦄 future, but how about small pages? ;)

@vshymanskyy

vshymanskyy commented Jan 24, 2020

We have implemented the memoryLimit option in Wasm3, and it helps running wasm modules in very limited environments. Of course, if the wasm module is running its own allocator, it knows nothing about the memoryLimit and tends to perform OOB accesses (mostly during heap initialization).

Here's the list of hardware that is capable of running Wasm3.
It also contains device specs so I think it may be useful here:
https://github.com/wasm3/wasm3/blob/master/docs/Hardware.md

@vshymanskyy

vshymanskyy commented Jan 24, 2020

Regarding i8/i16: from the wasm3 perspective, it looks like we can live happily without them. They would blow up the opcode space, and would probably affect every other aspect of WebAssembly.
I'd vote for i8 and i16 SIMD instructions.

@Serentty

It will blow up the opcode space, and will probably affect every other aspect of WebAssembly.

This is a great opportunity to ask a question I have had for quite a while but felt wasn't worth an issue. Why does WebAssembly have separate opcodes for instructions for all of the different numeric types at all? After all, since it validates that the types on the operand stack line up, and knows their types because of the typed memory load instructions, this means it could just infer their types anyway, right?

@vshymanskyy

Personally I don't have an answer to this. Maybe it's just for readability of the text format, or to make the validation process even stricter.

@soundandform

@vshymanskyy @Serentty This is a solid question. If the separate opcodes are there to make validation stricter, they really just make everything more onerous without actually improving safety. I agree, it's probably about readability. Another thought: if you're doing straightforward interpretation of Wasm code (without validation), it somewhat simplifies the lookup of the operation to perform.

@Serentty

Sure, I see it makes decoding a bit simpler, but on the other hand using up four times as much opcode space as necessary seems a bit wasteful. Then again, since two-byte opcodes are under consideration anyway, maybe that was deemed a worthwhile trade-off.

@tlively
Member

tlively commented Jan 30, 2020

An operation is split into multiple instructions when its semantics differ depending on the types of its operands. For example, an i32.add is semantically different from an i64.add, so they are specified as separate instructions and therefore have separate opcodes. In contrast, the semantics of a local.get instruction can be described completely independently of the type it produces, so there is no need to have multiple local.get instructions. You're all right that we could have saved opcode space by having i32.add and i64.add be the same instruction, but that would have made the formal spec more complicated.
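The semantic difference is easy to see concretely: both additions wrap, but at different widths, so the same operand values can produce different results. A small Python illustration (modular arithmetic stands in for the two instructions):

```python
# The point in one example: i32.add and i64.add are operationally different,
# because each wraps at its own width.

def i32_add(a, b):
    return (a + b) & 0xFFFFFFFF            # wrap modulo 2**32

def i64_add(a, b):
    return (a + b) & 0xFFFFFFFFFFFFFFFF    # wrap modulo 2**64

assert i32_add(0xFFFFFFFF, 1) == 0              # wraps at 32 bits
assert i64_add(0xFFFFFFFF, 1) == 0x100000000    # does not wrap yet
```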

@Serentty

Interesting. So the goal was to be able to describe one instruction per opcode? I'm not sure where I stand on this. On one hand simplicity is good, but on the other hand being able to fit nearly four times the instructions into a single byte sounds really nice. Anyway, I suppose there isn't much point deliberating over that now that the specification is finalized.

@rossberg
Member

rossberg commented Feb 3, 2020

What @tlively said. Another way to say it is that we did not want overloading in Wasm. That was a very early design decision, probably documented somewhere.

@sffc

sffc commented Feb 4, 2020

On page sizes:

We are trying to use WASM to build "microfunctions", small stateless functions that can be written once (e.g. in Rust) and then ported via WASM to run in a variety of runtimes. A WASM Memory may be built once and then used again and again for multiple microfunction invocations. The buffers backing the WASM Memories would be owned and destroyed by the host environment.

A 64 KiB page size is much larger than our microfunctions typically need. When a Memory is stored and reused across multiple function invocations, we can have situations where 10-20 buffers are active at a given time, which is a large, unnecessary memory cost.

A configurable page size, or at least one that is smaller than 64 KiB, would be really helpful.

Related: rustwasm/wee_alloc#88, WebAssembly/multi-memory#8

@axic
Contributor

axic commented Feb 6, 2020

Just to chime in on the page size discussion: this was explored to some extent within a blockchain context (short-lived executions): ewasm/design#161

Probably it is too late to change the 1.0 spec, but perhaps a champion could create a change proposal to be considered for later adoption.

@binji
Member

binji commented Feb 6, 2020

Probably it is too late to change the 1.0 spec

Yes, but as @lars-t-hansen mentions above, we can add a proposal to extend the format to allow smaller pages. It could work like the 64-bit memory proposal. We could use a bit in the limit flags to specify that this memory has a smaller page size; then subsequent memory.size and memory.grow instructions would use those sizes instead.

A great next step would be to present this to the community group. If the group agrees, then we can choose a champion and start moving forward with it as a proposal.

@axic
Contributor

axic commented Feb 6, 2020

We could use a bit in the limit flags to specify that this memory has a smaller page size

Hm, any ideas how that could be added in a backwards-compatible manner? (Should I move this question over to the 64-bit proposal thread instead?)

@binji
Member

binji commented Feb 6, 2020

Basically, we'd make it so that by default, a memory would still have 64KiB pages. But you can mark the memory (either when defining or importing it) to signify a smaller page size. This would be part of the memory type, so if you import that memory, you have to match the page size. Then, when a wasm VM is generating code, it will use the page size associated with its memory.
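A rough model of what that could look like: a hypothetical page_size field carried in the memory type, with memory.size and memory.grow measured in that unit. None of this is spec text; the names and defaults are invented for illustration.

```python
# Illustrative sketch (not spec text): a memory type carrying a hypothetical
# page size; size/grow are measured in that unit, and an import would only
# match if the page sizes agree.

class Memory:
    def __init__(self, initial_pages, max_pages, page_size=65536):
        self.page_size = page_size          # part of the memory *type*
        self.max_pages = max_pages
        self.data = bytearray(initial_pages * page_size)

    def size(self):                         # models memory.size, in pages
        return len(self.data) // self.page_size

    def grow(self, delta):                  # models memory.grow: old size or -1
        if self.size() + delta > self.max_pages:
            return -1
        old = self.size()
        self.data.extend(bytes(delta * self.page_size))
        return old

m = Memory(initial_pages=2, max_pages=4, page_size=256)  # a 512-byte memory
```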

@axic
Contributor

axic commented Feb 6, 2020

Sorry I meant backwards compatible in the binary encoding: https://webassembly.github.io/spec/core/binary/modules.html#memory-section

Are there any ways to do that without bumping the version? As I was under the impression there was reluctance to introduce version bumping changes.

@binji
Member

binji commented Feb 6, 2020

Yes, you can use one of the bits in the limits encoding. Currently only 0x00 and 0x01 are allowed. This is how shared memory is defined in the threads proposal, and how 64-bit memory could be defined in that proposal.
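As a purely hypothetical encoding sketch of that idea: the limits flag byte currently allows only 0x00 (min) and 0x01 (min+max), and the threads proposal claims another bit for shared memory; a small-page extension could claim a further bit followed by, say, a log2 page size. The 0x08 value and the payload format below are invented here, not part of any proposal.

```python
# Hypothetical encoding sketch: a limits decoder where an invented flag bit
# (CUSTOM_PAGE = 0x08) signals that a log2 page size follows the limits.

HAS_MAX, SHARED, CUSTOM_PAGE = 0x01, 0x02, 0x08

def decode_limits(data, pos=0):
    flags = data[pos]; pos += 1
    minimum, pos = read_leb_u32(data, pos)
    maximum = None
    if flags & HAS_MAX:
        maximum, pos = read_leb_u32(data, pos)
    page_size = 65536                       # default: one wasm page
    if flags & CUSTOM_PAGE:                 # invented extension bit
        log2, pos = read_leb_u32(data, pos)
        page_size = 1 << log2
    return {"min": minimum, "max": maximum, "page_size": page_size}, pos

def read_leb_u32(data, pos):
    """Standard unsigned LEB128 decoding, as used throughout the binary format."""
    result = shift = 0
    while True:
        byte = data[pos]; pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

An existing decoder that rejects unknown flag bits stays backwards compatible: old modules only ever use 0x00/0x01 and keep the 64KiB default.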

@axic
Contributor

axic commented Feb 6, 2020

Thanks, understood! And sorry for the back and forth.

@rossberg
Member

rossberg commented Aug 4, 2022

Closing this. Please create a proposal if you want to see this feature in Wasm.

@rossberg rossberg closed this as completed Aug 4, 2022