-
Notifications
You must be signed in to change notification settings - Fork 32
[WIP] Simple memory grow implementation #51
Conversation
src/malloc/wasm32/expand_heap.c
Outdated
unsigned delta = n / WASM_PAGE_SIZE; | ||
|
||
unsigned res; | ||
asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, does this actually work?
Presumably we could use an intrinsic here too, but I don't know if we have one? @dschuff ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting stuck on this line when trying to build:
clang -std=c99 -nostdinc -ffreestanding -Wa,--noexecstack -D_XOPEN_SOURCE=700 -I/src/musl/arch/wasm32 -I/src/musl/arch/generic -Iobj/src/internal -I/src/musl/src/internal -Iobj/include -I/src/musl/include -g -pipe -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Werror=implicit-function-declaration -Werror=implicit-int -Werror=pointer-sign -Werror=pointer-arith -Qunused-arguments --target=wasm32-unknown-unknown-wasm -O3 -c -o obj/src/math/__expo2f.o /src/musl/src/math/__expo2f.c
/src/musl/src/malloc/wasm32/expand_heap.c:20:2: error: implicit declaration of function 'asm' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
^
/src/musl/src/malloc/wasm32/expand_heap.c:20:49: error: expected ')'
asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
^
/src/musl/src/malloc/wasm32/expand_heap.c:20:5: note: to match this '('
asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
^
2 errors generated.
Here is the Dockerfile I'm using to build this:
https://github.com/LinusU/js-webp/blob/8f7968f1acc3101defa668e871c04c99f35e4a7c/Dockerfile#L72
I'm probably missing something simple though, I haven't had much experience with clang -> wasm before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My experiments can be found at https://github.com/yurydelendik/wasmception/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that I'm running the same steps as you are doing. But I'm using a prebuilt LLVM from their official APT-repo instead of building it from source...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Success 🎉
The following mmap
implementation allows me to skip the mmap
syscall in JS land: a8f5265
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a no-op madvise
, since that is used during free
: e91c1c9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Does anyone have any thoughts on this?
Yep, pushed more stubs to weed out mmap/madvise/signals syscalls. (Also published complete example with malloc/free exports at https://github.com/yurydelendik/old-man-sandbox/tree/master/nosyscall)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LinusU, there are two possible variants in case of this mmap realization:
- simple_malloc realization without free operation. In that case this mmap realization has a right to life (but requested size should be fixed - memory.grow operates with PAGE_SIZE units).
- dlmalloc realization with full optimization and freeing "in the box". For that case this mmap realization is not so good... To cut a long story short in a "traditional" linux the heap is placed on a so-called break area that can grow (by brk/sbr syscalls) and shrink (by madvice) only from one side. On the other hand there is also mmap syscall that allocates free memory somewhere in the process address space. It means that mmap can't allocate memory from the break area and the "mmap" memory chunks aren't intersected with the "break" chunks.
Let's consider what will be in this mmap realization. Suppose we have some memory state S = [0, current_brk_area) where the current_brk_area is some constant from dlmalloc inner realization (in fact it is usually used not as a separate constant but the top chunk size from it's header). Then imagine the situation when there is no free memory in our S and a user tries to allocate a big memory chunk so dlmalloc will use mmap for it. And by this realization memory.grow expands Wasm memory by the supplied value. Then, note that now current_brk_area won't be corresponding to the real Wasm memory limit (call it expanded_memory_area)! Let's imagine the situation when a user tries to allocate small amount of memory from S that still doesn't have enough free memory - so again memory.grow will be called. But it expands not S but [0, expanded_memory_area) to some new area [0, expanded_current_brk_area). So it is possible to allocate chunks from [current_brk_area, expanded_memory_area] which has already allocated another chunks from previous mmap call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurydelendik I thougth about it but in this case we have a perfomance issue: all chunks that size is bigger then MMAP_THRESHOLD (0xE000 bytes) are located in the one bin. It may also cause a big memory fragmentation. But it seems that there is no simpler way to implement this and in future it is possible to tune this malloc realization.
ac8f86c
to
aba5aae
Compare
@@ -150,6 +150,8 @@ void __vm_unlock(void); | |||
|
|||
int __timedwait(volatile int *, int, clockid_t, const struct timespec *, int); | |||
int __timedwait_cp(volatile int *, int, clockid_t, const struct timespec *, int); | |||
|
|||
#ifndef __wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to do this in src/thread/__wait.c
as well in order to compile successfully: b87ccd6
Hmm, I'm not sure exactly why, but with the latest commit (5a0b0ec) I'm getting Looking into it more now... edit: but if I allocate 1,400,000 bytes of data twice, and than free that data again, before running the other part, it works 🤷♂️ 😄 edit: I'm currently building the Wasmception repository and will try to create a small C-program that will demonstrate the issue edit: I have a reproducible test case here: yurydelendik/wasmception#9 edit: nvm, I missed something obvious there 🤦♂️ edit: I don't even know anymore https://bugs.chromium.org/p/webp/issues/detail?id=403
edit: I'm also getting a similar error with libjpeg-turbo, so I'm starting to suspect that's it something here rather than in the library again 🤔
edit: and I'm also triggering this with lodepng now, and this time the stack trace actually points to one of musl's functions:
|
Oh, that is too bad. Apparently In the meantime, here is the file: edit: it seems like switching |
Okay! I've managed to reliably trigger the bug directly in the const pointer = instance.exports.malloc(1454860)
instance.exports.free(pointer)
// This call fails:
instance.exports.malloc(4786736) The Wasm file is built with |
@LinusU @yurydelendik It seems that one of the bugs in current implementation is out-of-bound access of bin_tab in bin_index_up during malloc call with the a size bigger than the initial MMAP_THRESHOLD:
Suppose that x = 1454860 as in @LinusU's example so (1454860 / (SIZE_ALIGN=16) - 1 -1) / 128 - 4 = 706 that bigger than bin_tab array size 60. I think that one possible solution for it is to add additional check in bin_index_up/bin_index functions for returning the last chunk for size bigger than MMAP_THRESHOLD. But there are some other changes is also needed. |
Notice that bin_index has this check, but bin_index_up has not. I guess it is because bin_index_up has only one user -- malloc. |
It seems like 2f9ae7b fixes the problems for me 😍 Will do some more testing tomorrow edit: it fixes one problem (malloc out of bounds), but not all. I can still get webp to go out of bounds, and lodepng to get stuck in an infinite loop 😞 🤔 |
@yurydelendik I was thinking about adding the new bin (65) specially for chucks that size is bigger than the old MMAP_THRESHOLD to unload the existing ones and decrease memory fragmentation. In this case fixes both of bin_index_up/bin_index is needed. |
@LinusU, I will investigate this new issue. Also it is interested why musl developers thrown out some settings from the initial coarse-graining dlmalloc (ftp://g.oswego.edu/pub/misc/malloc.c). F.e. HAVE_MMAP macros that can help us. And can I ask you two questions:
|
@yurydelendik It seems that bin_index_up should return bin_idx + 1 (it seems that this why it ends by "up"). So I suppose that 64 should be returned from it but I hasn't tested webp example yet. |
Awesome, really appreciate it ❤️
I would love it! I was actually looking into this, it's currently blocked on rustwasm/wee_alloc#9, but I think that that shouldn't be too hard to fix. |
I'll try it now 👍 |
musl is designed specifically for the linux kernel, so they can assume mmap() exists. I don't believe they are interested in porting to other kernels. If pretty easy to drop in a replacement malloc at link time if you don't like dlmalloc, emscripten has a couple of different options. Perhaps it makes sense to do this kind of thing in https://github.com/WebAssembly/reference-sysroot? |
@michaelvoronov I tried both libwebp and lodepng with this (b6d6025) commit added, but still got the same errors as before (out of bounds access, and infinite loop) |
@LinusU @yurydelendik I have dug into musl's allocator and find out that:
From these, it follows that
Finally, I have tested this approach there on libwep and its work! :) |
I'll close this PR in favor of more effective solutions, e.g. using dlmalloc as in wasi-sysroot |
@michaelvoronov awesome work, it seems to be working perfectly! I've started to use it in @yurydelendik as a current user of this repo, I would love for this to land |
This repo was only ever intended as an experiment to show that WebAssembly could do modules, what it would look like, and to bring up the first libc for WebAssembly. I got it pretty far, but my hope was always that something like wasi would come along. This was a quick hack, and wasi isn't, so I think you want to migrate to it. That being said, @sbc100 has been maintaining this project, and if he thinks it's worth keeping it going for a bit more I don't mind at all. I just don't want to impose extra maintenance burden on what I see as a historical project at this point in time. |
If anyone is interested, switching over to WASI SDK was super easy, here is the relevant commit: LinusU/cwasm-webp@52e33ef Huge thanks to everybody working on this cool stuff ❤️ 🚀 |
It is natural for wasm. That (and included __wait/__wake stubs) removes syscall imports for simple C code.