Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

[WIP] Simple memory grow implementation #51

Closed

Conversation

yurydelendik
Copy link

It is natural for wasm. That (and included __wait/__wake stubs) removes syscall imports for simple C code.

unsigned delta = n / WASM_PAGE_SIZE;

unsigned res;
asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, does this actually work?

Presumably we could use an intrinsic here too, but I don't know if we have one? @dschuff ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting stuck on this line when trying to build:

clang -std=c99 -nostdinc -ffreestanding -Wa,--noexecstack -D_XOPEN_SOURCE=700 -I/src/musl/arch/wasm32 -I/src/musl/arch/generic -Iobj/src/internal -I/src/musl/src/internal -Iobj/include -I/src/musl/include  -g -pipe -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Werror=implicit-function-declaration -Werror=implicit-int -Werror=pointer-sign -Werror=pointer-arith -Qunused-arguments --target=wasm32-unknown-unknown-wasm -O3 -c -o obj/src/math/__expo2f.o /src/musl/src/math/__expo2f.c
/src/musl/src/malloc/wasm32/expand_heap.c:20:2: error: implicit declaration of function 'asm' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
        ^
/src/musl/src/malloc/wasm32/expand_heap.c:20:49: error: expected ')'
        asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
                                                       ^
/src/musl/src/malloc/wasm32/expand_heap.c:20:5: note: to match this '('
        asm("get_local %1; grow_memory 0; set_local %0": "=r"(res) : "r"(delta));
           ^
2 errors generated.

Here is the Dockerfile I'm using to build this:

https://github.com/LinusU/js-webp/blob/8f7968f1acc3101defa668e871c04c99f35e4a7c/Dockerfile#L72

I'm probably missing something simple though, I haven't had much experience with clang -> wasm before ☺️

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experiments can be found at https://github.com/yurydelendik/wasmception/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that I'm running the same steps as you are doing. But I'm using a prebuilt LLVM from their official APT-repo instead of building it from source...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success 🎉

The following mmap implementation allows me to skip the mmap syscall in JS land: a8f5265

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added a no-op madvise, since that is used during free: e91c1c9

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Does anyone have any thoughts on this?

Yep, pushed more stubs to weed out mmap/madvise/signals syscalls. (Also published complete example with malloc/free exports at https://github.com/yurydelendik/old-man-sandbox/tree/master/nosyscall)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinusU, there are two possible variants in case of this mmap realization:

  1. simple_malloc realization without free operation. In that case this mmap realization has a right to life (but requested size should be fixed - memory.grow operates with PAGE_SIZE units).
  2. dlmalloc realization with full optimization and freeing "in the box". For that case this mmap realization is not so good... To cut a long story short in a "traditional" linux the heap is placed on a so-called break area that can grow (by brk/sbr syscalls) and shrink (by madvice) only from one side. On the other hand there is also mmap syscall that allocates free memory somewhere in the process address space. It means that mmap can't allocate memory from the break area and the "mmap" memory chunks aren't intersected with the "break" chunks.
    Let's consider what will be in this mmap realization. Suppose we have some memory state S = [0, current_brk_area) where the current_brk_area is some constant from dlmalloc inner realization (in fact it is usually used not as a separate constant but the top chunk size from it's header). Then imagine the situation when there is no free memory in our S and a user tries to allocate a big memory chunk so dlmalloc will use mmap for it. And by this realization memory.grow expands Wasm memory by the supplied value. Then, note that now current_brk_area won't be corresponding to the real Wasm memory limit (call it expanded_memory_area)! Let's imagine the situation when a user tries to allocate small amount of memory from S that still doesn't have enough free memory - so again memory.grow will be called. But it expands not S but [0, expanded_memory_area) to some new area [0, expanded_current_brk_area). So it is possible to allocate chunks from [current_brk_area, expanded_memory_area] which has already allocated another chunks from previous mmap call.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurydelendik I thougth about it but in this case we have a perfomance issue: all chunks that size is bigger then MMAP_THRESHOLD (0xE000 bytes) are located in the one bin. It may also cause a big memory fragmentation. But it seems that there is no simpler way to implement this and in future it is possible to tune this malloc realization.

@@ -150,6 +150,8 @@ void __vm_unlock(void);

int __timedwait(volatile int *, int, clockid_t, const struct timespec *, int);
int __timedwait_cp(volatile int *, int, clockid_t, const struct timespec *, int);

#ifndef __wait
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to do this in src/thread/__wait.c as well in order to compile successfully: b87ccd6

@yurydelendik yurydelendik changed the title Simple memory grow implementation [WIP] Simple memory grow implementation Oct 11, 2018
@LinusU
Copy link

LinusU commented Oct 12, 2018

Hmm, I'm not sure exactly why, but with the latest commit (5a0b0ec) I'm getting RuntimeError: memory access out of bounds at wasm-function[257]:13 which I didn't get previously 🤔

Looking into it more now...

edit: but if I allocate 1,400,000 bytes of data twice, and than free that data again, before running the other part, it works 🤷‍♂️ 😄

edit: I'm currently building the Wasmception repository and will try to create a small C-program that will demonstrate the issue

edit: I have a reproducible test case here: yurydelendik/wasmception#9

edit: nvm, I missed something obvious there 🤦‍♂️

edit: I don't even know anymore https://bugs.chromium.org/p/webp/issues/detail?id=403

     RuntimeError: memory access out of bounds
      at VP8YuvToRgb (wasm-function[416]:95)
      at VP8YuvToRgba (wasm-function[415]:135)
      at UpsampleRgbaLinePair_C (wasm-function[439]:2519)
      at EmitFancyRGB (wasm-function[94]:1110)
      at CustomPut (wasm-function[87]:305)
      at FinishRow (wasm-function[66]:2734)
      at VP8ProcessRow (wasm-function[64]:383)
      at ParseFrame (wasm-function[145]:494)
      at VP8Decode (wasm-function[144]:466)
      at DecodeInto (wasm-function[228]:754)
      at Decode (wasm-function[230]:375)
      at WebPDecodeRGBA (wasm-function[233]:106)
      at Object.exports.decode (index.js:31:42)
      at Context.it (test.js:25:25)

edit: I'm also getting a similar error with libjpeg-turbo, so I'm starting to suspect that's it something here rather than in the library again 🤔

     RuntimeError: memory access out of bounds
      at h2v2_fancy_upsample (wasm-function[252]:1150)
      at sep_upsample (wasm-function[245]:485)
      at process_data_context_main (wasm-function[167]:908)
      at jpeg_read_scanlines (wasm-function[81]:572)
      at tjDecompress2 (wasm-function[377]:3691)
      at Object.exports.decode (index.js:64:39)
      at Context.it (test.js:16:30)

edit: and I'm also triggering this with lodepng now, and this time the stack trace actually points to one of musl's functions:

     RuntimeError: memory access out of bounds
      at __bin_chunk (wasm-function[122]:26)
      at realloc (wasm-function[125]:643)
      at lodepng_realloc (wasm-function[26]:62)
      at ucvector_reserve (wasm-function[84]:241)
      at ucvector_resize (wasm-function[15]:73)
      at inflateHuffmanBlock (wasm-function[9]:1740)
      at lodepng_inflatev (wasm-function[6]:605)
      at lodepng_inflate (wasm-function[4]:149)
      at inflate (wasm-function[11]:236)
      at lodepng_zlib_decompress (wasm-function[10]:568)
      at zlib_decompress (wasm-function[77]:236)
      at decodeGeneric (wasm-function[80]:6392)
      at lodepng_decode (wasm-function[79]:144)
      at lodepng_decode_memory (wasm-function[86]:181)
      at lodepng_decode32 (wasm-function[91]:122)
      at Object.exports.decode (index.js:22:34)
      at Context.it (test.js:27:25)

@mikevoronov
Copy link

mikevoronov commented Oct 14, 2018

@LinusU I have tried to reproduce this bug but failed on building this (Ubuntu 18.04):

sh -c 'docker run --rm -it $(docker build -q .) | base64 -D > webp.wasm'
base64: invalid option -- 'D'

Can you provide me please resulted wasm file?

@LinusU
Copy link

LinusU commented Oct 14, 2018

Oh, that is too bad. Apparently base64 have different flags on Linux and macOS, I'll try and fix that to use something else.

In the meantime, here is the file:

webp.wasm.zip

edit: it seems like switching -D for --decode works both on macOS and Ubuntu, I've pushed a commit with that and rebased yuv-bug on master 👌

@LinusU
Copy link

LinusU commented Oct 14, 2018

Okay! I've managed to reliably trigger the bug directly in the malloc function. By calling just the following functions, the third call will try to access memory outside of the Wasm bounds:

const pointer = instance.exports.malloc(1454860)
instance.exports.free(pointer)

// This call fails:
instance.exports.malloc(4786736)

The Wasm file is built with --export=malloc,--export=free and doesn't contain any of my own code at all. See the full repository with build process and test script here:

https://github.com/LinusU/cwasm-malloc-bug

@mikevoronov
Copy link

mikevoronov commented Oct 20, 2018

@LinusU @yurydelendik It seems that one of the bugs in current implementation is out-of-bound access of bin_tab in bin_index_up during malloc call with the a size bigger than the initial MMAP_THRESHOLD:

static int bin_index_up(size_t x)
{
	x = x / SIZE_ALIGN - 1;
	if (x <= 32) return x;
	x--;
	if (x < 512) return bin_tab[x/8-4] + 1;
	return bin_tab[x/128-4] + 17;
}

Suppose that x = 1454860 as in @LinusU's example so (1454860 / (SIZE_ALIGN=16) - 1 -1) / 128 - 4 = 706 that bigger than bin_tab array size 60.

I think that one possible solution for it is to add additional check in bin_index_up/bin_index functions for returning the last chunk for size bigger than MMAP_THRESHOLD. But there are some other changes is also needed.

@yurydelendik
Copy link
Author

add additional check in bin_index_up/bin_index functions

Notice that bin_index has this check, but bin_index_up has not. I guess it is because bin_index_up has only one user -- malloc.

@LinusU
Copy link

LinusU commented Oct 23, 2018

It seems like 2f9ae7b fixes the problems for me 😍

Will do some more testing tomorrow ☺️

edit: it fixes one problem (malloc out of bounds), but not all. I can still get webp to go out of bounds, and lodepng to get stuck in an infinite loop 😞 🤔

@mikevoronov
Copy link

mikevoronov commented Oct 24, 2018

@yurydelendik I was thinking about adding the new bin (65) specially for chucks that size is bigger than the old MMAP_THRESHOLD to unload the existing ones and decrease memory fragmentation. In this case fixes both of bin_index_up/bin_index is needed.

@mikevoronov
Copy link

mikevoronov commented Oct 24, 2018

@LinusU, I will investigate this new issue. Also it is interested why musl developers thrown out some settings from the initial coarse-graining dlmalloc (ftp://g.oswego.edu/pub/misc/malloc.c). F.e. HAVE_MMAP macros that can help us.

And can I ask you two questions:

  • what do you think about switching from dlmalloc to ptmalloc or jemalloc? These allocators are more optimised for multithreaded environment. Since the current tendency of WebAssembly evolution better multithreaded support by memory allocator could be very actual.
  • what do you think about switching to some special allocator for wasm target (like Rust wee_alloc)?

@mikevoronov
Copy link

mikevoronov commented Oct 24, 2018

@yurydelendik It seems that bin_index_up should return bin_idx + 1 (it seems that this why it ends by "up"). So I suppose that 64 should be returned from it but I hasn't tested webp example yet.

@LinusU
Copy link

LinusU commented Oct 24, 2018

I will investigate this new issue.

Awesome, really appreciate it ❤️

what do you think about switching to some special allocator for wasm target (like Rust wee_alloc)?

I would love it! I was actually looking into this, it's currently blocked on rustwasm/wee_alloc#9, but I think that that shouldn't be too hard to fix.

@LinusU
Copy link

LinusU commented Oct 24, 2018

So I suppose that 64 should be returned from it but I hasn't tested webp example yet.

I'll try it now 👍

@sbc100
Copy link
Collaborator

sbc100 commented Oct 24, 2018

musl is designed specifically for the linux kernel, so they can assume mmap() exists. I don't believe they are interested in porting to other kernels.

If pretty easy to drop in a replacement malloc at link time if you don't like dlmalloc, emscripten has a couple of different options. Perhaps it makes sense to do this kind of thing in https://github.com/WebAssembly/reference-sysroot?

@LinusU
Copy link

LinusU commented Oct 24, 2018

@michaelvoronov I tried both libwebp and lodepng with this (b6d6025) commit added, but still got the same errors as before (out of bounds access, and infinite loop)

@mikevoronov
Copy link

mikevoronov commented Apr 1, 2019

@LinusU @yurydelendik I have dug into musl's allocator and find out that:

  1. bin_index_up returns the next id of a bin that has chunk with sizes >= requested. Then this checks that this bin contains free chunks (binmap is bit map with 1 on i-th position indicates that i-th bin has free chunks). If so, free chunks will be returned (there is also some logic of trimming, coalescing and locking), otherwise extend_heap will be called.
  2. The largest bins contain chunks that size spread out in increasing powers of two.
  3. The 63-th bin (let's call it unsorted bin according to the ptmalloc way) contains the chunks that have size more than 0x1C00*SIZE_ALIGN. It means that it includes all big free chunks that must be mmaped in the original allocator version in unsorted order.

From these, it follows that bin_index_up should return some unique value for chunks size of that bigger than MMAP_THRESHOLD. Let it be 64. But then there are two problems:

  • according to the C standard logical left shift is undefined for additive more than 63
  • it needs to additionaly traverse all chunks from unsorted bin to find out the suitable one (with size >= than requested).

Finally, I have tested this approach there on libwep and its work! :)

@yurydelendik
Copy link
Author

I'll close this PR in favor of more effective solutions, e.g. using dlmalloc as in wasi-sysroot

@LinusU
Copy link

LinusU commented Apr 2, 2019

@michaelvoronov awesome work, it seems to be working perfectly! I've started to use it in @cwasm/webp here: LinusU/cwasm-webp@b6ec521

@yurydelendik as a current user of this repo, I would love for this to land ☺️, or does wasi-sysroot completely replace this project? It's very convenient to use this repo to compile c libraries for usage on the web and in Node.js 🙌

@jfbastien
Copy link
Owner

This repo was only ever intended as an experiment to show that WebAssembly could do modules, what it would look like, and to bring up the first libc for WebAssembly. I got it pretty far, but my hope was always that something like wasi would come along. This was a quick hack, and wasi isn't, so I think you want to migrate to it.

That being said, @sbc100 has been maintaining this project, and if he thinks it's worth keeping it going for a bit more I don't mind at all. I just don't want to impose extra maintenance burden on what I see as a historical project at this point in time.

@LinusU
Copy link

LinusU commented Apr 2, 2019

If anyone is interested, switching over to WASI SDK was super easy, here is the relevant commit: LinusU/cwasm-webp@52e33ef

Huge thanks to everybody working on this cool stuff ❤️ 🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants