-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory allocation alignment, and garbage collector compatibility #623
Comments
So the allocator's design currently has very strong alignment. Any allocation is guaranteed to be the aligned by the largest power of two that can divide the size. That is,
There is no way to do this. We also do not intend to add this. Freeing memory is something that can be done asynchronously, but allocating is not. If a call to free is delayed, then the program will use more memory. If a call to malloc is delayed, then the allocating thread will have to wait for the result. This means you would have one thread waiting on another to complete a task. This would not be great for performance. Why would you want this? The initial design was around supporting garbage collection, where a GC threads processed some garbage, and then the original allocating threads could then use the memory again. Hence, the mutator threads did all the allocation and processed the remote frees, and the GC threads found garbage, and returned it to the original mutator thread. |
To expand a bit on what @mjp41 says... While there's no explicit cross-thread allocation request (which, again, might have to block the requester until the requestee wakes up and handles the message), memory isn't necessarily doomed to be pinned to any one (thread's) allocator "frontend" forever. If an allocator determines that it has significant amounts of free address space, in the shape of completely empty slabs (~16KiB spans), it will pass that space back to a shared global pool managed by the "backend". One purpose of the remote messages is to handle inter-thread producer/consumer workloads, where, say, I allocate objects and hand them to you to process and free. If your workload doesn't otherwise allocate and mine doesn't otherwise free, then in a traditional thread-cache allocator, my cache is then constantly being refilled and yours is constantly being evicted, with neither really buying us much. The "return to home" nature of snmalloc's queues basically means that pipelines (or even DAGlines) have steady states that don't need to acquire global locks within the allocator. However, this does mean that the allocator frontend must periodically pause to process its message queue, and maybe that's sometimes not ideal. It should be easily possible, in terms of LoC added to or around There's also, I think, a fair bit of design space around the return of slabs to the backend. One could imagine tuning the current policy for particular workloads, enabling per-frontend policies, adding global information, &c. There's been some call for this in very address-space-constrained environments (#615), where disabling frontend caching and returning slabs to the shared pool immediately had some modest performance cost but, much more importantly, kept the system from OOMing. |
COBOL has a different concurrency model from the traditional multithreaded one. The language instead uses multiple separate processes running concurrently, exchanging data through message passing. This essentially guarantees at the language level that there will only be a single user thread per process. In this case it could be useful to implement a system where allocations are done by the single user thread, and deallocations are done by the GC thread. With the user thread borrowing memory from the GC, and returning it back to a deallocation queue that is monitored by the GC thread. This might improve the performance of an RC-based GC, by deferring deallocations to another thread. snmalloc was the only allocator I found with a similar idea to this.
Is there a way to guarantee it to always be 16-byte aligned? |
Interesting you want to separate the "frontend" allocator into two parts that additionally communicate via message passing. This would be a reasonable amount of work, but possible. I'm concerned we might incur an atomic instruction on the fast path of free to make this work.
The minimum allocation size is two pointers sized words, and on any 64 bit platform will be 16-byte aligned. snmalloc/src/snmalloc/ds/allocconfig.h Line 25 in 9d44660
Nothing will be aligned less than the minimum allocation size as allocations are rounded up to a multiple of that size. |
If you mean "at least 16-byte aligned", then that's already the case on most platforms, because the smallest size class supported has to be (at least) snmalloc/src/snmalloc/ds_core/bits.h Line 349 in 9d44660
snmalloc/src/snmalloc/ds/allocconfig.h Line 25 in 9d44660
If you mean that the allocator could source, say, a 64 byte object that was 16- but not 32-byte aligned, then "probably not". The design of snmalloc's heap is that large chunks of address space (~16KiB) are given a single size class, so object alignment is directly correlated with object size. |
You might like to take a look at the use of snmalloc in https://github.com/Microsoft/Verona-sandbox. This uses a model of a single parent process with many children, and a shared heap between the parent and children, with fast allocation and deallocation on the shared heap from both (including allocating in one and freeing in the other). We use this to build a higher-level message-passing model, where objects are allocated in the shared heaps and we just pass pointers, so we get one-copy I/O between parent and child (or zero-copy if we're not doing defensive copies). |
Hi, we've been looking for a memory allocator for our COBOL runtime's garbage collector, and had a few questions.
What is the memory alignment for the pointers returned by snmalloc?
Would it be possible to configure snmalloc in a way that a thread can request memory from another thread?
I found the message passing architecture really interesting, and thought I'd try using it in a concurrent reference counting garbage collector (running in a separate thread). I see that remote deallocations are possible with snmalloc, so it would be interesting if remote allocations through message passing were also possible.
That could allows some interesting scenarios where the garbage collector is running in its own thread, and user threads request memory from it and then return it back.
The text was updated successfully, but these errors were encountered: