-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eager Decommits in Medium Sized Slabs #484
Comments
@SchrodingerZhu Thank you for open this issue. I would like to mention some details about my situation. I'm doing multi-threading video processing and typical allocations are 1080p 16bit frame buffers (or more specifically, buffers of frame planes; 4,147,200 bytes for each). There are also frame buffers of other byte lengths. In my application, these frame buffers are allocated in arbitrary worker threads and deallocated in one GC thread. Considering the performance of returning free spaces from GC thread to worker threads in deallocation, I have adopted snmalloc as the allocator of frame buffers. I first tried snmalloc1. Because the default config of chunk size of snmalloc is not big enough to deal with frame buffers, I have configured snmalloc to use 16MB large chunks. Under this configuration, one mediumslab can hold four 16bit 1080p frame buffers; it can hold fewer frame buffers with larger size. In current implementation of snmalloc, one mediumslab is deallocated once it becomes empty, and goes into the global large object stack with synchronization. Given that in GC, frame buffers are deallocated in batch, and one mediumslab can only hold four, it is expected that mediumslabs will frequently become empty and trigger large deallocation. This scenario is not unique to my application—compared with superslabs, mediumslabs can contain fewer objects because these objects are larger, so mediumslabs are easier to become empty in nature, if medium objects are alloc/dealloc frequently. I also tried snmalloc2. I have configured snmalloc to use 16MB large buddy. It decommits a 16MB block once it become empty, resulting in worse performance. My idea is that maybe we can add a freelist of mediumslabs (or large buddy blocks) with limited capacity. When a mediumslab becomes empty, it can first go into this local freelist instead of triggering immediate global large deallocation/decommits. |
I think (but @mjp41 is actually the authority here) that, for I can think of a few options that might be worth pursuing:
This is not meant to be an exclusive list, just food for thought. |
@nwf the pagemap is updated with the My plan was the use the code in |
BTW, I'm measuring the cost of decommits on Windows and trying to find alternatives. I tested snmalloc2 with 4 different PALs. Decommit static void notify_not_using(void* p, size_t size) noexcept
{
SNMALLOC_ASSERT(is_aligned_block<page_size>(p, size));
BOOL ok = VirtualFree(p, size, MEM_DECOMMIT);
if (!ok)
error("VirtualFree failed");
} No-op static void notify_not_using(void* p, size_t size) noexcept {} Reset static void notify_not_using(void* p, size_t size) noexcept
{
SNMALLOC_ASSERT(is_aligned_block<page_size>(p, size));
void* r = VirtualAlloc(p, size, MEM_RESET, PAGE_READWRITE);
if (r == nullptr)
error("VirtualAlloc failed");
} Offer & Reclaim static void notify_not_using(void* p, size_t size) noexcept
{
SNMALLOC_ASSERT(is_aligned_block<page_size>(p, size));
auto rv = OfferVirtualMemory(p, size, VmOfferPriorityNormal);
if (rv != ERROR_SUCCESS)
error("OfferVirtualMemory failed");
}
template<ZeroMem zero_mem>
static void notify_using(void* p, size_t size) noexcept
{
SNMALLOC_ASSERT(
is_aligned_block<page_size>(p, size) || (zero_mem == NoZero));
auto rv = ReclaimVirtualMemory(p, size);
if (rv == ERROR_SUCCESS || rv == ERROR_BUSY)
return;
void* r = VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE);
if (r == nullptr)
report_fatal_error(
"out of memory: {} ({}) could not be committed", p, size);
} I measured performance (in my application, frames processed per second) and page faults (10000 frames). The result is interesting:
I'm surprised that offer & reclaim has very bad performance. |
This is reported by one of my friends (@misakikasumi). In his application, he used snmalloc to manage 16bit 1080P frame buffers, which takes 1/4 of the medium sized slab. Based on his observation, this layout actually causes a very frequent de-committing pattern and introduces a bit slow down on Windows platform comparing with system allocators or mimalloc.
By design,
LocalCache
only stores free lists for small size classes. If it is possible to introduce some additional bounded free lists for relatively large objects (so such we can avoid eager de-commmitting)?@misakikasumi would you like to share more details?
The text was updated successfully, but these errors were encountered: