In #12633 (comment), @pron mentioned that the main thing libraries need to worry about to play nicely with Project Loom virtual threads is keeping heavy objects in thread-locals.
However, many fast allocators (jemalloc, mimalloc, etc.) rely on thread-locals to speed up the allocation fast path, by drastically reducing the frequency of accesses to the shared, mutable memory pool.
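To make the pattern concrete, here is a minimal sketch of that fast path, translated to Java for illustration (the real allocators do this in C with per-thread caches; `ThreadCachingAllocator` and its methods are invented names):

```java
import java.util.ArrayDeque;

// Sketch of the pattern in question: a per-thread cache of free blocks in
// front of a shared pool. The fast path touches only thread-local state;
// the shared pool (with its synchronization) is hit only on a cache miss.
final class ThreadCachingAllocator {
    private final ThreadLocal<ArrayDeque<byte[]>> cache =
            ThreadLocal.withInitial(ArrayDeque::new);
    private final int blockSize;

    ThreadCachingAllocator(int blockSize) { this.blockSize = blockSize; }

    byte[] allocate() {
        byte[] block = cache.get().poll(); // fast path: no synchronization
        return (block != null) ? block : allocateFromSharedPool();
    }

    void free(byte[] block) {
        cache.get().push(block);           // fast path: no synchronization
    }

    private byte[] allocateFromSharedPool() {
        // Slow path: a real allocator takes a lock or CASes on shared
        // state here; elided for the sketch.
        return new byte[blockSize];
    }
}
```

With platform threads this is a clear win; with millions of virtual threads, each per-thread `cache` is exactly the kind of heavy thread-local object the advice above warns about.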
Fundamentally, what Loom upends is the assumption that there will only be a "small" number of threads in the system - fewer than a few thousand. With Loom, there could be millions of threads.
In that case it's not really economical for each thread to hold a cache of megabytes, or perhaps even kilobytes, of memory.
Internally, the JDK appears to be able to operate on carrier-thread-specific thread-locals, but this functionality does not appear to be exposed publicly.
So I want to open a discussion on what we can do instead. What would a design for a fast allocator that does not rely on thread-local storage look like? Is there prior art we can investigate? A few ideas come to mind:
- Lock striping. Probably the most obvious option. Contention could still be an issue with any locking; ideally we want the fast path to be mostly-uncontended atomic ops. (First sketch below.)
- Atomic pointer-bump allocation. The allocation side is easy, but deallocation is not, and fragmentation would likely be bad. (Second sketch below.)
- Lifetime hints. One problem allocators face is that they don't know whether an allocation will be long-lived or short-lived. If we know an allocation is short-lived, then the fragmentation from pointer-bump allocation is temporary and we can tolerate it. We could stripe access across slabs that retire when the bump pointer reaches their end, and return to the pool when their last allocation is freed. But we can only do that if we know no long-lived allocation will land in them; otherwise a single long-lived allocation could hold up an entire slab, potentially for the lifetime of the process. (Third sketch below.)
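To make the striping idea concrete, here is a minimal Java sketch. `StripedPool`, the stripe count, and the free-list representation are all invented for illustration; the point is that the stripe is derived from the thread id, so no per-thread state is needed and contention is spread across stripes:

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical striped pool: N independently locked free lists of
// fixed-size blocks. No thread-local state anywhere; each thread
// (virtual or platform) hashes its id to pick a stripe.
final class StripedPool {
    private static final int STRIPES = 64; // power of two, tunable

    private static final class Stripe {
        final ReentrantLock lock = new ReentrantLock();
        final ArrayDeque<byte[]> freeList = new ArrayDeque<>();
    }

    private final Stripe[] stripes = new Stripe[STRIPES];
    private final int blockSize;

    StripedPool(int blockSize) {
        this.blockSize = blockSize;
        for (int i = 0; i < STRIPES; i++) stripes[i] = new Stripe();
    }

    private Stripe myStripe() {
        // threadId() is cheap and works the same for virtual threads
        long mixed = mix(Thread.currentThread().threadId());
        return stripes[(int) (mixed & (STRIPES - 1))];
    }

    private static long mix(long z) { // cheap bit mixer to spread ids
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        return z ^ (z >>> 33);
    }

    byte[] allocate() {
        Stripe s = myStripe();
        s.lock.lock();
        try {
            byte[] block = s.freeList.poll();
            return (block != null) ? block : new byte[blockSize];
        } finally {
            s.lock.unlock();
        }
    }

    void free(byte[] block) {
        Stripe s = myStripe();
        s.lock.lock();
        try {
            s.freeList.push(block);
        } finally {
            s.lock.unlock();
        }
    }
}
```

A power-of-two stripe count keeps the index computation a mask, and mixing the id avoids virtual threads with consecutive ids piling onto neighboring stripes. The per-stripe locks could be replaced with lock-free stacks to get closer to the "mostly-uncontended atomic ops" fast path.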
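The pointer-bump fast path is essentially one atomic instruction. A minimal sketch, again with invented names (`BumpSlab`), using an offset into a backing array to stand in for real memory:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bump allocator over one shared slab. The fast path is a
// single getAndAdd; there is no per-allocation free(), only a wholesale
// reset once the caller knows nothing in the slab is still live.
final class BumpSlab {
    private final byte[] region; // backing storage standing in for raw memory
    private final AtomicLong cursor = new AtomicLong();

    BumpSlab(int capacity) { this.region = new byte[capacity]; }

    /** Reserves `size` bytes; returns their offset, or -1 if the slab is full. */
    long allocate(int size) {
        long offset = cursor.getAndAdd(size); // the entire fast path
        if (offset + size > region.length) {
            // Slab exhausted (the cursor may now point past the end, which
            // is harmless: every later attempt also fails). The caller falls
            // back to a slow path, e.g. fetching a fresh slab.
            return -1;
        }
        return offset;
    }

    /** Only safe once the caller has proven no allocation in the slab is live. */
    void reset() { cursor.set(0); }
}
```

This shows both halves of the trade-off: `allocate` is a single `getAndAdd`, but there is no per-allocation `free` at all; the slab can only be reclaimed wholesale, which is what the lifetime-hint idea tries to exploit.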
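Finally, a sketch of the retiring-slab scheme for allocations hinted as short-lived. `RetiringSlab`, `retire`, and the reference-counting protocol are all assumptions of mine, not an established design; `live` starts at one so the slab cannot be recycled while it is still the open slab, which closes the race between a failing concurrent `allocate` and recycling:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical retiring slab for allocations hinted as short-lived.
// `live` starts at 1: that extra reference means "this slab is still the
// open one" and is dropped exactly once, via retire(), when the pool
// swaps in a fresh slab. Until then the count can never reach zero, so
// the slab cannot be recycled under a concurrent in-flight allocate.
final class RetiringSlab {
    private final long capacity;
    private final AtomicLong cursor = new AtomicLong();
    private final AtomicInteger live = new AtomicInteger(1);

    RetiringSlab(long capacity) { this.capacity = capacity; }

    /** Returns an offset, or -1 when full (the caller then retires this slab). */
    long allocate(int size) {
        long offset = cursor.getAndAdd(size);
        if (offset + size > capacity) return -1;
        live.incrementAndGet(); // safe: the open-slab reference keeps live > 0
        return offset;
    }

    /** Drops the open-slab reference; called exactly once by the pool. */
    boolean retire() { return live.decrementAndGet() == 0; }

    /** Called once per successful allocation; true means the slab may be recycled. */
    boolean free() { return live.decrementAndGet() == 0; }
}
```

The failure mode called out above falls directly out of this design: one long-lived allocation keeps `live` above zero and pins the whole slab, which is why the scheme needs a trustworthy short-lived hint.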