
Mesh has a compile-time fixed max arena size of 64 GB #37

Open · asl opened this issue Mar 10, 2019 · 23 comments

@asl commented Mar 10, 2019

Hello

I'm trying to benchmark SPAdes (https://github.com/ablab/spades) with Mesh. Currently SPAdes runs fine with both tcmalloc and jemalloc (and has an embedded jemalloc for the sake of completeness). On a reasonably small dataset (with memory consumption around ~30 GB) I'm seeing:

src/meshable_arena.cc:448:void mesh::MeshableArena::beginMesh(void*, void*, size_t): ASSERTION 'r == 0' FAILED:

Some quick debugging revealed that Mesh tried to mprotect() lots of pages, 4 KB each. As a result, mprotect() at some point returns ENOMEM. On my system I have:

$ sysctl vm.max_map_count 
vm.max_map_count = 65530

If I increase vm.max_map_count to 655300 (I'm lucky to have sudo access; the majority of users don't), the assertion goes away and std::bad_alloc is thrown instead. Here is the MALLOCSTATS=1 report just in case:

Meshed pages HWM:   297048
Meshed MB HWM:      1160.3
MH Alloc Count:     376537
MH Free  Count:     629688
MH High Water Mark: 676778

But to me it looks like there is a serious design flaw somewhere: the number of memory mappings is a limited resource, and one simply cannot mmap()/mprotect() each page individually.
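To make the failure mode concrete, here is a minimal standalone repro sketch (mine, not from SPAdes or Mesh): changing protections one page at a time splits a single mapping into many VMAs, so the process eventually trips vm.max_map_count and mprotect() fails with ENOMEM.

// Hypothetical repro, not SPAdes or Mesh code. Each mprotect() with
// different protections on an interior page splits the containing VMA,
// so alternating pages maximizes the number of mappings created.
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>

int main() {
  const size_t kPageSize = 4096;
  const size_t kPages = 200000;  // enough to blow past max_map_count = 65530
  void *base = mmap(nullptr, kPages * kPageSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) { perror("mmap"); return 1; }
  char *p = static_cast<char *>(base);
  for (size_t i = 0; i < kPages; i += 2) {  // every other page: maximal splits
    if (mprotect(p + i * kPageSize, kPageSize, PROT_READ) != 0) {
      fprintf(stderr, "mprotect failed at page %zu: %s\n", i, strerror(errno));
      return 1;  // typically ENOMEM once the VMA count hits the limit
    }
  }
  return 0;
}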

@bpowers (Member) commented Mar 10, 2019

Hi @asl!

I think there are ~3 things going on here. First, there appears to be a good amount of memory Mesh can reclaim on SPAdes (over a GB), neat!

Second, you're right: we're hitting limits around vm.max_map_count. This is a little confusing, and a bug -- we try to explicitly avoid hitting this limit, but I think our existing code to do so is too naive. On startup, Runtime::initMaxMapCount() looks at /proc/sys/vm/max_map_count and sets a limit on the number of meshes based on max_map_count. The comment for kMeshesPerMap says:

// if we have, e.g. a kernel-imposed max_map_count of 2^16 (65k) we
// can only safely have about 30k meshes before we are at risk of
// hitting the max_map_count limit.
static constexpr double kMeshesPerMap = .457;

BUT, we only check that at the start of GlobalHeap::meshAllSizeClasses() -- if we find too many spans to mesh, we could end up in the Danger Zone. I've opened #38 to specifically track this.
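For illustration, here is a rough sketch of the budget computation described above, based on my reading of that comment rather than Mesh's actual Runtime::initMaxMapCount() code:

// Sketch only: derive a cap on the number of meshes from the kernel's
// max_map_count, as the kMeshesPerMap comment describes.
#include <cstddef>
#include <fstream>

static constexpr double kMeshesPerMap = .457;

size_t maxMeshCount() {
  size_t maxMapCount = 65530;  // common Linux default, used if the read fails
  std::ifstream f("/proc/sys/vm/max_map_count");
  f >> maxMapCount;
  // e.g. 65530 * 0.457 ~= 29947 -- the "about 30k meshes" mentioned above
  return static_cast<size_t>(maxMapCount * kMeshesPerMap);
}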

Third, we allocate our (sparse) arena at program startup, which gives us a lot of simplicity. Earlier in development, the Ubuntu system I was on had trouble core-dumping the arena: it seemed to insist on filling the (mostly empty) virtual mapping of the arena with zeros on a crash.

Before Friday, the arena size was 8 GB, which is too small for your ~30 GB working set. I've increased the arena to 64 GB in the latest commit to master; let me know if this enables SPAdes to run correctly. I've opened #39 to track this specific issue.

@asl (Author) commented Mar 11, 2019

> Mesh can reclaim on SPAdes (over a GB), neat!

How can I see it? Is this the "Meshed MB HWM" value?

> /proc/sys/vm/max_map_count and sets a limit on the number of meshes based on max_map_count

Oh, well... this does not smell good :) SPAdes uses (file) memory maps here and there, though typically it's something around 10 * # threads, so it should be below 1000 on almost any sane system.

> Before Friday, the arena size was 8 GB, which is too small for your ~30 GB working set. I've increased the arena to 64 GB in the latest commit to master; let me know if this enables SPAdes to run correctly. I've opened #39 to track this specific issue.

I believe it's quite important not to have a hard-coded arena size. We could easily utilize, say, 1 TB of RAM in hard cases ;) The actual working set should be around ~60 GB for this particular dataset, IIRC. Sadly, Mesh now just fails to allocate anything and therefore throws std::bad_alloc.

@asl (Author) commented Mar 11, 2019

More information about that std::bad_alloc: it seems Mesh failed to fulfill a request to allocate 28 GB as one piece.

And indeed, in

void *GlobalHeap::malloc(size_t sz) {

we're seeing that Mesh is unable to allocate more than 2 GB of memory in a single chunk:

if (unlikely(pageCount * kPageSize > INT_MAX)) {
  return nullptr;
}

Really? :)

I opened #40 to track this issue.
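For reference, the arithmetic behind that failure (illustrative, not Mesh code): INT_MAX is 2^31 - 1, roughly 2 GiB, so the guard above rejects any single allocation larger than that.

// Illustrative check, not Mesh code: a 28 GB request trips the INT_MAX guard.
#include <climits>
#include <cstddef>
#include <cstdio>

int main() {
  const size_t kPageSize = 4096;                      // Mesh's page size
  const size_t request = 28ULL * 1024 * 1024 * 1024;  // the 28 GB allocation
  const size_t pageCount = (request + kPageSize - 1) / kPageSize;
  // pageCount * kPageSize is ~28 GiB, far above INT_MAX (~2 GiB), so
  // GlobalHeap::malloc returns nullptr and operator new throws std::bad_alloc.
  printf("%zu > %d -> %s\n", pageCount * kPageSize, INT_MAX,
         pageCount * kPageSize > static_cast<size_t>(INT_MAX) ? "rejected" : "ok");
  return 0;
}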

@bpowers (Member) commented Mar 11, 2019

hah, yeah... Thanks for the separate tracking issue :)

@bpowers (Member) commented Mar 11, 2019

And agreed on not requiring a fixed max; it is just that having a single range of virtual address space greatly simplifies the implementation. I know Go does (or used to do) a similar thing. This LWN article seems to describe this exact problem: https://lwn.net/Articles/428100/

@brano543 commented

Hey there. Are there any future plans for fixing this issue? This project has great potential, and I can see that a lot of effort has been put into it to solve the "fragmentation" nightmare we all battle against in long-running server jobs. Unfortunately, this issue seems to me like a showstopper that prevents one from using this library.

Could you also explain why the arena size can't be set to "INFINITY" (2^64)? I mean, why is a constraint on the size even needed? I'm also not sure I understand why Ubuntu tries to dump memory that was never malloc'd or was already freed.

@emeryberger (Member) commented

bump @bpowers

@bpowers (Member) commented Sep 23, 2019

@brano543 can you describe what actual issues you are running into? The max heap size is now 64 GB, "which ought to be enough for anybody". Please let us know if you are running into issues with this in practice and we can prioritize working on it, but please don't skip trying Mesh because of perceived limitations.

There are two main reasons we didn't set it larger from the get-go. The first is that some tools (like the crash-reporting software on Ubuntu) choked on very large virtual memory mappings. The behavior we were seeing: Mesh would allocate 64 GB of virtual address space; a program would allocate a few hundred MB and then crash; the core dump parser wasn't smart enough to understand that 63.5 GB of that virtual address space was never allocated or backed by real RAM, and would try to create a core dump file, for sending to Ubuntu, filled with 63.5 GB worth of 0s. There is a madvise flag, MADV_DONTDUMP, that should help with this, but at the time I had trouble integrating it in a way that didn't hurt performance.
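For context, the MADV_DONTDUMP approach mentioned above would look roughly like this; this is a sketch under my assumptions, not the code Mesh ships:

// Sketch: reserve a large sparse arena and exclude it from core dumps.
// Not Mesh's actual code; kArenaSize is just the 64 GB figure above.
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>

int main() {
  const size_t kArenaSize = 64ULL * 1024 * 1024 * 1024;  // 64 GB of VA
  void *arena = mmap(nullptr, kArenaSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (arena == MAP_FAILED) { perror("mmap"); return 1; }
  // Keep the (mostly empty) arena out of core dumps. Pages actually handed
  // out to the application would need MADV_DODUMP re-applied, and that
  // per-allocation toggling is where the performance trouble came in.
  if (madvise(arena, kArenaSize, MADV_DONTDUMP) != 0) {
    perror("madvise(MADV_DONTDUMP)");
  }
  return 0;
}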

The second reason is that we have some ancillary data structures we allocate (like lookup tables) that depend on the size of the arena. I think this is a smaller issue, as they will "just" use up some extra virtual address space.

@bpowers changed the title from "Maximum # of mmapped ranges quickly exhausted" to "Mesh has a compile-time fixed max arena size of 64 GB" on Sep 23, 2019
@asl (Author) commented Sep 23, 2019

Thanks for the clarification. This effectively rules out Mesh for SPAdes, as we routinely allocate more than 64 GB of RAM. No other memory allocator we are aware of has such a limitation.

@emeryberger (Member) commented

How much memory do you allocate? I feel like this is something that could be made a build-time parameter.

@asl (Author) commented Sep 23, 2019

As much as necessary. We could allocate 0.5 TB, we could allocate 1 TB. It depends on the input.

@emeryberger (Member) commented

And to be clear, you mean that the actual physical footprint of the app in RAM is ~ 1TB, correct?

@asl (Author) commented Sep 23, 2019

It might be 100 MB, it might be 4 GB, it might consume 1 TB. Everything depends on the input.

@bpowers (Member) commented Sep 23, 2019

@asl if you increase this constant here: https://github.com/plasma-umass/Mesh/blob/master/src/common.h#L104 from 64 to 2000, that should bump the max heap up to 2 TB. I would be eager to hear how this works for you! If things seem to work fine, I can do some testing on some much smaller systems, and see about making that the default.
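Concretely, that is a one-line edit along these lines (illustrative only: the exact name and expression of the constant in src/common.h may differ, so check the link above):

// src/common.h -- illustrative; the point is replacing the 64 with 2000:
static constexpr size_t kArenaSize = 2000ULL * 1024ULL * 1024ULL * 1024ULL;  // was 64 GB, now ~2 TB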

@bpowers (Member) commented Sep 23, 2019

I'll also talk to @emeryberger - my intuition is that having a single, non-growable heap makes parts of the implementation significantly easier, but maybe I'm overthinking it.

@asl (Author) commented Oct 22, 2019

Well, for us we'd need something like a run-time constant then, e.g. the user could specify the maximum amount of memory they want to use.

@asl (Author) commented Oct 22, 2019

@bpowers So, I tried again on a small dataset (with expected memory consumption of less than 10 GB). Unfortunately, I had to lower kMeshesPerMap down to 0.1; otherwise it did not work. So I guess #38 is really a blocker.

@bigerl commented Oct 19, 2020

I also ran into this:

Mesh: arena exhausted: current arena size is 64 GB; recompile with larger arena size.

The system has 128 GB of RAM, and the application uses more or less all 128 GB for the given input.
I would also like to test it on another machine with up to 1 TB of RAM (and use all of it).

I have three questions:

@dumblob commented Dec 16, 2021

For a few projects (actually, implementations of programming languages for HPC etc.) I wanted to propose using Mesh. But any such compile-time limitation puts me off. It's simply impossible to use: many ordinary desktop systems have more than 64 GB of RAM nowadays, ordinary servers typically have several terabytes, and special systems tens or even low hundreds of terabytes.

Any plans on removing this limitation or making it at least a run-time setting?

@asl (Author) commented Dec 16, 2021

@dumblob The project seems to be abandoned. We (SPAdes) are using mimalloc now.

@dumblob commented Dec 16, 2021

@asl aha, ok. That's a pity anyway.

Btw, I actually wanted to test Mesh against mimalloc, and I'm pleased to hear mimalloc serves your purpose well (my experience with mimalloc is also basically very positive).

@emeryberger (Member) commented

The lead PhD student on this project, @bpowers, has moved to industry and recently had a child, so he has been otherwise quite occupied :).

In any event, this particular issue got lost; sorry about that.

@dumblob commented Dec 17, 2021

I see - then I wish all the best to @bpowers et al.

Should this project get resurrected at some point, I'll try to keep an eye on it.
