Add GC heap hard limit for 32 bit #101024
base: main
Conversation
Tagging subscribers to this area: @dotnet/gc
@gbalykov, I am wondering why you would want a

I am working on #100380 (forgive the misleading title) that enabled the computation of

To make
I was thinking about limiting my work to just 64 bits, but if there is a reason to, we can figure out how to get the calculation right on 32 bit platforms as well.
It's the same as for 64 bit: to be able to limit the memory consumption of a process and, specifically, the GC heap size. Setting the limit specifically on the managed heap size allows much better granularity than using limits for the whole process with cgroups, etc. Even though this change doesn't set a default value for containerized environments (

Also, here's a related discussion about heap limits for 32 bit: #78128.
Can you share more details about this? During my analysis of the code (f84d33c) I didn't find any other related places that depend on bitness, and it seems that with this change the behavior on 64 and 32 bit is pretty much the same. For example, I was also able to test this change with more complex apps, and the physical size of the heap (Private_Dirty of the GC vmas) always remained within the allowed limit, as expected (of course, assuming that the heap is able to fit within that limit with more frequent GC, for example).
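As an aside, a rough way to reproduce the kind of measurement described above (summing Private_Dirty from /proc/self/smaps) is sketched below. It is not part of this patch, and it sums all mappings rather than only the GC heap's vmas, so treat it only as an approximation.

#include <cstdio>

int main()
{
    // Walk /proc/self/smaps and add up every Private_Dirty line.
    // A real check would restrict this to the address ranges reserved by the GC.
    FILE* f = fopen("/proc/self/smaps", "r");
    if (f == nullptr) return 1;

    char line[512];
    unsigned long long totalKb = 0;
    while (fgets(line, sizeof(line), f) != nullptr)
    {
        unsigned long long kb;
        if (sscanf(line, "Private_Dirty: %llu kB", &kb) == 1)
            totalKb += kb;
    }
    fclose(f);

    printf("Private_Dirty total: %llu kB\n", totalKb);
    return 0;
}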
It does make sense to use

To start with, the key difference between 32 bits and 64 bits is

Here is where
Historical Context (Good for knowledge, but feel free to skip on first read)

With respect to commit accounting, let's go back in time and see why we are doing it. By the time we introduced

Then I introduced per object heap hard limit (#36731) by checking against

By that time, the verification logic was unnecessarily complicated, and therefore we didn't merge those in. That was before

Much of the verification logic was written with only

Keeping track of these numbers is non-trivial; at the very least, we need to take the

Then it comes to

A technical challenge at that point in time is that if

One simple way to get around it is to always keep the counters, which is what I am trying to do right now; we have to be careful with respect to performance (which I haven't checked yet). I am hoping it is okay.

The known problem - overlapped commit

While it appears that it is easy to just add when we commit and subtract when we decommit, it is not that simple. The problem is overlapped commits. Under

Under non-

The double count on its own is wrong, but is not fatal. When that happens repeatedly, however, we will leak those bytes (while the memory is fine), and so we run out of bytes when we try to commit (but in fact we do have memory).

One of the known cases of overlapping commit is the mark arrays. We don't keep track of the exact bound up to which the mark array was committed, so any time we need to ensure the mark array is available and it isn't fully committed, we commit the whole mark array range every time we need it. And there could be other cases that we just don't know about at the moment.

The unknown problem - overflow?

Beyond just overlapped commit, I am seeing other problems as well. In a case where a heap hard limit is not provided, I suspect the counter overflowed. When I decommit, I am hitting an assert that we decommitted more memory than we have, and the counter is suspiciously low at that point. Of course, that's just a guess; there could be other things going on as well. The only thing I am sure of is that the number must be wrong, and we can't decommit memory that wasn't committed.

Plan of attack

Fundamentally, it is a calculation error. Unfortunately, we don't have any symptoms when the wrong thing happens, and failing at the end is too late. To make this easier, I think we can 1.) Track mark array usage, and make sure we don't do overlapped commit. Notice that

Final words

As you can see, it is not a trivial problem to solve. With effort, I think we can nail it. It will be worth the effort to make 32-bit apps work better.
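To make the double-counting failure mode above concrete, here is a deliberately contrived sketch (invented names, not GC code): a naive counter that adds on every commit and subtracts on every decommit drifts upward as soon as the same range is committed twice, so the hard limit check eventually fails even though real memory usage is fine.

#include <cstddef>
#include <cstdio>

static size_t committed_bytes = 0;      // hypothetical bookkeeping counter
static const size_t hard_limit = 1024;  // hypothetical hard limit, in bytes

static bool commit_range(size_t size)
{
    if (committed_bytes + size > hard_limit)
        return false;                   // "out of bytes", even if real memory is fine
    committed_bytes += size;            // counts the full range again if it overlaps
    return true;
}

static void decommit_range(size_t size)
{
    committed_bytes -= size;            // only ever subtracted once per range
}

int main()
{
    // Commit the same 256-byte range twice (like re-committing a whole mark
    // array range), then decommit it once: 256 bytes leak in the counter.
    commit_range(256);
    commit_range(256);                  // overlapped commit: double counted
    decommit_range(256);
    printf("counter says %zu bytes committed, real usage is 0\n", committed_bytes);
    return 0;
}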
Thanks for such a detailed description!
Does this mean that the heap hard limit doesn't work correctly with segments even on 64 bit on current main? The original PR to add heap hard limit support (dotnet/coreclr#22180) was merged back in 2019, and it seems that GC regions support (#59283) was enabled only in 2022, so during my work on this patch I assumed that segments on 64 bit fully support the heap hard limit. If the heap hard limit with segments is not fully correct on 64 bit on main now, is it safe to assume that it was correct before GC regions support was enabled by default (e.g. in .NET 6)? Or is this possible leak in bytes accounting (e.g. for the mark array) a known bug of the heap hard limit with segments that has been present since 2019?
If I understand correctly, this logic for counting committed bytes by walking the heap data structures is needed for two things:
So, if both are disabled for 64 bit, it seems it should be the same as it was before GC regions were enabled by default. Is that correct?
Can you share why it is needed to always keep counters on 64 bit with regions if there's already a way to update them by walking the heap data structures? Is it needed for better validation of counters and heap structures? If there's no way to update GC counters by walking the heap data structures with segments, then always keeping the counters seems like the only way to fully support limits on 32 bit (including verification of counters and

Thanks again for your interest in this!
Sorry for the late reply, I took the last few days trying to figure these out.

> Does this mean that the heap hard limit doesn't work correctly with segments even on 64 bit on current main?

I confirm this is the case. To prove that, I have found a couple of issues, and I plan to get them fixed.

Bug 1

When we get a new segment by committing memory for the large object heap or the pinned object heap, it is incorrectly accounted for as gen 2. This will impact the per-object-heap accounting:

gc_heap::get_segment (size_t size, gc_oh_num oh)
{
...
result = make_heap_segment ((uint8_t*)mem, size, __this, (uoh_p ? max_generation : 0));
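// (per the note above: for LOH/POH, uoh_p is true and max_generation is passed
//  here, so the commit for the new segment ends up accounted against gen 2)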
...
}

Bug 2

When we release a segment, we used virtual_free:

void gc_heap::virtual_free (void* add, size_t allocated_size, heap_segment* sg)
{
bool release_succeeded_p = GCToOSInterface::VirtualRelease (add, allocated_size);
if (release_succeeded_p)
{
reserved_memory -= allocated_size;
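// (reserved_memory tracks the GC's total reserved bytes; it is reduced here
//  by whatever size the caller passed in)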
dprintf (2, ("Virtual Free size %zd: [%zx, %zx[",
allocated_size, (size_t)add, (size_t)((uint8_t*)add + allocated_size)));
}
}

But when we call it in

This will impact both

> If the heap hard limit with segments is not fully correct on 64 bit on main now, is it safe to assume that it was correct before GC regions support was enabled by default (e.g. in .NET 6)?

With the bugs I just showed above, it is pretty clear the answer is no. The bugs I have found above have nothing to do with

> Can you share why it is needed to always keep counters on 64 bit with regions if there's already a way to update them by walking the heap data structures? Is it needed for better validation of counters and heap structures?

When I built it, it was meant to be a stepping stone. As you probably know, the logic is quite complicated and easy to make mistakes in. Building this logic helped me a lot in validating that I am doing the right thing. With the PR, my goal was to get

Except my investigations above proved otherwise; the bug is actually quite bad, the

> Can you share the C# program that you use for testing?

It is this failing test case in #100380. With that test case, I can:
Unfortunately, getting these committed counters correct for segments isn't a priority right now. It would be great if you could help with the debugging. Essentially, you want to find the place in the code that is probably
I know it is easier said than done; it took me quite a few days to find the couple of issues above. Let me know if there are things I could help with. I am more than happy to help you with this.
With another week of debugging, I figured out the bug related to server GC. It isn't related to threading; it is just that I missed one call site to

With that, we can focus on the
@cshung Great work! I've been on vacation and am busy with another task right now, but I plan to switch back to this starting next week.
@gbalykov, welcome back from your vacation, I hope you had a good time. Another update: I have got the bookkeeping numbers as well (sadly, I gave up the mark array bytes for simplicity). With that, I have gone through a round of review with @Maoni0 on my change. I think the change is more or less good to go, and I will merge it whenever I can. It would be great if you could rebase this change on top of that, test it on some 32 bit platforms, and see if it still works. To stress test the calculation, you can turn on the
This change enables heap hard limit on 32 bit.
The approach is similar to DOTNET_GCSegmentSize (GCConfig::GetSegmentSize), which allows setting the size of the segment for SOH/LOH/POH and guarantees that there's no overflow during computations (for example, during size_t initial_heap_size = soh_segment_size + loh_segment_size + poh_segment_size;). When DOTNET_GCSegmentSize is set on 32 bit, it's rounded down to a power of 2, so the largest possible value of the provided segment (SOH) is 2 Gb (4Mb <= soh_segment_size <= 2Gb). For LOH/POH the same value is divided by 2 and then also rounded down to a power of 2, so the largest possible value of a LOH/POH segment is 1 Gb (4Mb <= loh_segment_size <= 1Gb, 0 <= poh_segment_size <= 1Gb). So, the segment size for SOH/LOH/POH never overflows, and neither does initial_heap_size (except for overflow of initial_heap_size to 0, which will lead to a failed allocation later anyway).
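As a rough illustration of the rounding just described (a hand-written helper with made-up values, not the actual GCConfig code), the requested size is rounded down to a power of 2, which caps soh_segment_size at 2 Gb and loh_segment_size/poh_segment_size at 1 Gb each:

#include <cstddef>
#include <cstdio>

// Round a size down to the nearest power of 2 (0 stays 0).
static size_t round_down_power_of_2(size_t value)
{
    size_t result = 0;
    for (size_t bit = (size_t)1 << (sizeof(size_t) * 8 - 1); bit != 0; bit >>= 1)
    {
        if (value & bit)
        {
            result = bit;
            break;
        }
    }
    return result;
}

int main()
{
    size_t requested = (size_t)0x60000000;                          // 1.5 Gb requested segment size
    size_t soh_segment_size = round_down_power_of_2(requested);     // 1 Gb
    size_t loh_segment_size = round_down_power_of_2(requested / 2); // 512 Mb
    size_t poh_segment_size = round_down_power_of_2(requested / 2); // 512 Mb
    printf("soh=%zx loh=%zx poh=%zx\n", soh_segment_size, loh_segment_size, poh_segment_size);
    return 0;
}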
A similar thing happens when DOTNET_GCHeapHardLimit or DOTNET_GCHeapHardLimitSOH/DOTNET_GCHeapHardLimitLOH/DOTNET_GCHeapHardLimitPOH are set on 32 bit. There are limits on these values:

0 <= (heap_hard_limit = heap_hard_limit_oh[soh] + heap_hard_limit_oh[loh] + heap_hard_limit_oh[poh]) < 4Gb
  a) 0 <= heap_hard_limit_oh[soh] < 2Gb, 0 <= heap_hard_limit_oh[loh] <= 1Gb, 0 <= heap_hard_limit_oh[poh] <= 1Gb
  b) 0 <= heap_hard_limit_oh[soh] <= 1Gb, 0 <= heap_hard_limit_oh[loh] < 2Gb, 0 <= heap_hard_limit_oh[poh] <= 1Gb
  c) 0 <= heap_hard_limit_oh[soh] <= 1Gb, 0 <= heap_hard_limit_oh[loh] <= 1Gb, 0 <= heap_hard_limit_oh[poh] < 2Gb
0 <= heap_hard_limit <= 1Gb

These ranges guarantee that the calculation of soh_segment_size, loh_segment_size and poh_segment_size with alignment and round-up won't overflow, and neither will the calculation of their sum for allocation (overflow to 0 is allowed, same as for DOTNET_GCSegmentSize). When the values specified by the user with env variables don't meet the requirements above, the runtime exits with CLR_E_GC_BAD_HARD_LIMIT. When the allocation (with mmap on Linux) fails, the runtime exits with the same error as for a large segment size specified with DOTNET_GCSegmentSize.
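A sketch of what a check over these ranges could look like (an invented helper for illustration only; the PR's actual validation code is not shown here) is to do the arithmetic in 64 bits so the check itself cannot overflow in a 32-bit process:

#include <cstdint>
#include <cstdio>

const uint64_t one_gb  = (uint64_t)1 << 30;
const uint64_t two_gb  = (uint64_t)1 << 31;
const uint64_t four_gb = (uint64_t)1 << 32;

// Checks the per-object-heap limits against the 32-bit ranges quoted above:
// the total stays below 4 Gb, at most one of the three limits may exceed 1 Gb,
// and that one must still be below 2 Gb.
static bool hard_limits_fit_32bit(uint64_t soh, uint64_t loh, uint64_t poh)
{
    uint64_t total = soh + loh + poh;      // done in 64 bits, cannot overflow here
    if (total >= four_gb)
        return false;
    int above_one_gb = (soh > one_gb) + (loh > one_gb) + (poh > one_gb);
    if (above_one_gb > 1)
        return false;
    return soh < two_gb && loh < two_gb && poh < two_gb;
}

int main()
{
    // 1.5 Gb SOH + 1 Gb LOH + 0.5 Gb POH: allowed (case a above).
    printf("%d\n", hard_limits_fit_32bit(3 * (one_gb / 2), one_gb, one_gb / 2));
    // 1.5 Gb SOH + 1.5 Gb LOH: rejected, two limits above 1 Gb.
    printf("%d\n", hard_limits_fit_32bit(3 * (one_gb / 2), 3 * (one_gb / 2), 0));
    return 0;
}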
This patch doesn't enable the heap hard limit on 32 bit in containers (is_restricted_physical_mem), because the current heap hard limit approach is to reserve one large GC heap segment with a size equal to the specified heap hard limit, and no new segments are reserved in the future. On 64 bit in containers, the heap hard limit is set by default to 75% of the total physical memory available, and this is fine, because the virtual address space on 64 bit is much larger than the actual physical memory on devices. In contrast, on 32 bit the virtual address space might be the same size as the available physical memory on the device (e.g. 4 Gb for both). This means that reserving 75% of total physical memory would reserve 75% of the whole virtual address space, which might be undesirable (e.g. the process later expects more available memory), and the mmap on Linux will probably fail anyway.

I've run release CLR tests on armel Linux with this patch on top of f84d33 with different limits; there seem to be no issues with 4Mb/8Mb/16Mb and 512Mb heap hard limits (some tests fail with "Out of memory"/"OOM" and "System.OutOfMemoryException", as expected, or "System.InvalidOperationException: NoGCRegion mode must be set" when the NoGC region is larger than the limit). The same holds for GCStress 0xc and 0x3 with a 4 Mb heap hard limit, and for debug CLR tests on armel Linux with a 4Mb heap hard limit.
@Maoni0 @cshung please share what you think. Thank you.