[Question/Bug] Endless stream of GC events occupying a CPU core: What could be happening ? #101563

Closed
hexawyz opened this issue Apr 25, 2024 · 6 comments
Labels: area-GC-coreclr, tenet-performance

Comments

@hexawyz

hexawyz commented Apr 25, 2024

Description

This is a scenario that I've observed a few times with the service I've been developing here: https://github.com/hexawyz/Exo/tree/master/Exo.Service

After some run time, the service, which is mostly idle, will start consuming the equivalent of a whole CPU core.

NB: This service uses Workstation GC instead of Server GC in order to reduce the memory footprint.
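
For reference, a minimal check (not part of the service itself, just a sketch) that confirms at startup which GC flavor the process is actually running with:

    // Minimal sketch: workstation GC is the default unless ServerGarbageCollection
    // is enabled in the project file or runtime config, but it doesn't hurt to log it.
    using System;
    using System.Runtime;

    Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");        // expected: False for this service
    Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");    // typically Interactive with workstation GC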

If the process is attached to Visual Studio, before or after this happens, Diagnostic Tools will show the following:

  • GC events are generated continuously
  • GC events will be triggered even when there have been exactly zero changes on the managed heap (see screenshot below)
  • The service does not seem to be otherwise affected much (the most visible symptom is the increased CPU usage)
  • The VS Tasks/Parallel Stacks views also look completely normal, unless I missed something obvious

Screenshot of the service

This behavior generally occurs after quite a long run time (~2 days, for example, which is what led to the screenshot above), but I've also observed it in a shorter session started in Debug mode from within Visual Studio. In total, I believe I've observed this behavior about 4 times, so while the trigger seems somewhat random, the behavior itself is not.

The service is relatively complex, and I don't exclude the possibility that this quirk in the GC is triggered by some erroneous code in the service (I hope not). However, because I have observed this specific behavior multiple times, it does not seem entirely random, so I believe it is worth asking for help here.

What I'm mainly asking here is whether you are aware of a situation that could trigger this behavior in the GC.
And also, what could I do to investigate this in more depth?

Reproduction Steps

This is how I reproduce the problem, but I'm not asking you to run it, as it is not deterministic and the service might not behave exactly the same on all computers.

  1. Run the service
  2. Wait / Run some requests on the service using associated UI

Expected behavior

The service stays close to 0% CPU usage all the time, and there is no continuous stream of GC events.

Actual behavior

At some point, the service starts consuming a lot of CPU, and this shows up as a continuous stream of GC events in the VS Diagnostic Tools.

Regression?

No response

Known Workarounds

(Restart my service, but that's not very helpful)

Configuration

.NET SDK: 8.0.104
OS: Windows 11 23H2
Architecture: x64
GC: Workstation

Other information

The service makes use of a lot of low-level or advanced features such as native function pointers, GCHandle, WeakReference, DependentHandle, AssemblyLoadContext, etc. I don't believe any of those should lead to the problem I observed, but it might be worth knowing.
Also, I strongly suspect that the instability is triggered when the associated UI is interacting with the service (through named pipes), as the service is otherwise doing nothing most of the time (or at least as little as possible).
However, even if that turns out to be the case, the trigger remains non-deterministic.

dotnet-issue-labeler bot added the needs-area-label label Apr 25, 2024
dotnet-policy-service bot added the untriaged label Apr 25, 2024
@hexawyz changed the title from "[Question/Bug] Endless stream of GC events occupying a CPU core: What coule be happening ?" to "[Question/Bug] Endless stream of GC events occupying a CPU core: What could be happening ?" Apr 25, 2024
@vcsjones added the area-GC-coreclr label and removed the needs-area-label label Apr 25, 2024
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@mangod9
Member

mangod9 commented Apr 25, 2024

Would you be able to collect a GCCollectOnly trace or a dump when the issue repros? Have you noticed this only after upgrading to .NET 8?
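
For example, something along these lines should work while the issue is occurring (the process id is a placeholder; this assumes PerfView and the dotnet-dump global tool are available):

    PerfView.exe /GCCollectOnly /nogui collect gccollect.etl
    dotnet-dump collect --process-id <pid>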

@hexawyz
Author

hexawyz commented Apr 25, 2024

Oh yes, sure, I'll try to do that the next time I observe this behavior.

Regarding .NET versions, I have no real point of comparison to tell whether this behavior would have happened earlier. (While the code predates .NET 8.0, the app was not in a state to be really useful until a few months ago.)

@Maoni0 added the tenet-performance label and removed the area-GC-coreclr and untriaged labels May 3, 2024
@Maoni0
Member

Maoni0 commented May 3, 2024

I was going to suggest that you get a GCCollectOnly trace with GCTriggeredStacks so you can actually see the call stacks that triggered the GCs, but I just found that it wasn't collecting the rundown events (which means you can't actually decode the managed call frames), so I just submitted a fix: 2031. It's obviously not available in the PerfView downloads yet, so I shared a private build at https://github.com/Maoni0/clrgc/tree/main/PerfView-bins if you'd like to try it before the official release build is available.

PerfView.exe /GCCollectOnly /GCTriggeredStacks /nogui collect gctriggers.etl

@hexawyz
Author

hexawyz commented May 7, 2024

Thanks for your suggestion of using PerfView. I managed to reproduce the problem today and found that all GC events were triggered by AllocSmall. After digging further by reattaching to VS, I'm pretty confident I found the problem, which was, of course, in my own code.

So, what can cause this pattern is apparently an infinite loop that generates small allocations. (The infinite loop was one bug; the allocations were another.)
The allocations were quite small and extremely short-lived, so the GC was simply being extremely efficient at doing its work. I guess the mystery is solved 😅
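
For the record, here is a hypothetical reconstruction (not the actual service code, just the shape of it) of the kind of pattern that produces this GC signature:

    // Sketch of the two combined bugs: a loop that never exits and allocates a small,
    // short-lived object on every iteration. Gen0 fills up continuously, every
    // collection is cheap because the temporaries are already dead, so the profiler
    // shows back-to-back GCs triggered by AllocSmall while the heap size barely
    // changes and one core stays pinned.
    var done = false;                   // bug 1: nothing ever sets this to true
    while (!done)
    {
        var message = new byte[64];     // bug 2: a tiny temporary allocated on every spin
        _ = message;                    // dropped immediately, so it dies in gen0
    }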

Sorry for taking up some of your time, but thanks a lot for the help! 😊

@hexawyz closed this as completed May 7, 2024