Deserialization Performance Issue - with sample repo #1142
Comments
I generated a benchmark report for your sample project. The results indicate that the program scales effectively with concurrency when the sample size is 10, but scalability problems arise as the sample size increases. I suspect this is because larger sample sizes create big objects, possibly landing on the large object heap (LOH). Also, with a large sample size such as 4000, more memory is used as concurrency increases, which suggests memory inefficiency in multi-threaded environments. Based on these observations, my hypothesis is that the slowdown is related to the overhead of handling large objects.
Also, the SharedArrayPool has a
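For context on the LOH hypothesis: in .NET, allocations at or above the documented default threshold of roughly 85,000 bytes go straight to the large object heap and are only collected together with generation 2, which can make allocation-heavy multi-threaded code GC-bound. A minimal sketch illustrating this (the sizes here are illustrative, not measured from the sample repo):

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        var small = new byte[1_000];     // small object heap, starts in gen 0
        var large = new byte[100_000];   // >= ~85,000 bytes: allocated on the LOH

        // LOH objects are reported as generation 2 immediately after allocation.
        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH objects belong to gen 2)
    }
}
```

If the deserializer allocates buffers of this size per call, every LOH allocation adds pressure that only a full (gen 2) collection can relieve, which would fit the observed degradation at higher sample sizes.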
|
Thank you. Important to know: the sizing of the sample was selected by the inventors for (file) storage efficiency reasons, not for memory or deserialization reasons. Anyway, I'm using the same "Apple Silicon" as you, and it is frustrating to see that all the available power (which is fantastic) is not usable :-( So maybe my next round of learning and improvements should go into efficient object creation for the arrays... BR Werner |
In the past I was in doubt whether the root cause here might be in the area of vCPUs, hyperthreading, CPU cache, the CPU <=> memory connection, etc. A new experiment confirmed that the issue is caused inside .NET memory usage, or inside the current process. What I did: I reconfigured my sample/work so that only one thread is used, which takes roughly 80 seconds to do the work. Then I split the work up into *multiple PROCESSES* (each with one thread)... and see: it scales! I can see CPU usage > 90 %, and I see faster execution of the overall work. Yes, there is some overhead, but with an unoptimized duration of 80 seconds the overhead impact is not that large in percentage terms.

Processes  Duration (sec)
 1         79.5
 2         40.5
 4         20.5
 8         10.9
10         10.0

The CPU usage percentages scale in the same way: from 1 to 8 processes it looks like perfect scaling; higher values are not as good, but that is not my main concern.
My hardware: MacBook Pro M2 (2023) with 12 Apple Silicon CPU cores => Parallels VM for Windows 11 (ARM) with 10 CPUs assigned to the VM. |
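The multi-process experiment described above can be sketched roughly as follows. Note that `Worker.exe` and its `--slice`/`--of` arguments are placeholders for the repo's actual single-threaded worker, not real names from the sample project:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class MultiProcessRunner
{
    static async Task Main()
    {
        int processCount = 8;
        var sw = Stopwatch.StartNew();

        // Launch one worker process per slice of the work.
        // "Worker.exe --slice i --of n" is a hypothetical command line:
        // each process handles 1/n of the deserialization workload on one thread.
        var workers = new Task[processCount];
        for (int i = 0; i < processCount; i++)
        {
            var psi = new ProcessStartInfo("Worker.exe", $"--slice {i} --of {processCount}")
            {
                UseShellExecute = false
            };
            var process = Process.Start(psi)!;
            workers[i] = process.WaitForExitAsync();
        }

        await Task.WhenAll(workers);
        Console.WriteLine($"All {processCount} workers finished in {sw.Elapsed.TotalSeconds:F1} s");
    }
}
```

Because each worker gets its own process, it also gets its own GC heaps and its own `ArrayPool<T>.Shared` instance, which is consistent with the near-linear scaling in the table above and points at per-process shared state (GC or pool contention) as the bottleneck in the single-process, multi-threaded case.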
Ok, thanks. I'll try to have a look at what bottleneck we're hitting here.
|
Hello again ;-)
Yes, I'm aware of #669 and my personal conclusion there, but I'd like to investigate deeper, and maybe someone can help!
Issue: the CPU does not seem to be fully utilized on high-core machines.
I have created a small (hopefully small enough) repository (net8.0) that provides a simple use case with basic measuring as a playground for everyone!
The proto definition used is a real world scenario, coming from the OpenStreetMap pbf file format.
Part of the implementation is also "stolen" from the OsmSharp project (MIT licensed).
Sample Repository: https://github.com/WernerMairl/protobuf-net-concurrency
Expectations
Using 8 threads/tasks in parallel, an overall duration of less than 2000 ms should be possible (compared with 4950 ms using one thread).
I cannot understand why we see a rate of 242 deserializations per second in a single-thread scenario, but only 50 deserializations per second (and per thread) in an 8-thread scenario.
Allowing for some overhead, I would expect a rate of around 180-200 for each of the 8 threads!
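A per-thread rate like the ones quoted above can be measured with a harness along these lines. `DeserializeOne()` is a stand-in for the repo's actual protobuf-net deserialization of one block (here simulated with `Thread.SpinWait` so the sketch is self-contained); the exact numbers are machine-dependent:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class ThroughputHarness
{
    // Placeholder for deserializing one block with protobuf-net.
    static void DeserializeOne() => Thread.SpinWait(100_000);

    static void Main()
    {
        foreach (int threads in new[] { 1, 8 })
        {
            int total = 0;
            var sw = Stopwatch.StartNew();
            var tasks = new Task[threads];
            for (int t = 0; t < threads; t++)
            {
                tasks[t] = Task.Run(() =>
                {
                    // Each thread deserializes in a loop for a fixed wall-clock window.
                    while (sw.ElapsedMilliseconds < 2000)
                    {
                        DeserializeOne();
                        Interlocked.Increment(ref total);
                    }
                });
            }
            Task.WaitAll(tasks);
            double perThread = total / sw.Elapsed.TotalSeconds / threads;
            Console.WriteLine($"{threads} thread(s): {perThread:F0} deserializations/s per thread");
        }
    }
}
```

With ideal scaling, the per-thread rate should stay roughly constant as the thread count grows; the reported drop from 242/s to 50/s per thread is what this harness would make visible.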
Questions
Any help is welcome to improve this.