-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive allocations in calculatePartialBinVolume slowing down fits #786
Comments
Hi, interesting observation! Yes, probably my commit that you linked there makes the There is still the manual memory allocation here of the Also that one was replaced with a So with 6.26, all of that should go down to almost zero. I could backport these improvements to 6.22 and 6.24, but we probably won't do a patch release just for that since it isn't even a bugfix but a performance improvement. I'd have a hard time to argue for that case, ROOT doesn't usually do patch releases for older releases after the next major release came out. Still, it's also not clear to me if it would end up in the CMSSW release.
For sure these improvements could be cherry-picked on top of 6.22. But how are you thinking to make this happen? With a patched ROOT version in CMSSW? In any case, I'd be happy to help doing whatever is necessary to speed up your workflows. |
With #776 and LCG102, indeed we spend a total of 30m building the asimov for the full run 2 combination vs. previously at least 6 hours (I gave up on the So let's call it one more thing to look forward to with ROOT 6.26 validation. I don't think it would be worth it at this point to try to port back, since it seems we are close to being able to use 6.26 directly. |
I have a vague recollection of finding this too (I think for HZZ, but not sure now), and concluding that we couldn't do much, short of writing a more efficient pdf ourselves. There is caching for a basic integral over the full histogram, but IIRC there's a norm over slices here and that doesn't have caching support. |
Interesting stack trace, there are some things I can understand from it. So you have this loop where you are iterating observable values and evaluate your PDF: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/102x/src/CachingNLL.cc#L342 It seems to me that this But actually this should not happen. The Digging further, these integrals are created here, and put in a What I would do is before adding these integrals to the Or do you somewhere disable RooFit's internal caching based on the objects states, via |
Also this stack trace seems to be from ROOT 6.24. Wasn't |
Even though it is faster in 6.26, we may still want to understand if this re-evaluation is necessary. It looks like the code in question was introduced in #680. Actually the RooCheapProduct we see taking so much time here is being constructed for the branch taken on L233 in
and its integral:
As best as I can tell, then the integrals get re-evaluated as the observable |
Okay then I have an idea. Even though you are having these conditional integrals, they still don't depend on any parameters in the Instead of returning this over-complicated integral object, we could return a new RooHistPdf where the underlying Would this make sense to you? I can also help to implement this, as I'm also interested in having such a optimization for |
The other thing I am puzzled about is why this was a non-issue in ROOT 6.12 (CMSSW 10_2_X).
and is a source of huge memory leak for long-running fits. I'm somewhat puzzled because it looks to be deleted properly in https://github.com/root-project/root/blob/v6-22-08/roofit/roofitcore/src/RooDataHist.cxx#L1478-L1483 |
Absolutely! I think this would also be an effective workaround for the memory leak as well, since the resulting RooHistPdf does not need to be summed at all. |
It's quite likely that the memory leak is related to the memory pool for RooArgSets. It's not the first time that it would be the cause of a mysterious memory leak. The thing is: RooFit caches normalization integrals where the keys are the memory addresses of the The "solution" originally implemented in RooFit was to overload operator With ROOT 6.14, the memory pool got refactored, and was improved to be more rigorous in remembering which addresses were already used (the original implementation only remembered the last N That means, from ROOT 6.14, memory will increasingly "leak" if you create RooArgSets on the heap, because the ranges of all addresses used so far are remembered by the pool. The same goes for the RooDataSet. With ROOT 6.28, this will be a no issue again, because this time I applied the right fix: RooFit was redesigned such that things are not cached by pointer addresses anymore. See also the 6.28 docs of RooArgSet ("Uniquely identifying RooArgSet objects"). The custom memory pool was therefore disabled. |
Okay! I will work on this. Feel free to assign the issue to me so it doesn't get lost over the Christmas break 👍 |
Unfortunately I haven't made progress on this yet, but just for the record there is also a ROOT Jira ticket from 2020 about this: https://sft.its.cern.ch/jira/browse/ROOT-10519 Maybe the insights in the discussion there are helpful for the when I fix the problem. |
The generation of the pseudo-asimov for unbinned channels (
toymcoptutils::SinglePdfGenInfo::generatePseudoAsimov
) can take a very long time (several hours) in initial fits. From profiling, most of the time is in what appear to be memory allocation calls inside RooFit code, in particularRooDataHist::calculatePartialBinVolume
:This is in the 112x-comb2022 branch with ROOT 6.22
I see that @guitargeek recently made some improvements to this section of the code in root-project/root@6120b9f
I wonder if any subset of that might be usable in 6.22. Meanwhile I will try with 6.26 soon to see if the performance is better there
NB: the flamegraphs were made using `perf:
and then imported into https://www.speedscope.app/
The text was updated successfully, but these errors were encountered: