-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number limitation. #44904
Conversation
cms-bot internal usage |
@cmsbuild please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44904/40174
|
A new Pull Request was created by @VinInn for master. It involves the following packages:
@mandrenguyen, @jfernan2 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
-1 Failed Tests: UnitTests RelVals RelVals-INPUT AddOn Unit TestsI found 6 errors in the following unit tests: ---> test testTauEmbeddingWorkflow2016postVFP had ERRORS ---> test testTauEmbeddingWorkflow2016preVFP had ERRORS ---> test testTauEmbeddingWorkflow2017 had ERRORS and more ... RelVals
Expand to see more relval errors ...RelVals-INPUT
Expand to see more relval errors ...AddOn Tests
Expand to see more addon errors ... |
is this from reordering the data members? |
wouldn't it be more practical to update What's the cost of now having more scattered data in memory? Is it negligible? It would be nice to see some profiler or at least the timing piechart. |
@slava77 , sorting indices and caching score. maybe yes. Was an option I considered. Not sure it is more practical. As said the code is now faster at least for HLT. In reco it crashes for reasons to be understood (not obvious at all: it seems that not everything has ben recompiled?) |
In part. also moving from short (int16) to uint8 (nhits is definetively less than 255. HitPattern is limited to 76) |
crash reason understood: comes from a test of TempTrajectory validity. |
For what I'm concerned this PR is final. |
I looked at the 1K events tracking comparisons in #44904 (comment) and had no comments, sorry for not being explicit. I'm not sure I've seen a profiling/timing result re my comment from May 5 |
I reported on time improvements in the (updated) first comment
I can post results from perf later on. |
thanks; apparently I was looking in time order and didn't check the PR description |
Do I understand correctly that it is closer to 12-15%, if the overall Is it fair to conclude also from looking at the unrelated modules that the grouped version is about the same if not slower? |
@slava77 I can try to run with more events to make initialization count less |
repeated the time measurements with 10K events and no GPU (one will notice now patatrack showing up) I would say that the speed improvement in limitedCandidate is obvious and traceable to the reduction of "heap" operations. |
+1
both examples were worse. This time the unrelated modules are not visible to guess which direction the baseline is moving. Is the test from HLT? |
this is offline: first column Release, second colum PR |
Fro what concern unrelated modules: |
totals after greping out The results are pretty stable; it looks like the 3rd significant digit varies (things often vary more). |
I leave it as exercise to the reader... |
For sustainability, I think it is better to keep the same implementation in both |
OK, only
fine thanks for the inputs. |
+hlt
|
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
Currenlty, in both (grouped)limitedCandidates, for each "step" candidates are first all collected and then the size is reduced based on some score.
This is equivalent to not "pushing" a new candidate if worse than the current worst once the collection has reached the max allowed size. (but for candidates with equal score).
The implementation has been therefore modified to this latter mechanism that is more performant.
It should also be noticed that the final sorting is not required and indeed in groupedLimitedCandidates there is NO final sort.
The Size of TempTrajectory has also been reduced to speed up move and swap.
Took the opportunity to reduce the size of Trajectory as well
and to cleanup in general CkfTrajectoryBuilder code.
In a HLT menu
time of limitedCandidates improves of 20%
while the one of groupedLimitedCandidates improves of a bit more than 10%
HLT throughput improves of a solid 1.5% at least.
Purely technical. Negligible regressions may appear in case of "candidates" with identical score.