Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number limitation. #44904

VinInn · 2024-05-05T12:38:59Z

Currently, in both (grouped)limitedCandidates, for each "step" all candidates are first collected and then the collection is trimmed to the maximum size based on some score.
This is equivalent to not "pushing" a new candidate if it is worse than the current worst once the collection has reached the maximum allowed size (except for candidates with equal score).
The implementation has therefore been changed to the latter mechanism, which is more performant.

It should also be noted that the final sorting is not required; indeed, in groupedLimitedCandidates there is NO final sort.
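
A minimal sketch of the two mechanisms described above, not the actual CkfTrajectoryBuilder code. `Candidate`, `byScore`, `collectThenTrim` and `pushBounded` are hypothetical names, and "lower score is better" is an assumption made only for this illustration. The bounded version keeps a heap whose front is the current worst candidate, so a new candidate is stored only if it is strictly better than that one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a trajectory candidate (assumption: lower score is better).
struct Candidate {
  int id;
  float score;
};

// Ordering by score; used both for sorting and as the heap comparator,
// which places the highest (worst) score at the front of the heap.
inline bool byScore(Candidate const& a, Candidate const& b) { return a.score < b.score; }

// "Collect first, trim later": store every candidate, sort, keep the best maxSize.
inline std::vector<Candidate> collectThenTrim(std::vector<Candidate> const& in, std::size_t maxSize) {
  std::vector<Candidate> out(in);
  std::sort(out.begin(), out.end(), byScore);
  if (out.size() > maxSize)
    out.erase(out.begin() + static_cast<std::ptrdiff_t>(maxSize), out.end());
  return out;
}

// "Do not push if worse than the current worst": never holds more than maxSize candidates.
inline void pushBounded(std::vector<Candidate>& heap, Candidate const& c, std::size_t maxSize) {
  assert(maxSize > 0);
  if (heap.size() < maxSize) {
    heap.push_back(c);
    std::push_heap(heap.begin(), heap.end(), byScore);
    return;
  }
  // heap.front() is the current worst; a candidate with an equal score is dropped.
  if (!(c.score < heap.front().score))
    return;
  std::pop_heap(heap.begin(), heap.end(), byScore);  // moves the worst to the back
  heap.back() = c;                                   // overwrite it with the new candidate
  std::push_heap(heap.begin(), heap.end(), byScore);
}
```

With distinct scores both functions retain the same set of candidates. With equal scores the retained candidate may differ, and the bounded container is left in heap order rather than sorted order.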

The size of TempTrajectory has also been reduced to speed up move and swap.
Took the opportunity to reduce the size of Trajectory as well
and to clean up the CkfTrajectoryBuilder code in general.

In an HLT menu
the time of limitedCandidates improves by 20%
while that of groupedLimitedCandidates improves by a bit more than 10%

HLT throughput improves by a solid 1.5% at least.

Purely technical. Negligible regressions may appear in case of "candidates" with identical score.

cmsbuild · 2024-05-05T12:39:20Z

cms-bot internal usage

VinInn · 2024-05-05T12:40:25Z

@cmsbuild please test

cmsbuild · 2024-05-05T12:49:06Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44904/40174

This PR adds an extra 28KB to repository

cmsbuild · 2024-05-05T12:49:29Z

A new Pull Request was created by @VinInn for master.

It involves the following packages:

DataFormats/TrackCandidate (reconstruction)
TrackingTools/PatternTools (reconstruction)

@mandrenguyen, @jfernan2 can you please review it and eventually sign? Thanks.
@HuguesBrun, @jhgoh, @felicepantaleo, @gpetruc, @abbiendi, @VinInn, @mtosi, @mmusich, @bellan, @dgulhan, @andrea21z, @GiacomoSguazzoni, @CeliaFernandez, @missirol, @Fedespring, @cericeci, @rovere, @trocino, @VourMa, @JanFSchulte this is something you requested to watch as well.
@antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.

cms-bot commands are listed here

cmsbuild · 2024-05-05T14:29:34Z

-1

Failed Tests: UnitTests RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2dcb23/39240/summary.html
COMMIT: 9469803
CMSSW: CMSSW_14_1_X_2024-05-05-0000/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/44904/39240/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 6 errors in the following unit tests:

---> test testTauEmbeddingWorkflow2016postVFP had ERRORS
---> test testTauEmbeddingWorkflow2016preVFP had ERRORS
---> test testTauEmbeddingWorkflow2017 had ERRORS
and more ...

RelVals

4.53: A fatal system signal has occurred: segmentation violation
139.001: A fatal system signal has occurred: segmentation violation
140.023: A fatal system signal has occurred: segmentation violation

Expand to see more relval errors ...

RelVals-INPUT

4.6: 4.6_MinimumBias2010A/step2_MinimumBias2010A.log
138.4: 138.4_PromptCollisions2021/step2_PromptCollisions2021.log
138.5: 138.5_ExpressCollisions2021/step2_ExpressCollisions2021.log

Expand to see more relval errors ...

AddOn Tests

hlt_mc_Fake

A fatal system signal has occurred: segmentation violation

hlt_mc_Fake1

A fatal system signal has occurred: segmentation violation

hlt_mc_Fake2

A fatal system signal has occurred: segmentation violation

Expand to see more addon errors ...

slava77 · 2024-05-05T14:41:17Z

Took the opportunity to reduce the size of Trajectory as well.

is this from reordering the data members?

slava77 · 2024-05-05T14:44:36Z

Sorting of TempTrajectory in CkfTrajectoryBuilder::limitedCandidates takes almost 1% of HLT time.
Reducing its size to two pointers cuts this time in half.

wouldn't it be more practical to update CkfTrajectoryBuilder::limitedCandidates to sort indices to trajectories (or did I misunderstand the comment/code)?

What's the cost of now having more scattered data in memory? Is it negligible? It would be nice to see some profiler or at least the timing piechart.
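
For reference, a minimal sketch of the "sort indices" alternative raised here, with the score cached so that the comparator never touches the large objects. `BigTrajectory`, its `chi2` member and `sortedIndices` are hypothetical names for illustration only and are not taken from the CMSSW code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical heavy object that is expensive to move or swap.
struct BigTrajectory {
  float chi2;       // assumed score: lower is better
  int payload[64];  // stands in for hits, states, etc.
};

// Returns the indices of trajs ordered from best to worst score.
// The objects themselves are never moved; only std::size_t values are swapped.
inline std::vector<std::size_t> sortedIndices(std::vector<BigTrajectory> const& trajs) {
  std::vector<float> score(trajs.size());
  for (std::size_t i = 0; i < trajs.size(); ++i)
    score[i] = trajs[i].chi2;  // cache the score once per object

  std::vector<std::size_t> idx(trajs.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::sort(idx.begin(), idx.end(), [&score](std::size_t a, std::size_t b) { return score[a] < score[b]; });
  assert(idx.size() == trajs.size());
  return idx;
}
```

The trade-off is one extra indirection when the ordered objects are finally read through `trajs[idx[k]]`.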

VinInn · 2024-05-05T14:53:50Z

@slava77, sorting indices and caching the score: maybe yes. It was an option I considered. Not sure it is more practical.
The data are not really much more scattered in memory; there is just one more (fast) indirection.

As said, the code is now faster, at least for HLT. In reco it crashes for reasons to be understood (not obvious at all: it seems that not everything has been recompiled?)
Will now try some relval.

VinInn · 2024-05-05T14:55:10Z

Took the opportunity to reduce the size of Trajectory as well.

is this from reordering the data members?

In part. Also from moving from short (int16) to uint8 (nhits is definitely less than 255; HitPattern is limited to 76).
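
A minimal sketch of how member reordering and narrower counters shrink a struct. `Loose`, `Packed` and `toHitCount` are hypothetical and do not reproduce the actual Trajectory layout; exact sizes depend on the ABI, so the only property checked is that the packed layout is not larger.

```cpp
#include <cassert>
#include <cstdint>

// Members in "declaration convenience" order with 16-bit counters:
// alignment padding is inserted between the small and the large members.
struct Loose {
  bool valid;
  double chi2;
  short nHits;
  bool seeded;
  float dPhi;
  short nLost;
};

// Same information, largest members first and 8-bit counters
// (assumption taken from the comment above: hit counts stay well below 255).
struct Packed {
  double chi2;
  float dPhi;
  std::uint8_t nHits;
  std::uint8_t nLost;
  bool valid;
  bool seeded;
};

static_assert(sizeof(Packed) <= sizeof(Loose), "reordering should not grow the struct");

// Narrowing helper: checks that the count really fits in 8 bits.
inline std::uint8_t toHitCount(int n) {
  assert(n >= 0 && n <= 255);
  return static_cast<std::uint8_t>(n);
}
```

On common 64-bit ABIs `Loose` occupies 32 bytes and `Packed` 16, which makes moves and swaps correspondingly cheaper.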

VinInn · 2024-05-05T17:30:41Z

crash reason understood: comes from a test of TempTrajectory validity.
A clever fix did not work. Will try another or just revert to a full default constructor...

VinInn · 2024-05-15T12:49:11Z

As far as I'm concerned, this PR is final.
Does TRK-POG or Reco have more questions?
(BTW I do not see them in the list of people notified....)

slava77 · 2024-05-15T13:08:14Z

Does TRK-POG or Reco have more questions?

I looked at the 1K events tracking comparisons in #44904 (comment) and had no comments, sorry for not being explicit.

I'm not sure I've seen a profiling/timing result re my comment from May 5

VinInn · 2024-05-15T13:22:06Z

I reported on time improvements in the (updated) first comment

In an HLT menu
the time of limitedCandidates improves by 20%
while that of groupedLimitedCandidates improves by a bit more than 10%

HLT throughput improves by a solid 1.5% at least.

I can post results from perf later on.

slava77 · 2024-05-15T13:36:48Z

I'm not sure I've seen a profiling/timing result

I reported on time improvements in the (updated) first comment

thanks; apparently I was looking in time order and didn't check the PR description

VinInn · 2024-05-15T14:24:01Z

this is the perf report for the release

and this other one for this PR

the 20% improvement for limitedCandidate is pretty evident
for the grouped version it is 5%

slava77 · 2024-05-15T15:29:03Z

the 20% improvement for limitedCandidate is pretty evident

Do I understand correctly that it is closer to 12-15%, if the overall 5-7% decrease in unrelated modules like muonId or seedCreator is considered?

Is it fair to conclude also from looking at the unrelated modules that the grouped version is about the same if not slower?

VinInn · 2024-05-15T16:29:06Z

@slava77 I can try to run with more events to make initialization count less

VinInn · 2024-05-17T10:34:03Z

repeated the time measurements with 10K events and no GPU (one will notice now patatrack showing up)

This is the Release

and this the PR

I would say that the speed improvement in limitedCandidate is obvious and traceable to the reduction of "heap" operations.
In the grouped version the difference goes "in the noise".

slava77 · 2024-05-17T12:48:46Z

I would say that the speed improvement in limitedCandidate is obvious and traceable to the reduction of "heap" operations.

+1

In the grouped version the difference goes "in the noise".

both examples were worse. This time the unrelated modules are not visible to guess which direction the baseline is moving.
I don't see the *_heap operations in the baseline in the grouped, while it shows up at the bottom with the PR.

Is the test from HLT?
IIUC, the offline has a much larger weight in the grouped (even more so in a no-mkFit workflow/era). Perhaps a test there will be more convincing that the cost is at least not increasing.

VinInn · 2024-05-17T12:59:33Z

this is offline: first column Release, second column PR
[innocent@patatrack02 13034.0_TTbar_14TeV+2024PU]$ paste TimeOriTraj.txt TimeNewTraj.txt | awk '{print $1, $2, $4}'
convTrackCandidates 0.207506 0.206641
conversionTrackCandidates 0.157866 0.144111
cosmicsVetoTrackCandidates 0.001399 0.001399
detachedQuadStepTrackCandidates 0.001174 0.001139
detachedQuadStepTrackCandidatesMkFit 0.012637 0.012584
detachedQuadStepTrackCandidatesMkFitSeeds 0.000278 0.000279
detachedTripletStepTrackCandidates 0.007312 0.007274
detachedTripletStepTrackCandidatesMkFit 0.340068 0.340722
detachedTripletStepTrackCandidatesMkFitSeeds 0.003492 0.003338
displacedRegionalStepTrackCandidates 0.181286 0.180114
duplicateDisplacedTrackCandidates 0.000020 0.000020
duplicateTrackCandidates 0.104824 0.105132
electronCkfTrackCandidates 0.011372 0.011065
highPtTripletStepTrackCandidates 0.035885 0.035877
highPtTripletStepTrackCandidatesMkFit 0.096393 0.096400
highPtTripletStepTrackCandidatesMkFitSeeds 0.001586 0.001528
initialStepTrackCandidates 0.006030 0.006038
initialStepTrackCandidatesMkFit 0.087755 0.087670
initialStepTrackCandidatesMkFitPreSplitting 0.092760 0.092999
initialStepTrackCandidatesMkFitSeeds 0.001295 0.001252
initialStepTrackCandidatesMkFitSeedsPreSplitting 0.001381 0.001337
initialStepTrackCandidatesPreSplitting 0.006307 0.006293
jetCoreRegionalStepBarrelTrackCandidates 0.054636 0.054550
jetCoreRegionalStepEndcapTrackCandidates 0.092552 0.091719
lowPtGsfEleCkfTrackCandidates 0.009014 0.008760
lowPtQuadStepTrackCandidates 0.704589 0.700201
lowPtTripletStepTrackCandidates 0.554903 0.552126
mixedTripletStepTrackCandidates 0.032825 0.032708
muonSeededTrackCandidatesInOut 0.003372 0.003375
muonSeededTrackCandidatesOutIn 0.000696 0.000695
muonSeededTrackCandidatesOutInDisplaced 0.000559 0.000560
pixelLessStepTrackCandidates 0.011386 0.010974
pixelLessStepTrackCandidatesMkFit 0.706225 0.707222
pixelLessStepTrackCandidatesMkFitSeeds 0.021475 0.020879
pixelPairStepTrackCandidates 0.152920 0.152784
tobTecStepTrackCandidates 0.502028 0.503019
uncleanedOnlyConversionTrackCandidates 0.002595 0.002350
uncleanedOnlyElectronCkfTrackCandidates 0.000094 0.000091

VinInn · 2024-05-17T13:02:03Z

As far as unrelated modules are concerned:
there are two from Pixels, plus all the functions below AdvanceOneLayer such as groupedMeasurements.
It's not easy to get more stable results.

slava77 · 2024-05-17T13:06:51Z

this is offline: first column Release, second column PR

totals after grepping out initialStep\|detached\|highPt\|pixelLess and the rest would be clearer.

The results are pretty stable; it looks like the 3rd significant digit varies (things often vary more).

VinInn · 2024-05-17T13:45:48Z

I leave it as an exercise to the reader...
Do you agree that it is not worse?

VinInn · 2024-05-17T13:48:27Z

For sustainability, I think it is better to keep the same implementation in both

slava77 · 2024-05-17T14:59:13Z

I leave it as an exercise to the reader... Do you agree that it is not worse?

For sustainability, I think it is better to keep the same implementation in both

fine

thanks for the inputs.
+1

mmusich · 2024-05-21T06:52:53Z

+hlt

based on Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number limitation. #44904 (comment),
not clear though why the HLT signature was requested here.

jfernan2 · 2024-05-22T07:10:30Z

+1

cmsbuild · 2024-05-22T07:10:52Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

rappoccio · 2024-05-22T15:36:45Z

+1

VinInn added 3 commits May 2, 2024 19:23

make TempTrajectory smaller and faster to move

a5548d1

Merged SmallTempTrajectory from repository VinInn with cms-merge-topic

4476fed

reduce size of Traj as well

9469803

cmsbuild added this to the CMSSW_14_1_X milestone May 5, 2024

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels May 5, 2024

cmsbuild added tests-started and removed tests-pending labels May 5, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels May 5, 2024

cmsbuild added tests-rejected and removed tests-started labels May 5, 2024

Merged SmallTempTrajectory from repository VinInn with cms-merge-topic

c675f23

VinInn added 2 commits May 6, 2024 14:18

remove unused instances of TempTrajectory

3fbdd16

Merged SmallTempTrajectory from repository VinInn with cms-merge-topic

9d91c97

cmsbuild removed the tests-rejected label May 6, 2024

VinInn changed the title ~~Speedup handling of (Temp)Trajectory by reducing their size~~ Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number's limitation. May 14, 2024

VinInn changed the title ~~Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number's limitation.~~ Speedup (grouped)limitedCandidates by changing the implementation of the candidates' number limitation. May 14, 2024

cmsbuild added hlt-approved and removed hlt-pending labels May 21, 2024

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels May 22, 2024

cmsbuild added orp-approved and removed orp-pending labels May 22, 2024

cmsbuild merged commit 5132df0 into cms-sw:master May 22, 2024
11 checks passed
