WorkspaceStd Slow to Start for Signal #1247

Open
atishelmanch opened this issue Nov 23, 2020 · 3 comments

Comments

@atishelmanch

Hello,

I am not sure if others are experiencing this issue, but I have found that when running our HH->WWgg tagger plugin with workspaceStd.py on a signal file, for example with the command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=1 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1

(There is some WWgg-specific customization there, implemented in [1] and [2].) About six minutes elapse between opening the input microAOD file and the start of event processing:

23-Nov-2020 15:09:08 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root

Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 23-Nov-2020 15:15:16.486 CET

(15:09 ---> 15:15)

However, when running over data with the following command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=/DoubleEG/spigazzi-Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1-d9c0c6cde5cc4a64343ae06f842e5085/USER doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 processId=Data processType=Data doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1

It only takes around 22 seconds between opening an input file and the start of running over events:

23-Nov-2020 16:11:27 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/spigazzi/flashgg/Era2017_RR-31Mar2018_v2/legacyRun2FullV1/DoubleEG/Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1/190606_094808/0000/myMicroAODOutputFile_519.root
Begin processing the 1st record. Run 297219, Event 255140966, LumiSection 124 on stream 0 at 23-Nov-2020 16:11:49.666 CET

(16:11:27 --> 16:11:49)

From past discussions with Simone, one hypothesis for this slow startup on signal is an unnecessarily long ordering/traversal of the MC scaling input files. This is quite an inconvenient issue for local debugging, since every small fix one wants to test costs at least six minutes of waiting.

If anyone knows why this issue is occurring, or has a potential solution, it would be much appreciated.

Thank you,
Abe

[1] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/python/HHWWggCustomize.py
[2] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Taggers/plugins/HHWWggTagProducer.cc

@youyingli
Contributor

Hi @atishelmanch, I see you turned doSystematics on for the signal job but off for the data job, so the two timings are not directly comparable. With "doSystematics=True dumpTrees=True", the signal job will actually produce a tree for each systematics label (there are more than 60 trees), see:

https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L412-L434
https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L460-L466

which causes a high CPU load. I think you should set "doSystematics=False" for both signal and data processing when comparing the running times. Of course, I expect the signal job to still take longer than the data job, because scale factors (SFs) are still applied to the signal samples.
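
Illustrative only (not the actual workspaceStd.py code, and with a hypothetical subset of label names): a minimal sketch of why the tree count multiplies once systematics are on.

# Hypothetical sketch: with doSystematics=True and dumpTrees=True, one output
# tree is booked per systematic variation, so startup has to set up 60+ trees
# instead of one.
systematic_labels = [""] + [
    name + direction
    for name in ("MvaShift", "SigmaEOverEShift", "JEC", "JER")  # illustrative subset
    for direction in ("Up01sigma", "Down01sigma")
]
tree_names = ["HHWWggTag_0" + ("_" + label if label else "")
              for label in systematic_labels]
print("Booking %d trees" % len(tree_names))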

A useful method is to turn on the timing report (which in fact you have already turned on):

https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L692

and investigate which modules take the largest CPU time.
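
For reference, a minimal sketch of the standard CMSSW Timing service that this switch enables (a standalone fragment; workspaceStd.py builds its own process object):

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")  # placeholder; in workspaceStd.py the process already exists

# Standard framework Timing service: reports per-module and per-event
# CPU/real time, plus a summary at the end of the job.
process.Timing = cms.Service("Timing",
    summaryOnly = cms.untracked.bool(False),
    useJobReport = cms.untracked.bool(True)
)
process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))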

@atishelmanch
Author

atishelmanch commented Nov 27, 2020

Hello @youyingli ,

Thank you for the suggestions. When I run with systematics off, using the command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1

I indeed find the time between opening the file and running over events to be shorter than with systematics on:

27-Nov-2020 08:29:38 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root
Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 27-Nov-2020 08:33:50.705 CET

~ 4 minutes instead of around 6 minutes

When I run the same data command as above but with systematics turned on, it again takes only a short time between opening the data file and running over the events: ~30 seconds.

I have shared the reports for both here:

/afs/cern.ch/work/a/atishelm/public/ForFlashgg/CmsRunOutputSignal.txt
/afs/cern.ch/work/a/atishelm/public/ForFlashgg/CmsRunOutputData.txt

Looking at the signal report, I see the largest CPU time in signal coming from flashggDifferentialPhoIdInputsCorrection, whose "per event" time is about 0.23 seconds. How can we tell which modules are taking a lot of time before the event loop starts, when the report only gives "per event" times for each module? It may be that this module's time is actually spent during the event loop rather than between opening the file and beginning to process events.
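
One standard way to see where time goes before the event loop is the framework's Tracer service, which prints every framework transition as it happens; a module that stalls before the first event then shows up as a long gap between two consecutive trace lines. A minimal sketch (standalone fragment; attach the service to the real process instead):

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")  # placeholder; add the service to the real process

# FWCore Tracer service: logs module construction, beginJob, beginRun and
# per-event calls, separating startup time from per-event time.
process.Tracer = cms.Service("Tracer")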

Thank you,
Abe

@rchatter
Contributor

Hi Abe,

Somehow the timing report does not accurately reflect the situation. By sheer luck I may have identified the offending module in the full sequence: it is "process.jetSystematicsSequence" that is responsible for the large wait time (> 5 minutes). I haven't investigated why, but in any case for quick checks of the code it can be turned off, for example as sketched below.
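
A minimal sketch of one way to do that, assuming the sequence is attached to a path named process.p (the actual path name in workspaceStd.py may differ):

# In the config (or a customization hook), after the paths are defined:
# drop the jet systematics sequence for quick local tests.
# 'process.p' is an assumed name; check the real path in workspaceStd.py.
process.p.remove(process.jetSystematicsSequence)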

Rajdeep
