WorkspaceStd Slow to Start for Signal #1247

Open
atishelmanch opened this issue Nov 23, 2020 · 3 comments

Comments

@atishelmanch

Hello,

I am not sure if others are experiencing this issue, but I have found that when running our HH->WWgg tagger plugin with workspaceStd.py on a signal file, for example with the command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=1 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1

(There is some WWgg-specific customization there, implemented in [1] and [2].) About six minutes elapse between opening the input microAOD file and the start of event processing:

23-Nov-2020 15:09:08 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root

Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 23-Nov-2020 15:15:16.486 CET

(15:09 ---> 15:15)

However, when running over data with the following command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=/DoubleEG/spigazzi-Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1-d9c0c6cde5cc4a64343ae06f842e5085/USER doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 processId=Data processType=Data doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1

It only takes around 22 seconds between opening an input file and the start of running over events:

23-Nov-2020 16:11:27 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/spigazzi/flashgg/Era2017_RR-31Mar2018_v2/legacyRun2FullV1/DoubleEG/Era2017_RR-31Mar2018_v2-legacyRun2FullV1-v0-Run2017B-31Mar2018-v1/190606_094808/0000/myMicroAODOutputFile_519.root
Begin processing the 1st record. Run 297219, Event 255140966, LumiSection 124 on stream 0 at 23-Nov-2020 16:11:49.666 CET

(16:11:27 --> 16:11:49)

From past discussions with Simone, one hypothesis for this slow startup on signal is an unnecessarily long ordering/traversal of the MC scaling input files. This is quite an inconvenient issue for local debugging, since every small fix one wants to test costs at least six minutes of waiting.

If anyone knows why this issue is occurring, or has a potential solution, it would be much appreciated.

Thank you,
Abe

[1] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/python/HHWWggCustomize.py
[2] https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Taggers/plugins/HHWWggTagProducer.cc

@youyingli
Contributor

Hi @atishelmanch, I see you turned doSystematics on for the signal job but off for the data job, so the two timings are not directly comparable. With "doSystematics=True dumpTrees=True", the signal job will actually produce a tree for each systematics label (there are more than 60 trees), see:

https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L412-L434
https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L460-L466

which causes a high CPU load. I think you should set "doSystematics=False" for both signal and data processing when comparing the running times. Of course, I expect the signal job to still take longer than the data job, because scale factors (SFs) are still applied to the signal samples.
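
Illustrative only (not the actual workspaceStd.py code, and with a hypothetical subset of label names): a minimal sketch of why the tree count multiplies once systematics are on.

# Hypothetical sketch: with doSystematics=True and dumpTrees=True, one output
# tree is booked per systematic variation, so startup has to set up 60+ trees
# instead of one.
systematic_labels = [""] + [
    name + direction
    for name in ("MvaShift", "SigmaEOverEShift", "JEC", "JER")  # illustrative subset
    for direction in ("Up01sigma", "Down01sigma")
]
tree_names = ["HHWWggTag_0" + ("_" + label if label else "")
              for label in systematic_labels]
print("Booking %d trees" % len(tree_names))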

A useful method is to turn on the timing report (which in fact you have already turned on):

https://github.com/atishelmanch/flashgg/blob/HHWWgg_dev/Systematics/test/workspaceStd.py#L692

and investigate which modules take the largest CPU time.
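
For reference, a minimal sketch of the standard CMSSW Timing service that this switch enables (a standalone fragment; workspaceStd.py builds its own process object):

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")  # placeholder; in workspaceStd.py the process already exists

# Standard framework Timing service: reports per-module and per-event
# CPU/real time, plus a summary at the end of the job.
process.Timing = cms.Service("Timing",
    summaryOnly = cms.untracked.bool(False),
    useJobReport = cms.untracked.bool(True)
)
process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))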

@atishelmanch
Author

atishelmanch commented Nov 27, 2020

Hello @youyingli ,

Thank you for the suggestions. When I run with systematics off, using the command:

cmsRun Systematics/test/workspaceStd.py metaConditions=MetaData/data/MetaConditions/Era2017_RR-31Mar2018_v1.json campaign=Era2017_RR-31Mar2018_v2 dataset=GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8 doHHWWggTag=1 HHWWggTagsOnly=1 maxEvents=500 doSystematics=0 dumpWorkspace=0 dumpTrees=1 useAAA=1 doHHWWggTagCutFlow=1 saveHHWWggFinalStateVars=1 HHWWggAnalysisChannel=SL HHWWgguseZeroVtx=1 doHHWWggDebug=1

I indeed find the time between opening the file and running over events to be shorter than with systematics on:

27-Nov-2020 08:29:38 CET Successfully opened file root://cms-xrd-global.cern.ch//store/user/alesauva/flashgg/2017_1/10_6_4/GluGluToHHTo2G2Qlnu_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/2017_1-10_6_4-v0-RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/201105_142131/0000/myMicroAODOutputFile_36.root
Begin processing the 1st record. Run 1, Event 201310, LumiSection 2014 on stream 0 at 27-Nov-2020 08:33:50.705 CET

~ 4 minutes instead of around 6 minutes

When I run the same data command as above but with systematics turned on, it again takes only a short time between opening the data file and running over the events: ~30 seconds.

I have shared the reports for both here:

/afs/cern.ch/work/a/atishelm/public/ForFlashgg/CmsRunOutputSignal.txt
/afs/cern.ch/work/a/atishelm/public/ForFlashgg/CmsRunOutputData.txt

Looking at the signal report, I see the largest CPU time in signal coming from flashggDifferentialPhoIdInputsCorrection, whose "per event" time is about 0.23 seconds. How can we tell which modules are taking a lot of time before the event loop starts, when the report only gives "per event" times for each module? It may be that this module's time is actually spent during the event loop rather than between opening the file and beginning to process events.
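
One standard way to see where time goes before the event loop is the framework's Tracer service, which prints every framework transition as it happens; a module that stalls before the first event then shows up as a long gap between two consecutive trace lines. A minimal sketch (standalone fragment; attach the service to the real process instead):

import FWCore.ParameterSet.Config as cms

process = cms.Process("TEST")  # placeholder; add the service to the real process

# FWCore Tracer service: logs module construction, beginJob, beginRun and
# per-event calls, separating startup time from per-event time.
process.Tracer = cms.Service("Tracer")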

Thank you,
Abe

@rchatter
Contributor

Hi Abe,

Somehow the timing report does not accurately reflect the situation. By sheer luck I may have identified the offending module in the full sequence: it is "process.jetSystematicsSequence" that is responsible for the large wait time (> 5 minutes). I haven't investigated why, but in any case for quick checks of the code it can be turned off, for example as sketched below.
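
A minimal sketch of one way to do that, assuming the sequence is attached to a path named process.p (the actual path name in workspaceStd.py may differ):

# In the config (or a customization hook), after the paths are defined:
# drop the jet systematics sequence for quick local tests.
# 'process.p' is an assumed name; check the real path in workspaceStd.py.
process.p.remove(process.jetSystematicsSequence)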

Rajdeep
