Producing MicroAODs for data
General instructions on microAOD production are found in the repository README and in the flashgg documentation. This page focuses on the production of microAODs for data.
-
Environment
```
cmsenv
CAMPAIGN=EXOSpring16_v1
PART=<incremental number>
MY_GRID_USER=`whoami` # or your grid user name if it differs from your unix account name
```
-
Make a snapshot of the existing dataset catalog.
```
TMP_CATALOG=${CAMPAIGN}_p${PART}_temp
fggManageSamples.py -m diphotons -C ${TMP_CATALOG} catimport diphotons:${CAMPAIGN} \*Run2016\*
```
-
Prepare work area in flashgg
```
cd ${CMSSW_BASE}/src/flashgg/MetaData/work
ln -sf ${CMSSW_BASE}/src/diphotons/MetaData/work/analysis_microAOD.py .

# allow running on invalid datasets
cat crabConfig_TEMPLATE.py > mycrabConfig_TEMPLATE.py
cat >> mycrabConfig_TEMPLATE.py << EOF

config.Data.allowNonValidInputDataset=True
EOF
```
-
Set up crab
```
source /cvmfs/cms.cern.ch/crab3/crab.sh
voms-proxy-init --voms cms --valid 168:00
```
-
Prepare a target JSON file for the processing.
```
cd ${CMSSW_BASE}/src/flashgg/MetaData/work
./fggCookJson.py --field 3.8T --dqm-folder /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions16/13TeV/ --bunch-space ''
```
This will produce files called myjson_DCSONLY3.8T-<RUNNUM>.txt and myjson_3.8T__<RUNNUM>.txt. The first contains the latest part of the DCS-only JSON, i.e. the runs coming after the last certified run number. The second is the union ("or") of the latest certification JSON and the first file.
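The "or" of two lumi JSONs can be illustrated with a small standalone sketch. The file names, run numbers and lumi ranges below are toy examples (in production fggCookJson.py, or tools such as CMSSW's mergeJSON.py, do this for you):

```shell
# Toy inputs standing in for the DCS-only and certification JSONs
cat > toy_dcsonly.json <<'EOF'
{"273000": [[1, 50]]}
EOF
cat > toy_certified.json <<'EOF'
{"273000": [[40, 120]], "273001": [[1, 10]]}
EOF

# Union of the two masks, merging overlapping/adjacent lumi ranges per run
python3 - toy_dcsonly.json toy_certified.json <<'PY'
import json, sys

union = {}
for path in sys.argv[1:]:
    with open(path) as f:
        for run, ranges in json.load(f).items():
            union.setdefault(run, []).extend(ranges)

# merge overlapping or adjacent lumi ranges within each run
for run, ranges in union.items():
    ranges.sort()
    merged = [list(ranges[0])]
    for lo, hi in ranges[1:]:
        if lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    union[run] = merged

with open("toy_union.json", "w") as f:
    json.dump(union, f, sort_keys=True)
print(json.dumps(union, sort_keys=True))
PY
# prints {"273000": [[1, 120]], "273001": [[1, 10]]}
```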
-
Find out the list of lumi sections to be processed for each dataset.
```
## remove previous config
rm all_missing.json
./fggRollingDataset.py --target myjson_3.8T__\*.txt --dataset <dataset_1> --catalog diphotons:${TMP_CATALOG}
./fggRollingDataset.py --target myjson_3.8T__\*.txt --dataset <dataset_2> --catalog diphotons:${TMP_CATALOG}
...
```
This will create a folder for each dataset, containing three files: target.json, processed.json and missing.json. The first contains the subset of the target JSON file covered by the dataset, obtained by filtering the target JSON to the range between the minimum and maximum run numbers in the dataset.
The file all_missing.json is meant to be loaded by prepareCrabJobs.py and contains the list of datasets to be processed, each with its corresponding missing.json lumi mask.
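The per-dataset target.json selection boils down to clipping the target JSON to the dataset's run range. A standalone sketch follows; the run numbers and the assumed run range are illustrative, not the actual fggRollingDataset.py code:

```shell
# Toy target JSON covering three runs (hypothetical numbers)
cat > toy_target.json <<'EOF'
{"273000": [[1, 50]], "273500": [[1, 5]], "274000": [[1, 9]]}
EOF

# Keep only the runs between the dataset's min and max run number
python3 - <<'PY'
import json

RUN_MIN, RUN_MAX = 273000, 273500  # assumed run range of the dataset

target = json.load(open("toy_target.json"))
subset = {run: lumis for run, lumis in target.items()
          if RUN_MIN <= int(run) <= RUN_MAX}

with open("toy_dataset_target.json", "w") as f:
    json.dump(subset, f, sort_keys=True)
print(sorted(subset))
PY
# prints ['273000', '273500']
```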
Note: the list of primary datasets to be processed is /SinglePhoton, /SingleElectron and /DoubleEG. Please double-check in DAS the list of secondary datasets. Currently the list of secondary datasets is Run2016B-PromptReco-v1 and Run2016B-PromptReco-v2.
-
Edit all_missing.json, adding empty signal and background dataset lists: "sig" : [], "bkg" : [].
-
Prepare crab configurations
```
./prepareCrabJobs.py -L 10 -C ${CAMPAIGN}_p${PART} -s all_missing.json -p analysis_microAOD.py
```
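The all_missing.json edit from the previous step can also be scripted. A minimal sketch follows; the toy payload and the top-level "data" key are assumptions (real files map each dataset to its missing.json lumi mask):

```shell
# Toy stand-in for all_missing.json (layout assumed for illustration)
cat > all_missing.json <<'EOF'
{"data": [["/SinglePhoton/Run2016B-PromptReco-v2/AOD", "missing.json"]]}
EOF

# Add the empty signal and background lists expected by prepareCrabJobs.py
python3 - <<'PY'
import json

with open("all_missing.json") as f:
    cfg = json.load(f)

cfg.setdefault("sig", [])
cfg.setdefault("bkg", [])

with open("all_missing.json", "w") as f:
    json.dump(cfg, f, indent=1, sort_keys=True)
PY
```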
-
Launch production
```
cd ${CAMPAIGN}_p${PART}
parallel --ungroup 'crab sub {} | tee {}.log' ::: *.py # or explicit list of configs to run
```
-
Monitor production and update the catalog, preparing a script to be run continuously in a screen session.
```
echo ${CAMPAIGN}_p${PART}/crab_*/ > running_tasks.txt

cat > mon.sh << EOF
#!/bin/bash

# import files from DBS
fggManageSamples.py -m diphotons -C ${TMP_CATALOG} import '/*/*${MY_GRID_USER}*${CAMPAIGN}_p${PART}*/USER'

# run check jobs
fggManageSamples.py -m diphotons -C ${TMP_CATALOG} check -q 8nm

# resubmit possibly failed jobs
cat running_tasks.txt | tr ' ' '\n' | parallel -j 6 'crab resubmit {}'
EOF
chmod 755 mon.sh

while [[ 1 == 1 ]]; do ./mon.sh; sleep 360; done
```
-
Once production is over, import the new datasets into the catalog. Be aware that p3 has duplicates, so fggManageSamples will complain about that; answer "yes" to all requests in order to keep all the files.
```
fggManageSamples.py -m diphotons -C ${CAMPAIGN} catimport diphotons:${TMP_CATALOG} \*Run2016\*
fggManageSamples.py -m diphotons -C ${CAMPAIGN} check
fggManageSamples.py -m diphotons -C ${CAMPAIGN} overlap /<dataset>*
```
where <dataset> stands for DoubleEG, SinglePhoton and SingleElectron. Manually check that there are no overlaps between the different parts: an empty JSON ('{}') should be produced for each comparison pair.
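The overlap check amounts to intersecting the lumi masks of two parts. A standalone sketch follows; the file names, run numbers and intersection logic are illustrative (the real check is done by the fggManageSamples.py overlap command above):

```shell
# Toy lumi masks for two production parts (hypothetical, disjoint)
cat > toy_part1.json <<'EOF'
{"273000": [[1, 50]]}
EOF
cat > toy_part2.json <<'EOF'
{"273000": [[51, 120]], "273001": [[1, 10]]}
EOF

# Intersect the two masks run by run; '{}' means no duplicated lumis
python3 - toy_part1.json toy_part2.json <<'PY'
import json, sys

a = json.load(open(sys.argv[1]))
b = json.load(open(sys.argv[2]))

overlap = {}
for run in set(a) & set(b):
    hits = [[max(lo1, lo2), min(hi1, hi2)]
            for lo1, hi1 in a[run] for lo2, hi2 in b[run]
            if max(lo1, lo2) <= min(hi1, hi2)]
    if hits:
        overlap[run] = sorted(hits)

with open("toy_overlap.json", "w") as f:
    json.dump(overlap, f, sort_keys=True)
print(json.dumps(overlap, sort_keys=True))
PY
# prints {} for the disjoint toy masks above
```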
-
Commit the new catalog and make a pull request.
```
cd ${CMSSW_BASE}/src/diphotons
git checkout -b production_${CAMPAIGN}_p${PART}
git add MetaData/data/${CAMPAIGN}/datasets.json
git commit

MY_GITHUB_NAME=$(git config --get user.github)
git remote add ${MY_GITHUB_NAME} git@github.com:${MY_GITHUB_NAME}/diphotons.git
git push -u ${MY_GITHUB_NAME} production_${CAMPAIGN}_p${PART}
```