Producing MicroAODs for data

General instructions on microAOD production can be found in the repository README and in the flashgg documentation. This page focuses on the production of microAODs for data.

  1. Environment

    cmsenv
    CAMPAIGN=EXOSpring16_v1
    PART=<incremental number>
    MY_GRID_USER=`whoami` # or your grid user name, if it differs from your unix account name
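    ## optional sanity check (requires CRAB, set up in step 4):
    ## crab checkusername   # prints the grid user name known to CRAB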
    
  2. Make a snapshot of the existing dataset catalog.

    TMP_CATALOG=${CAMPAIGN}_p${PART}_temp
    fggManageSamples.py -m diphotons -C ${TMP_CATALOG}  catimport diphotons:${CAMPAIGN} \*Run2016\*
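    
    ## optional: the snapshot is a plain datasets.json under the catalog
    ## directory (path assumed from the layout used in step 12); a quick
    ## sanity check is to count the imported datasets
    python -c "import json; print(len(json.load(open('${CMSSW_BASE}/src/diphotons/MetaData/data/${TMP_CATALOG}/datasets.json'))))"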
    
  3. Prepare work area in flashgg

    cd ${CMSSW_BASE}/src/flashgg/MetaData/work
    ln -sf  ${CMSSW_BASE}/src/diphotons/MetaData/work/analysis_microAOD.py .
    # allow running on datasets not yet marked VALID in DBS
    cat crabConfig_TEMPLATE.py > mycrabConfig_TEMPLATE.py
    cat >> mycrabConfig_TEMPLATE.py << EOF
    
    config.Data.allowNonValidInputDataset=True
    EOF
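    
    For reference, allowNonValidInputDataset is a standard CRAB3 option; the line appended above is equivalent to setting it on a config built with the CRAB client API. A minimal sketch (the input dataset is just a placeholder):
    
    # minimal CRAB3 config sketch, not the full template
    from CRABClient.UserUtilities import config
    config = config()
    config.Data.inputDataset = '/SinglePhoton/Run2016B-PromptReco-v2/MINIAOD'  # placeholder
    config.Data.allowNonValidInputDataset = True  # accept datasets not yet marked VALID in DBS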
    
  4. Set-up crab

    source /cvmfs/cms.cern.ch/crab3/crab.sh
    voms-proxy-init --voms cms --valid 168:00
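    
    ## verify that the proxy was created and check its remaining lifetime
    voms-proxy-info --all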
    
  5. Prepare a target JSON file for the processing.

    cd ${CMSSW_BASE}/src/flashgg/MetaData/work
    ./fggCookJson.py --field 3.8T --dqm-folder /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions16/13TeV/ --bunch-space ''
    

This will produce files called myjson_DCSONLY3.8T-<RUNNUM>.txt and myjson_3.8T__<RUNNUM>.txt. The first contains the latest part of the DCS-only JSON, i.e. the runs coming after the last certified run number. The second is the logical OR (union) of the latest certification JSON and the first file.
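
The same union can be reproduced or inspected with the standard CMSSW compareJSON.py utility; the certification JSON name below is a placeholder:

    compareJSON.py --or <certification JSON> myjson_DCSONLY3.8T-<RUNNUM>.txt myjson_3.8T__<RUNNUM>.txt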

  6. Find out the list of lumi sections to be processed for each dataset.

    ## remove output of any previous iteration
    rm -f all_missing.json
    
    ./fggRollingDataset.py --target myjson_3.8T__\*.txt --dataset <dataset_1> --catalog diphotons:${TMP_CATALOG}
    ./fggRollingDataset.py --target myjson_3.8T__\*.txt --dataset <dataset_2> --catalog diphotons:${TMP_CATALOG}
    ...
    

    This will create a folder for each dataset, containing three files: target.json, processed.json and missing.json. The first contains the subset of the target JSON covered by the dataset, determined by restricting the target JSON to the range between the minimum and maximum run numbers in the dataset.
    The file all_missing.json is meant to be loaded by prepareCrabJobs.py and contains the list of datasets to be processed, each with its corresponding missing.json lumi mask.
    Note: the primary datasets to be processed are /SinglePhoton, /SingleElectron and /DoubleEG. Please double-check the list of secondary datasets in DAS (a query sketch is given below); currently it is Run2016B-PromptReco-v1 and Run2016B-PromptReco-v2.
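    
    A loop sketch for the per-dataset invocations, with illustrative dataset names (cross-check them in DAS first, e.g. with das_client.py):
    
    ## list matching datasets in DAS (query string is an example)
    das_client.py --query="dataset=/SinglePhoton/Run2016B-PromptReco-v*/MINIAOD" --limit=0
    
    ## run fggRollingDataset.py over all datasets (example list)
    for ds in /SinglePhoton/Run2016B-PromptReco-v2/MINIAOD \
              /SingleElectron/Run2016B-PromptReco-v2/MINIAOD \
              /DoubleEG/Run2016B-PromptReco-v2/MINIAOD; do
        ./fggRollingDataset.py --target myjson_3.8T__\*.txt --dataset ${ds} --catalog diphotons:${TMP_CATALOG}
    done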

  7. Edit all_missing.json, adding empty signal and background dataset lists "sig" : [], "bkg" : [] (see the sketch below).
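    
    After editing, the file should look schematically as below; the "data" key is an assumption about what fggRollingDataset.py writes, and the entries are elided:
    
    {
        "data" : [ "... entries written by fggRollingDataset.py ..." ],
        "sig" : [],
        "bkg" : []
    }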

  8. Prepare crab configurations

    ./prepareCrabJobs.py -L 10 -C ${CAMPAIGN}_p${PART} -s all_missing.json -p analysis_microAOD.py
    
  9. Launch production

    cd ${CAMPAIGN}_p${PART}
    parallel --ungroup 'crab sub {} | tee {}.log' ::: *.py # or explicit list of configs to run
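    
    ## if GNU parallel is not available, a plain shell loop does the same thing, just serially
    for cfg in *.py; do crab submit ${cfg} | tee ${cfg}.log; done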
    
  10. Monitor production and update the catalog, preparing a script to be run continuously in screen (a screen invocation sketch is given at the end of this step).

    echo ${CAMPAIGN}_p${PART}/crab_*/ > running_tasks.txt
    cat > mon.sh << EOF
    #!/bin/bash
    
    # import files from DBS
    fggManageSamples.py -m diphotons -C ${TMP_CATALOG} import '/*/*${MY_GRID_USER}*${CAMPAIGN}_p${PART}*/USER'
    
    # run check jobs
    fggManageSamples.py -m diphotons -C ${TMP_CATALOG} check -q 8nm
    
    # resubmit possibly failed jobs
    cat running_tasks.txt |  tr ' ' '\n' | parallel -j 6 'crab resubmit {}'
    EOF
    
    chmod 755 mon.sh
    
    while true; do ./mon.sh; sleep 360; done
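    
    ## alternatively, run the loop unattended in a detached screen session
    ## (session name is just an example)
    screen -dmS fggmon bash -c 'while true; do ./mon.sh; sleep 360; done'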
    
  11. Once production is over, import the new datasets into the catalog. Be aware that p3 has duplicates, so fggManageSamples.py will complain about them; answer "yes" to all requests in order to keep all the files.

    fggManageSamples.py -m diphotons -C ${CAMPAIGN}  catimport diphotons:${TMP_CATALOG} \*Run2016\*
    fggManageSamples.py -m diphotons -C ${CAMPAIGN}  check
    fggManageSamples.py -m diphotons -C ${CAMPAIGN} overlap /<dataset>*
    

    where <dataset> stands for DoubleEG, SinglePhoton and SingleElectron. Manually check that there are no overlaps between different parts (an empty JSON, '{}', should be produced for each comparison pair).
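    
    The overlap between any two lumi masks can also be checked directly with the standard CMSSW compareJSON.py utility; an empty output means no overlap (file names are illustrative):
    
    compareJSON.py --and <part1 lumi mask>.json <part2 lumi mask>.json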

  12. Commit the new catalog and make a pull request.

    cd ${CMSSW_BASE}/src/diphotons
    git checkout -b production_${CAMPAIGN}_p${PART}
    
    git add MetaData/data/${CAMPAIGN}/datasets.json
    git commit
    
    MY_GITHUB_NAME=$(git config --get user.github)
    git remote add ${MY_GITHUB_NAME} git@github.com:${MY_GITHUB_NAME}/diphotons.git
    git push -u ${MY_GITHUB_NAME} production_${CAMPAIGN}_p${PART}