Unfolding

This page is work in progress. It will represent the status of the code after 102x PR is merged.

Unfolding with combine

General introduction

Many other resources are available to define in more detail what is unfolding when to use it (or not), and what are the associated problematics. A useful summary is available at the StatComm page.

Unfolding or usmearing is the ``art'' of undoing detector induced smearing effects, in order to bring the observation to a prior theory level.

The basic idea behind unfolding techniques is linearity, i.e., the possibility of describing changes induced by the detector in the truth bin through a linear relation of what happens in the nearby truth-bins. Or in other words that we can model through probability (independent of e.g. the content) the changes that the event falling in the truth bin is reconstructed in the bin ( )

y=Rtx+b

Or, if R contains the expected yields with respect to a particular mc. We will use this formalism. y=Rm+b

Unfolding aims to find the truth distribution x, given the observation y.

External software

There are many software tools to perform unfolding and each of them as pro and contra.

RooUnfold. Extension to ROOT, simple to use, can easily switch across regularization types
TUnfold. Embedded in ROOT.

Likelihood-based unfolding

Why?

because we want, because we can
because background subtraction and/or signal extraction are non-trivial
because nuisances and correlations are non-trivial
because we want to profile nuisances ...

How?

Combine minimize the likelihood function for you! Given the true content induced by the bin , the expected events for the true-bin in the reco-bin . and is the poisson distribution/likelihood:

In more complex signal extraction, the likelihood may be written as a function of ancillary variables indexed through k (e.g. the invariant mass of something):

(TO CHECK)

In practice we need to construct the ``response matrix'' and unroll it in the reconstructed bins:

Generator distribution, after the generator-cut only.
For each bin of the Generator distribution, the propagation to the reconstructed bins (corrected for efficiencies ...).
The propagation of the out-of-acceptance terms to the reconstructed level (if it is a fiducial measurement).

For simple extraction and simple regularization, multiple reconstructed bins can be patched together in a 1D distribution. (Reco-level plot)

The observations are now extracted through likelihood ratio profile:

text2workspace.py -m 125 --X-allow-no-background -o datacard.root datacard.txt
   -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO map='.*GenBin0.*:r_Bin0[1,-1,20]' --PO map='.*GenBin1.*:r_Bin1[1,-1,20]' --PO map='.*GenBin2.*:r_Bin2[1,-1,20]' --PO map='.*GenBin3.*:r_Bin3[1,-1,20]' --PO map='.*GenBin4.*:r_Bin4[1,-1,20]'

## combine -M MultiDimFit --setParameters=r_Bin0=1,r_Bin1=1,r_Bin2=1,r_Bin3=1,r_Bin4=1 -t -1 -m 125 datacard.root
## combine -M MultiDimFit --setParameters=r_Bin0=1,r_Bin1=1,r_Bin2=1,r_Bin3=1,r_Bin4=1 -t -1 -m 125 --algo=grid --points=100 -P r_Bin1 --setParameterRanges r_Bin1=0.5,1.5 --floatOtherPOIs=1 datacard.root

Notice that switching to the so called bin-by-bin (strongly discouraged except for tests) it is also quite easy: for bin by bin use:

  -P HiggsAnalysis.CombinedLimit.PhysicsModel:multiSignalModel --PO map='.*RecoBin0.*:r_Bin0[1,-1,20]' --PO map='.*RecoBin1.*:r_Bin1[1,-1,20]' --PO map='.*RecoBin2.*:r_Bin2[1,-1,20]' --PO map='.*RecoBin3.*:r_Bin3[1,-1,20]' --PO map='.*RecoBin4.*:r_Bin4[1,-1,20]'

Usually, the observations are extracted for each bin, profiling all the others, and using a 1-parameter MINOS approach for each bin. 2D or 3D (or 100D if you have CPU power!) likelihood may also be possible. Hesse can be run on the best fit to derive an approximate correlation matrix.

Given the true spectrum with the same MC xtrue , the observations are now, remembering that R includes corrected efficiency and acceptance factors:

xobs

Nuisances can be added to the likelihood function and profiled in the usual way (like the mass of the resonance, ...). Theory uncertainty on the inclusive cross section is not present in xobs but they are on muobs , due to cancellation (correlation). This is why it is important to use the same MC-truth for the response matrix and the last rescaling.

Regularization

Regularization is used to stabilized unfolding. The study of the regularization parameter needs to be performed with external software that thanks to some simplification are likely to be much faster.

In order to apply regularization the simplest way in this framework is to apply a penalization term in the likelihood function (Tickonov regularization):

where P is a linear operator. Two different approaches are present.

SVD

In the SVD approach -- described in the SVD paper -- the penalization term is put directly on the strength: where A is usually the discrete curvature matrix. The penalization on the first derivatives for the first and last bin may be included or not.

In order to implement such a regularization technique, we modify the likelihood in order to have a constraint (one for each row of the product) Amu , by adding to the datacards:

constr1 constr r_Bin0+r_Bin2-2*r_Bin1 0.001

The valid syntax are:

nuis_name constr formula [dependents] value
nuis_name constr formula [dependents] variable

The formula can be specified with the parameters or the positional calculation, the dependents (optional) can be defined with or w/o the graphs, and the strength as a variable/new variable or value.

The possibility of specifying the dependands is kept for complicate cases, where potentially the scripts do not figure them out automatically or when the positional formula is preferred. Examples of the syntax are:

constr1 constr r_bin0-r_bin1 0.01
constr1 constr r_bin0-r_bin1 delta[0.01]
constr1 constr r_bin0+r_bin1 r_bin0,r_bin1 0.01
constr1 constr r_bin0+r_bin1 {r_bin0,r_bin1} delta[0.01]
constr1 constr @0-2*@2+@1 r_Bin0,r_Bin1,r_Bin2 delta[0.03]

The strength of the regularization is here in the form of sigma (TOCHECK)

TUnfold

The Tikhonov regularization as implemented in TUnfold uses the MC information or the densities (prediction over bin width) as a bias vector. In order to give to combine with the necessary information, each reco-bin needs to be implemented independently to access the observed normalization in the minimization ( xobs ). In this case the bias vector is xobs-xtrue

An example of this implementation can be:

constr1 constr (r_Bin0-1.)*(shapeSig_GenBin0_RecoBin0__norm+shapeSig_GenBin0_RecoBin1__norm+shapeSig_GenBin0_RecoBin2__norm+shapeSig_GenBin0_RecoBin3__norm+shapeSig_GenBin0_RecoBin4__norm)+(r_Bin2-1.)*(shapeSig_GenBin2_RecoBin0__norm+shapeSig_GenBin2_RecoBin1__norm+shapeSig_GenBin2_RecoBin2__norm+shapeSig_GenBin2_RecoBin3__norm+shapeSig_GenBin2_RecoBin4__norm)-2*(r_Bin1-1.)*(shapeSig_GenBin1_RecoBin0__norm+shapeSig_GenBin1_RecoBin1__norm+shapeSig_GenBin1_RecoBin2__norm+shapeSig_GenBin1_RecoBin3__norm+shapeSig_GenBin1_RecoBin4__norm) {r_Bin0,r_Bin1,r_Bin2,shapeSig_GenBin1_RecoBin0__norm,shapeSig_GenBin0_RecoBin0__norm,shapeSig_GenBin2_RecoBin0__norm,shapeSig_GenBin1_RecoBin1__norm,shapeSig_GenBin0_RecoBin1__norm,shapeSig_GenBin2_RecoBin1__norm,shapeSig_GenBin1_RecoBin2__norm,shapeSig_GenBin0_RecoBin2__norm,shapeSig_GenBin2_RecoBin2__norm,shapeSig_GenBin1_RecoBin3__norm,shapeSig_GenBin0_RecoBin3__norm,shapeSig_GenBin2_RecoBin3__norm,shapeSig_GenBin1_RecoBin4__norm,shapeSig_GenBin0_RecoBin4__norm,shapeSig_GenBin2_RecoBin4__norm} delta[0.03]

An example of the two methods can be found in: data/tutorials/regularization where two scripts:

python createWs.py [-r]

demonstrate a simple datacard creation and fit with and without regularization, while:

python makeModel.py

create a more complex datacard with the two regularization scheme implemented (but needs to be activated/uncomment)

Limitations

Currently, no automatic study of the regularization strength is implemented.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly