Optimize Trigger Ntuples Trigger Storage #1189

Open
kkrizka opened this issue Apr 11, 2018 · 6 comments

@kkrizka
Contributor

kkrizka commented Apr 11, 2018

I am looking at reducing the size of my ntuples. I made some quick plots of the space taken by the different branches (via TBranch::GetTotalSize()). I split the branches into categories based on the word before the first underscore. If that word is not jet, fatjet, muon, el or ph, the branch is put into the event category.
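Here is a minimal sketch of that categorization. It assumes the per-branch sizes have already been collected, for example by looping over TTree::GetListOfBranches() and calling TBranch::GetTotalSize() on each branch. That ROOT loop is left out so the example stays self-contained. The function names branchCategory and sizeByCategory are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Map a branch name to a category using the word before the first '_'.
// Anything that is not jet, fatjet, muon, el or ph counts as "event".
std::string branchCategory(const std::string& branchName) {
  // substr(0, npos) returns the whole name when there is no '_'.
  const std::string prefix = branchName.substr(0, branchName.find('_'));
  static const std::vector<std::string> objects = {"jet", "fatjet", "muon", "el", "ph"};
  for (const auto& obj : objects) {
    if (prefix == obj) return obj;
  }
  return "event";
}

// Sum per-branch sizes (e.g. values from TBranch::GetTotalSize())
// into per-category totals.
std::map<std::string, long long> sizeByCategory(
    const std::vector<std::pair<std::string, long long>>& branchSizes) {
  std::map<std::string, long long> totals;
  for (const auto& entry : branchSizes) {
    totals[branchCategory(entry.first)] += entry.second;
  }
  return totals;
}
```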

I put the composition of my data ntuples at the bottom. The event category takes up about 20% of the ntuples. Of that, over half is taken up by triggerNames (I run with #1184 applied; the branch is isPassedBitsNames in master). This is probably not too surprising, since each trigger is stored as a lengthy string (up to 20 characters for the large-R jet triggers). If you have several triggers, things add up...
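Here is a rough, uncompressed back-of-the-envelope comparison, written as a sketch. If ~50 names of up to ~20 characters are written in an event, the strings alone approach 1000 bytes. One boolean branch per trigger costs about 50 bytes, since ROOT stores a Bool_t leaf as one byte. ROOT compression will shrink the repeated strings considerably, so the real on-disk gap is smaller. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Uncompressed payload of the characters in a per-event vector<string> of
// trigger names. Ignores ROOT's per-entry bookkeeping and compression.
std::size_t namesPayloadBytes(const std::vector<std::string>& names) {
  std::size_t bytes = 0;
  for (const auto& name : names) bytes += name.size();
  return bytes;
}

// Uncompressed payload of one boolean branch per configured trigger,
// assuming one byte per flag (the size of a ROOT Bool_t leaf).
std::size_t boolBranchPayloadBytes(std::size_t nTriggers) {
  const std::size_t bytesPerFlag = 1;
  return nTriggers * bytesPerFlag;
}
```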

It might be worth rethinking how the trigger information is stored. My first thought is to have a boolean branch per trigger named triggername (or a float triggername_prescale), similar to what the old NTUP_COMMON used. It might also be faster, since one does not have to do a linear search through a list to determine a trigger decision. I am not sure how nice this would be if the complete trigger list is not known at run time (i.e., triggers added/removed for the different data periods).
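Here is a minimal sketch contrasting the two layouts. The trigger names and the helpers (passedByName, PerTriggerFlags) are hypothetical. A std::map stands in for the set of per-trigger branches so the code runs without ROOT. With ROOT, each flag could be booked with the leaflist form of TTree::Branch, e.g. tree->Branch(name.c_str(), &flags[name], (name + "/O").c_str()).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Name-list layout: each event carries a list of trigger names, so every
// decision needs a linear search through that list.
bool passedByName(const std::vector<std::string>& passedTriggers,
                  const std::string& trigger) {
  return std::find(passedTriggers.begin(), passedTriggers.end(), trigger) !=
         passedTriggers.end();
}

// Proposed layout: one flag per configured trigger (one branch each in an
// ntuple). std::map node addresses are stable, so &flags[name] could be
// handed to ROOT as a branch address.
struct PerTriggerFlags {
  std::map<std::string, bool> flags;

  explicit PerTriggerFlags(const std::vector<std::string>& configured) {
    for (const auto& name : configured) flags[name] = false;
  }

  // Reset every decision, then set the ones that fired in this event.
  // Triggers that are not in the configured list are ignored.
  void fill(const std::vector<std::string>& passedTriggers) {
    for (auto& entry : flags) entry.second = false;
    for (const auto& name : passedTriggers) {
      auto it = flags.find(name);
      if (it != flags.end()) it->second = true;
    }
  }
};
```

The map also makes the run-time caveat above concrete. The set of branches is fixed when the tree is booked, so triggers that only exist in some data periods would have to be in the configured list from the start.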

@kratsg @ntadej Thoughts? Maybe I am the only one who stores a lot of trigger decisions (~50)...

[Two images: plots of the data ntuple composition (not preserved)]

@kratsg
Contributor

kratsg commented Apr 11, 2018

Yeah, I'm not sure how feasible this is. Not many people are going to be comfortable working with trigger bits at the ntuple level, especially those who are just joining ATLAS. In most analyses, I only see ~5-10 trigger decisions being stored. Storing 50 does seem like a lot... It's an interesting thought, though. If you already specify the list of triggers you want to store, is it possible to store a function that calculates the trigger bit given a series of trigger names, and then you can search for that?
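Here is one way to read this suggestion, as a sketch. With a fixed, ordered list of configured triggers, each event's decisions fit into a single 64-bit word. A "function of trigger names" then becomes a mask to AND against that word. The names (encodeTriggerBits, triggerBitSet, triggerMask) and the 64-trigger limit are assumptions for illustration. The configured order must also be available downstream to decode the bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bit i is set if configured[i] appears in `passed`.
// Sketch limit: at most 64 configured triggers.
std::uint64_t encodeTriggerBits(const std::vector<std::string>& configured,
                                const std::vector<std::string>& passed) {
  assert(configured.size() <= 64);
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < configured.size(); ++i) {
    for (const auto& name : passed) {
      if (name == configured[i]) {
        bits |= (std::uint64_t{1} << i);
        break;
      }
    }
  }
  return bits;
}

// Test the bit of one trigger; false if the trigger was not configured.
bool triggerBitSet(std::uint64_t bits, const std::vector<std::string>& configured,
                   const std::string& trigger) {
  for (std::size_t i = 0; i < configured.size(); ++i) {
    if (configured[i] == trigger) return ((bits >> i) & 1u) != 0;
  }
  return false;
}

// Mask for "any of these triggers fired": test with (bits & mask) != 0.
std::uint64_t triggerMask(const std::vector<std::string>& configured,
                          const std::vector<std::string>& names) {
  return encodeTriggerBits(configured, names);
}
```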

@fscutti
Contributor

fscutti commented Apr 11, 2018

Hi @kratsg, are you suggesting adding the output of this function in addition to what @kkrizka suggests? I feel that adding only this output would reduce the freedom of downstream users to experiment with different trigger lists. This is especially true if common ntuples are produced within an analysis. If we decide on this combined approach, may I suggest storing a vector for each trigger, where the first element is the trigger bit and the second the prescale?
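Here is a minimal sketch of the per-trigger record suggested here. It assumes one vector<float> per trigger holding {decision, prescale}. The helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One record per trigger: [0] = decision (0 or 1), [1] = prescale.
using TriggerRecord = std::vector<float>;

TriggerRecord makeRecord(bool passed, float prescale) {
  return TriggerRecord{passed ? 1.0f : 0.0f, prescale};
}

// at() throws if a record is malformed (fewer than two entries).
bool recordPassed(const TriggerRecord& r) { return r.at(0) > 0.5f; }
float recordPrescale(const TriggerRecord& r) { return r.at(1); }
```

The record always has exactly two entries, so a fixed-size struct or std::pair<bool, float> per trigger would avoid the per-event overhead of a variable-length container. The vector form is kept here only to mirror the suggestion.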

@kratsg
Contributor

kratsg commented Apr 11, 2018

@fscutti, no. What this effectively amounts to is requiring a consistent mapping from the input triggers to a fixed vector of trigger strings. You then just store a vector of prescales per event, knowing that the order of the vector is well-defined (and similarly a vector of trigger bits for the pass decisions). The real question is how to sort/predetermine that order in an entirely generic, configurable way that doesn't place undue burden on the end user.

One example is to provide a Python script that parses the config.py/config.json someone uses, extracts the triggers, and provides the necessary order... but keeping that up to date with the C++ code becomes somewhat hard.
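Here is a sketch of one generic alternative to a separate script, assuming sorting is acceptable. The order is derived in C++ from the configured trigger list itself, so every job with the same set of triggers gets the same order. The function names are hypothetical. The -1 placeholder for triggers without prescale information in an event is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Canonical order: sorted, duplicates removed. Independent of the order in
// which triggers were written in the config.
std::vector<std::string> canonicalTriggerOrder(std::vector<std::string> configured) {
  std::sort(configured.begin(), configured.end());
  configured.erase(std::unique(configured.begin(), configured.end()), configured.end());
  return configured;
}

// Per-event prescales aligned with the canonical order; -1 marks a trigger
// with no prescale information in this event.
std::vector<float> alignedPrescales(const std::vector<std::string>& order,
                                    const std::map<std::string, float>& eventPrescales) {
  std::vector<float> out;
  out.reserve(order.size());
  for (const auto& name : order) {
    auto it = eventPrescales.find(name);
    out.push_back(it != eventPrescales.end() ? it->second : -1.0f);
  }
  return out;
}
```

The catch is that adding or removing a trigger shifts the positions of the others, so the order still has to travel with the output.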

The other option might be to use a friend tree, where the friend tree has a single row listing the trigger strings. If you want the trigger names in your trees, just add the friend tree to link things up (a join).
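Here is a sketch of the "store the names once" idea. A plain struct stands in for the one-entry metadata tree, and a vector of per-event decisions stands in for the main tree. All names are hypothetical. ROOT friend trees are normally aligned entry by entry unless an index is built, so a one-entry friend may not act as a plain join. Reading the names once and resolving the index before the event loop, as below, avoids that question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trigger names written once per file, in the order used by the
// per-event decisions.
struct TriggerMetadata {
  std::vector<std::string> names;

  // Position of a trigger, or -1 if it was not stored.
  int indexOf(const std::string& trigger) const {
    for (std::size_t i = 0; i < names.size(); ++i) {
      if (names[i] == trigger) return static_cast<int>(i);
    }
    return -1;
  }
};

// Count events passing one trigger. The name is resolved to an index once,
// before looping over events.
int countPassing(const TriggerMetadata& meta,
                 const std::vector<std::vector<bool>>& eventDecisions,
                 const std::string& trigger) {
  const int idx = meta.indexOf(trigger);
  if (idx < 0) return 0;
  int n = 0;
  for (const auto& decisions : eventDecisions) {
    if (decisions.at(static_cast<std::size_t>(idx))) ++n;
  }
  return n;
}
```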

@beojan
Contributor

beojan commented Apr 27, 2018

You could use a std::unordered_map instead of a vector. Then you would only need a single, general map of trigger names to numeric IDs.

You could even map from an enum class, though this would require providing a (trivial) specialization of std::hash to be C++11 compatible.
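Here is a minimal sketch of both variants, with hypothetical trigger names. The registry assigns IDs on first use, so it needs no hardcoded list. The enum class variant needs the std::hash specialization under C++11; C++14 added std::hash for enumeration types. The enum variant also requires every trigger to be listed in the enum.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Single global registry: trigger name -> small numeric ID.
class TriggerRegistry {
 public:
  // Existing ID for a known name, otherwise the next free ID.
  int idFor(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    const int id = static_cast<int>(ids_.size());
    ids_.emplace(name, id);
    return id;
  }
  std::size_t size() const { return ids_.size(); }

 private:
  std::unordered_map<std::string, int> ids_;
};

// enum class variant (hypothetical trigger list).
enum class Trigger { HLT_j420, HLT_xe110 };

// Trivial hash so Trigger can key a std::unordered_map in C++11.
namespace std {
template <>
struct hash<Trigger> {
  std::size_t operator()(Trigger t) const noexcept {
    return std::hash<int>()(static_cast<int>(t));
  }
};
}  // namespace std
```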

@beojan
Contributor

beojan commented Apr 27, 2018

Edit isn't working.

enum class is probably a bad idea, given the sheer number of triggers there are.

@kkrizka
Contributor Author

kkrizka commented Apr 27, 2018

Hi all,

I was not proposing to have a single bit string for triggers. I was thinking of a different branch per trigger decision, similar to what was used in the Run 1 ntuples.

--
Karol Krizka

4 participants