The python scripts in this repository should help you get started analysing the HGCAL L1 TP ntuples.
This step is lxplus specific, giving access to a more recent python and root version.
Edit/skip it accordingly for your specific system.
source setup_lxplus.sh
This step needs to be done only once for your account and can be done with whatever python version is in use on the system.
For some reason the current CMSSW scripts seem to deliver an inconsistent setup of virtualenv and virtualenvwrapper; for this reason we force a new installation in ~/.local using:
pip install --ignore-installed --user virtualenv==15.1.0 virtualenvwrapper
For a more complete overview of the procedure you can refer to the virtualenvwrapper installation instructions.
To start using virtualenvwrapper:
source setVirtualEnvWrapper.sh
The first time you will have to create the actual instance of the virtualenv:
mkvirtualenv --system-site-packages -p `which python3.8` -r requirements_py3.8.txt <venvname>
Use requirements_py3.8.txt and requirements_py3.10.txt for python 3.8 and 3.10 respectively.
You can also use the requirements file directly, for example:
pip install -r requirements_py3.8.txt
After the first installation, for each new session you need to repeat the lxplus-specific setup (edit/skip it accordingly for your specific system):
source setup_lxplus.sh
and set up virtualenvwrapper again:
source setVirtualEnvWrapper.sh
After this initial one-time setup is done you can just activate the virtualenv by calling:
workon <venvname>
(lsvirtualenv is your friend in case you forgot the name).
The main script is analyzeHgcalL1Tntuple.py:
python analyzeHgcalL1Tntuple.py --help
An example of how to run it:
python analyzeHgcalL1Tntuple.py -f cfg/hgctps.yaml -i cfg/datasets/ntp_v81.yaml -c tps -s doubleele_flat1to100_PU200 -n 1000 -d 0
The configuration is handled by 2 yaml files. One specifies:
- output directories
- versioning of the plots
- collections of samples, i.e. groups of samples to be processed homogeneously: for each collection the list of plotters (see below) to be run is provided.
The other provides:
- details of the input samples (location of the ntuple files)
Examples of configuration files can be found in the cfg/ directory, e.g. cfg/hgctps.yaml and cfg/datasets/ntp_v81.yaml used in the example above.
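For a first look at how these files are organised, a minimal sketch like the following (assuming PyYAML is available in the virtualenv; the file names are the ones from the example command above) just prints their top-level entries:

```python
# Minimal sketch: print the top-level entries of the two configuration files
# used in the example command above; the actual schema is whatever the
# configuration code of this repository expects.
import yaml

for cfg_file in ('cfg/hgctps.yaml', 'cfg/datasets/ntp_v81.yaml'):
    with open(cfg_file) as cfg_handle:
        cfg = yaml.safe_load(cfg_handle)
    # assuming the top level is a mapping, list(cfg) gives its keys
    print(cfg_file, '->', list(cfg))
```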
The list of branches to be read and converted to pandas DataFrame format is specified in the corresponding module, instantiating an object of class DFCollection. What is actually read event by event depends anyhow on which plotters are actually instantiated (collections are read on-demand).
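The on-demand logic can be pictured roughly as in the following sketch; this is a hypothetical simplification, the class and method names are illustrative and not the actual interface of DFCollection:

```python
import pandas as pd

class LazyCollection:
    """Hypothetical sketch: branches are converted to a pandas DataFrame only
    for collections that at least one instantiated plotter has asked for."""
    def __init__(self, name, fill_function):
        self.name = name
        self.fill_function = fill_function  # callable: event -> pd.DataFrame
        self.activated = False              # set to True by the plotters using this collection
        self.df = pd.DataFrame()

    def activate(self):
        self.activated = True

    def fill(self, event):
        # skip reading the branches entirely if no plotter needs this collection
        if self.activated:
            self.df = self.fill_function(event)
```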
Selections are defined as strings in a dedicated module. Different collections of selections are defined for different objects and/or different purposes. The selections have a name which is used for the histogram naming (see below). Selections are used by the plotters.
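Conceptually, a selection behaves like a named pandas query string. The sketch below is purely illustrative (the class name, attributes and variable names are assumptions, not the repository's actual interface):

```python
import pandas as pd

class Selection:
    """Hypothetical sketch of a named selection."""
    def __init__(self, name, label, sel_string):
        self.name = name              # enters the histogram name (see naming convention below)
        self.label = label            # human-readable label for plot legends
        self.sel_string = sel_string  # pandas query string; empty means "no cut"

    def apply(self, df):
        return df.query(self.sel_string) if self.sel_string else df

# illustrative usage on a toy DataFrame (the column name is an assumption)
sel_pt10 = Selection('Pt10', 'p_T > 10 GeV', 'pt > 10')
print(sel_pt10.apply(pd.DataFrame({'pt': [3., 15., 42.]})))
```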
The actual functionality of accessing the objects, filtering them according to the selections and filling histograms is provided by the plotter classes, defined in a dedicated module.
Basic plotters are already available; most likely you just need to instantiate one of them (or a collection of them) using the DFCollection instance you are interested in.
Which collection is run for which sample is steered by the configuration file.
The plotters access one or more collections, select them in several different ways, book and fill the histograms (see below).
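The following is a hypothetical, minimal sketch of what a plotter does: it books one histogram per named selection and fills it from the collection's DataFrame. The real plotter classes are more general (and fill ROOT histograms, since the outputs are merged with hadd); numpy is used here only to keep the example self-contained:

```python
import numpy as np
import pandas as pd

class SimplePlotter:
    """Hypothetical sketch of a plotter: one histogram per named selection."""
    def __init__(self, selections, variable='pt', bins=np.linspace(0., 100., 51)):
        self.selections = selections  # dict: selection name -> pandas query string
        self.variable = variable
        self.bins = bins
        # booking: one empty histogram per selection
        self.histos = {name: np.zeros(len(bins) - 1) for name in selections}

    def fill_histos(self, df):
        # df: the DataFrame of the collection for the current event
        for name, query in self.selections.items():
            selected = df.query(query) if query else df
            counts, _ = np.histogram(selected[self.variable], bins=self.bins)
            self.histos[name] += counts

# illustrative usage on a toy DataFrame
plotter = SimplePlotter({'all': '', 'Pt10': 'pt > 10'})
plotter.fill_histos(pd.DataFrame({'pt': [3., 15., 42.]}))
```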
Histograms are handled in a dedicated module.
There are different classes of histograms depending on the input object and on the purpose.
To add a new histogram to an existing class it is enough to add it in the corresponding constructor and in the fill method. The writing of the histos to files is handled transparently.
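As a purely illustrative sketch of that pattern (hypothetical class and histogram names, numpy instead of ROOT to keep it self-contained), adding a histogram means touching exactly two places:

```python
import numpy as np

class HistoSetExample:
    """Hypothetical sketch of a histogram class: adding a histogram means one
    line in the constructor (booking) and one in fill()."""
    def __init__(self, name):
        self.name = name
        self.h_pt = np.zeros(50)   # existing histogram
        self.h_eta = np.zeros(60)  # new histogram: book it here...

    def fill(self, df):
        self.h_pt += np.histogram(df['pt'], bins=50, range=(0., 100.))[0]
        self.h_eta += np.histogram(df['eta'], bins=60, range=(-3., 3.))[0]  # ...and fill it here
```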
The histogram naming follows the convention:
<ObjectName>_<SelectionName>_<GenSelectionName>_<HistoName>
This is assumed in all the plotters and in the code to actually draw the histograms.
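In other words (illustrative values, hypothetical helper function):

```python
# Illustrative only: how the four pieces compose into a histogram name
# (the example values are made up).
def build_histo_name(obj_name, sel_name, gen_sel_name, histo_name):
    return f'{obj_name}_{sel_name}_{gen_sel_name}_{histo_name}'

print(build_histo_name('TkEle', 'Pt10', 'GEN', 'h_pt'))  # -> TkEle_Pt10_GEN_h_pt
```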
Note that the script analyzeHgcalL1Tntuple.py can be used to submit the jobs to the HTCondor batch system invoking the -b option. A DAG configuration is created and you can actually submit it following the script output.
For each sample injected in the batch system a DAG is created. The DAG will submit an hadd command once all the jobs have succeeded.
However, if you don't want to wait (or you don't care), you can also submit a condor job that will run hadd periodically, thus dramatically reducing the latency.
For example:
condor_submit batch_single_empart_guns_tracks_v77/ele_flat2to100_PU0/batch_harvest.sub