Improving plotting scripts, fetcher, and fixing test_wf (#20)
* minor tweaks for plotting

* allow additional scaling for multiple samples at the same time

* fixed README file

* Fixed linting (but not everywhere...)

* Fix example workflow

* Simplify additional scale method, removing sumw check
andreypz authored Jul 27, 2023
1 parent b95a723 commit 2234e91
Showing 10 changed files with 118 additions and 222 deletions.
155 changes: 2 additions & 153 deletions README.md
@@ -70,47 +70,7 @@ voms-proxy-init --voms cms --vomses ~/.grid-security/vomses
Use the `./filefetcher/fetch.py` script:

```
python filefetcher/fetch.py --input filefetcher/input_DAS_list.txt --output output_name.json
```
where `input_DAS_list.txt` is a simple text file with a list of dataset names extracted from DAS (you need to create it yourself for the samples you want to run over); the output JSON file is created in the `./metadata` directory.
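For illustration, `input_DAS_list.txt` simply contains one DAS dataset name per line; the two entries below are hypothetical examples, not a recommended sample list:

```
/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v1/NANOAODSIM
/SingleMuon/Run2017C-UL2017_MiniAODv2_NanoAODv9-v1/NANOAOD
```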

@@ -405,15 +365,9 @@ memray run --live runner.py --cfg config/example.py

All the `lumiMask`, correction files (SFs, pileup weights), and JEC/JER files are under `BTVNanoCommissioning/src/data/`, following the substructure `${type}/${campaign}/${files}` (except `lumiMasks` and `Prescales`).
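As an illustration, the layout looks roughly like this (the `PU` file name is a hypothetical placeholder):

```
BTVNanoCommissioning/src/data/
├── lumiMasks/
│   └── Cert_294927-306462_13TeV_UL2017_Collisions17_MuonJSON.txt
├── PU/
│   └── 2017_UL/
│       └── puweight.histo.root
└── JME/
    └── 2017_UL/
        └── jec_compiled.pkl.gz
```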

Produce data/MC comparison and shape comparison plots from `.coffea` files; the configuration is loaded from `yaml` files (see this brief [intro](https://docs.fileformat.com/programming/yaml/) to yaml).
Details of the yaml file format are summarized in the table below. Information used by the data/MC script and by the comparison script is marked accordingly; the **required** entries are shown in bold.


@@ -431,12 +385,8 @@ python plotting/comparison.py --cfg testfile/btv_compare.yml (--debug)
### Use the centrally maintained jsonpog-integration
The official correction files collected in [jsonpog-integration](https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration) are updated by the POGs, except for `lumiMask` and `JME`, which are still updated by the maintainer. There is no longer any need to specify input files in the `correction_config`.

<details><summary>See the example with `2017_UL`.</summary>
<p>

```python
"2017_UL": {
    # Same as in the custom config
    "lumiMask": "Cert_294927-306462_13TeV_UL2017_Collisions17_MuonJSON.txt",
    "JME": "jec_compiled.pkl.gz",
    # no config needs to be specified for the PU weights
    "PU": None,
    # Btag SFs - specify $TAGGER : $TYPE -> find [$TAGGER_$TYPE] in the json file
    "BTV": {"deepCSV": "shape", "deepJet": "shape"},
    "LSF": {
        # Electron SFs - following the scheme: "${SF_name} ${year}": "${WP}"
        # https://github.com/cms-egamma/cms-egamma-docs/blob/master/docs/EgammaSFJSON.md
        "ele_ID 2017": "wp90iso",
        "ele_Reco 2017": "RecoAbove20",
        # Muon SFs - following the scheme: "${SF_name} ${year}": "${WP}"
        # WPs : ['NUM_GlobalMuons_DEN_genTracks', 'NUM_HighPtID_DEN_TrackerMuons', 'NUM_HighPtID_DEN_genTracks', 'NUM_IsoMu27_DEN_CutBasedIdTight_and_PFIsoTight', 'NUM_LooseID_DEN_TrackerMuons', 'NUM_LooseID_DEN_genTracks', 'NUM_LooseRelIso_DEN_LooseID', 'NUM_LooseRelIso_DEN_MediumID', 'NUM_LooseRelIso_DEN_MediumPromptID', 'NUM_LooseRelIso_DEN_TightIDandIPCut', 'NUM_LooseRelTkIso_DEN_HighPtIDandIPCut', 'NUM_LooseRelTkIso_DEN_TrkHighPtIDandIPCut', 'NUM_MediumID_DEN_TrackerMuons', 'NUM_MediumID_DEN_genTracks', 'NUM_MediumPromptID_DEN_TrackerMuons', 'NUM_MediumPromptID_DEN_genTracks', 'NUM_Mu50_or_OldMu100_or_TkMu100_DEN_CutBasedIdGlobalHighPt_and_TkIsoLoose', 'NUM_SoftID_DEN_TrackerMuons', 'NUM_SoftID_DEN_genTracks', 'NUM_TightID_DEN_TrackerMuons', 'NUM_TightID_DEN_genTracks', 'NUM_TightRelIso_DEN_MediumID', 'NUM_TightRelIso_DEN_MediumPromptID', 'NUM_TightRelIso_DEN_TightIDandIPCut', 'NUM_TightRelTkIso_DEN_HighPtIDandIPCut', 'NUM_TightRelTkIso_DEN_TrkHighPtIDandIPCut', 'NUM_TrackerMuons_DEN_genTracks', 'NUM_TrkHighPtID_DEN_TrackerMuons', 'NUM_TrkHighPtID_DEN_genTracks']
        "mu_Reco 2017_UL": "NUM_TrackerMuons_DEN_genTracks",
        "mu_HLT 2017_UL": "NUM_IsoMu27_DEN_CutBasedIdTight_and_PFIsoTight",
        "mu_ID 2017_UL": "NUM_TightID_DEN_TrackerMuons",
        "mu_Iso 2017_UL": "NUM_TightRelIso_DEN_TightIDandIPCut",
    },
},
```
</p>
</details>

## Create compiled JERC file (`pkl.gz`)
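The repository provides a script for this; purely as a hedged sketch of what compiling such a file involves with coffea's `jetmet_tools` (the correction file names, the dictionary keys, and the output layout here are assumptions, not the repository's actual script):

```python
import gzip

import cloudpickle
from coffea.jetmet_tools import CorrectedJetsFactory, JECStack
from coffea.lookup_tools import extractor

# Load the JEC/JER text files (file names below are hypothetical examples)
ext = extractor()
ext.add_weight_sets(
    [
        "* * Summer19UL17_V5_MC_L1FastJet_AK4PFchs.jec.txt",
        "* * Summer19UL17_V5_MC_L2Relative_AK4PFchs.jec.txt",
        "* * Summer19UL17_JRV2_MC_PtResolution_AK4PFchs.jr.txt",
        "* * Summer19UL17_JRV2_MC_SF_AK4PFchs.jersf.txt",
    ]
)
ext.finalize()
evaluator = ext.make_evaluator()

# Build the JEC stack and map NanoAOD jet fields to what the factory expects
jec_stack = JECStack({name: evaluator[name] for name in evaluator.keys()})
name_map = jec_stack.blank_name_map
name_map["JetPt"] = "pt"
name_map["JetMass"] = "mass"
name_map["JetEta"] = "eta"
name_map["JetA"] = "area"
name_map["ptGenJet"] = "pt_gen"
name_map["ptRaw"] = "pt_raw"
name_map["massRaw"] = "mass_raw"
name_map["Rho"] = "event_rho"

# Compile and persist the factory so workflows can load it without the text files
jet_factory = CorrectedJetsFactory(name_map, jec_stack)
with gzip.open("jec_compiled.pkl.gz", "wb") as fout:
    cloudpickle.dump({"jet_factory": jet_factory}, fout)
```

The resulting `jec_compiled.pkl.gz` is what the `JME` entry in the campaign configuration above points to.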

| Parameter name | Allowed values | Description |
| :---: | :---: | :---: |
@@ -540,7 +452,7 @@ In `comparison.py` config file (`testfile/btv_compare.yaml`), color and label n
<details><summary>Code snippet</summary>
<p>


```yaml
## plotdataMC.py
mergemap:
@@ -577,46 +489,12 @@ compare:
</p>
</details>
#### Variables
Common definitions for both usages; default settings are used when a key's value is left empty.
:bangbang: The `blind` option is only used in the data/MC comparison plots, to blind a particular observable such as a BDT score.

|Option| Default |
|:-----: |:---: |
| `xlabel` | take name of `key` |
@@ -663,35 +541,6 @@ Common definitions for both usage, use default settings if leave empty value for
all:
rebin: 2
```

</p>
</details>
12 changes: 7 additions & 5 deletions filefetcher/fetch.py
@@ -53,10 +53,12 @@ def getFilesFromDas(args):
 
         dsname = dataset.strip().split("/")[1]  # Dataset first name
 
-        Tier = dataset.strip().split("/")[
-            3
-        ]  # NANOAODSIM for regular samples, USER for private
-        if "SIM" not in Tier:
+        Tier = dataset.strip().split("/")[3]
+        # Tier = NANOAOD[SIM] for regular samples, USER for private samples
+        if Tier == "NANOAOD":
+            # This is the case for data: form the dataset name from the first
+            # two parts, in order to distinguish years/eras
+            dsname = dataset.strip().split("/")[1] + "_" + dataset.split("/")[2]
         instance = "prod/global"
         if Tier == "USER":
@@ -107,7 +109,7 @@ def getFilesFromPath(args, lim=None):
 def getRootFilesFromPath(d, lim=None):
     import subprocess
 
-    if "xrootd" in d:
+    if "root://" in d:
         sp = d.split("/")
         siteIP = "/".join(sp[0:4])
         pathToFiles = "/".join(sp[3:]) + "/"
10 changes: 6 additions & 4 deletions plotting/comparison.py
@@ -61,13 +61,14 @@
 
     ## If additional rescaling of the yields is required
     if "rescale_yields" in config.keys():
+        # print(config["rescale_yields"])
         for sample_to_scale in config["rescale_yields"].keys():
             print(
                 f"Rescale {sample_to_scale} by {config['rescale_yields'][sample_to_scale]}"
             )
-            collated = additional_scale(
-                collated, config["rescale_yields"][sample_to_scale], sample_to_scale
-            )
+
+        collated = additional_scale(collated, config["rescale_yields"])
 
     ### style settings
     if "Run" in list(config["reference"].keys())[0]:
         hist_type = "errorbar"
@@ -146,7 +147,8 @@
     ax.set_xlabel(None)
     ax.set_ylabel("Events")
     rax.set_ylabel("Other/Ref")
-    ax.ticklabel_format(style="sci", axis='y', scilimits=(-3, 3))
+    ax.ticklabel_format(style="sci", axis="y", scilimits=(-3, 3))
+
     ax.get_yaxis().get_offset_text().set_position((-0.065, 1.05))
     ax.legend()
     rax.set_ylim(0.0, 2.0)
5 changes: 2 additions & 3 deletions plotting/plotdataMC.py
@@ -62,9 +62,8 @@
             print(
                 f"Rescale {sample_to_scale} by {config['rescale_yields'][sample_to_scale]}"
             )
-            collated = additional_scale(
-                collated, config["rescale_yields"][sample_to_scale], sample_to_scale
-            )
+
+        collated = additional_scale(collated, config["rescale_yields"])

## collect variable lists
if "all" in list(config["variable"].keys())[0]:
63 changes: 43 additions & 20 deletions src/BTVNanoCommissioning/helpers/func.py
@@ -300,42 +300,65 @@ def update(events, collections):
 def num(ar):
     return ak.num(ak.fill_none(ar[~ak.is_none(ar)], 0), axis=0)
 
+
 def _is_rootcompat(a):
     """Is it a flat or 1-d jagged array?"""
     t = ak.type(a)
     if isinstance(t, ak._ext.ArrayType):
         if isinstance(t.type, ak._ext.PrimitiveType):
             return True
-        if isinstance(t.type, ak._ext.ListType) and isinstance(t.type.type, ak._ext.PrimitiveType):
+        if isinstance(t.type, ak._ext.ListType) and isinstance(
+            t.type.type, ak._ext.PrimitiveType
+        ):
             return True
     return False
 
-def uproot_writeable(events,include=["events","run","luminosityBlock"]):
+
+def uproot_writeable(events, include=["events", "run", "luminosityBlock"]):
     ev = {}
     include = np.array(include)
     no_filter = False
-    if len(include)==1 and include[0] == "*" : no_filter = False
+
+    if len(include) == 1 and include[0] == "*":
+        no_filter = False
     for bname in events.fields:
         if not events[bname].fields:
-            if not no_filter and bname not in include:continue
+            if not no_filter and bname not in include:
+                continue
             ev[bname] = ak.packed(ak.without_parameters(events[bname]))
         else:
-            b_nest={}
+            b_nest = {}
             no_filter_nest = False
-            if all(np.char.startswith(include,bname)==False):continue
-            include_nest = [i[i.find(bname)+len(bname)+1:] for i in include if i.startswith(bname)]
-
-            if len(include_nest)==1 and include_nest[0]=="*":no_filter_nest=True
-            if not no_filter_nest:
-                mask_wildcard=np.char.find(include_nest,"*")!=-1
-                include_nest=np.char.replace(include_nest,"*","")
+
+            if all(np.char.startswith(include, bname) == False):
+                continue
+            include_nest = [
+                i[i.find(bname) + len(bname) + 1 :]
+                for i in include
+                if i.startswith(bname)
+            ]
+
+            if len(include_nest) == 1 and include_nest[0] == "*":
+                no_filter_nest = True
+            if not no_filter_nest:
+                mask_wildcard = np.char.find(include_nest, "*") != -1
+                include_nest = np.char.replace(include_nest, "*", "")
+
             for n in events[bname].fields:
-                if not _is_rootcompat(events[bname][n]):continue
-                ## make selections to the filter case, keep cross-ref ("Idx")
-                if not no_filter_nest and all(np.char.find(n,include_nest)==-1) and "Idx" not in n:continue
-                if mask_wildcard[np.where(np.char.find(n,include_nest)!=-1)]== False and "Idx" not in n:continue
-                b_nest[n]=ak.packed(ak.without_parameters(events[bname][n]))
+                if not _is_rootcompat(events[bname][n]):
+                    continue
+                ## make selections to the filter case, keep cross-ref ("Idx")
+                if (
+                    not no_filter_nest
+                    and all(np.char.find(n, include_nest) == -1)
+                    and "Idx" not in n
+                ):
+                    continue
+                if (
+                    mask_wildcard[np.where(np.char.find(n, include_nest) != -1)]
+                    == False
+                    and "Idx" not in n
+                ):
+                    continue
+                b_nest[n] = ak.packed(ak.without_parameters(events[bname][n]))
             ev[bname] = ak.zip(b_nest)
     return ev
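For orientation, `uproot_writeable` reduces an awkward `events` record to a plain dict of branches that uproot can write out; nested collections are selected with `Collection_field` entries, where `*` acts as a wildcard. A hedged usage sketch (the output file name and branch selection are made up):

```python
import uproot

# Keep run/lumi/event bookkeeping plus all Jet branches and the muon pt
skimmed = uproot_writeable(
    events, include=["events", "run", "luminosityBlock", "Jet_*", "Muon_pt"]
)
with uproot.recreate("skimmed.root") as fout:
    fout["Events"] = skimmed  # assigning a dict of arrays creates the TTree
```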