
Processing datasets

Bruno Alves edited this page Nov 26, 2024 · 4 revisions

Prepare Big Ntuples

Clone this repository on lxplus (using release CMSSW_10_6_29). The relevant files are in branch 106X_HH_UL, under the LLRHiggsTauTau/NtupleProducer/test/ folder.

Submission with crab

  • The datasets are under NtupleProducer/test/datasets_UL18.txt (similar for other data periods). This file is picked up by NtupleProducer/test/submitAllDatasetOnCrab_LLR.py.
  • You might want to edit the script in the following places:
    • set the isMC flag to True or False, depending on the samples being processed
    • comment out background MC samples where needed
    • in analyzer_LLR.py, set isMC=True if needed and update the YEAR variable
ssh lxplus # logs in to EL9 by default
cmssw-el7 # the CMSSW version of this repo is only compatible with SL7
PS1="${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[00;35m\]\w <Sing> \[\033[00m\]\$ " # improves CLI clarity
cd CMSSW_10_6_29/src/
cmsenv
scr
cd LLRHiggsTauTau/NtupleProducer/test/;
source /cvmfs/cms.cern.ch/crab3/crab.sh;
python2 submitAllDatasetOnCrab_LLR.py;
  • Monitor the progress of the jobs with Grafana (sign in with your CERN credentials)
  • For LLR, the submission outputs are stored under root://eos.grif.fr//eos/grif/cms/llr/store/user/${USER}/HHNtuples_res/ (access with gfal-ls tool).
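A minimal sketch of listing that output area, assuming gfal-ls is installed in the current environment and that a valid grid proxy is available (access to grid storage typically requires one). The OUT_URL variable name is illustrative and not part of the repository:

```shell
# Build the output URL quoted above; OUT_URL is an illustrative name.
OUT_URL="root://eos.grif.fr//eos/grif/cms/llr/store/user/${USER}/HHNtuples_res/"
echo "Listing ${OUT_URL}"

# Only attempt the listing where gfal-ls is installed.
if command -v gfal-ls >/dev/null 2>&1; then
    gfal-ls "${OUT_URL}" || echo "gfal-ls failed: check your grid proxy" >&2
fi
```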

Note #1: Make sure the isMC flag is the same in NtupleProducer/test/submitAllDatasetOnCrab_LLR.py and NtupleProducer/test/analyzer_LLR.py.

Note #2: Common CRAB commands: crab submit / crab submit -d <folder> / crab status
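The consistency check from Note #1 can be sketched in a few lines of shell. This assumes the flag is set in each file on an uncommented line of the form `isMC = True` (matched case-insensitively, in case the capitalization differs between the two files). The helper names get_isMC and check_isMC are hypothetical and not part of the repository:

```shell
# Print the value of the first uncommented isMC assignment in a file.
# Assumption: the flag is set on a line such as `isMC = True` or `isMC=False`.
get_isMC() {
    grep -iE '^[[:space:]]*isMC[[:space:]]*=' "$1" | head -n 1 \
        | sed -E 's/#.*$//; s/^[^=]*=//' | tr -d '[:space:]'
}

# Compare the flag in two files; return non-zero if they differ or are missing.
check_isMC() {
    a=$(get_isMC "$1")
    b=$(get_isMC "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "OK: both files set isMC=${a}"
    else
        echo "MISMATCH: $1 has '${a}', $2 has '${b}'" >&2
        return 1
    fi
}

# Usage, from LLRHiggsTauTau/NtupleProducer/test/:
# check_isMC submitAllDatasetOnCrab_LLR.py analyzer_LLR.py
```

The check only looks at the first matching line in each file, so it is a quick sanity test before submission and not a substitute for reading the configuration.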

Resubmission of failed jobs

Assuming the folder where the crab jobs were stored is crab3_Data_UL16_April2024, one can resubmit all failed jobs with:

folder=crab3_Data_UL16_April2024; for task in "${folder}"/crab_*/; do crab resubmit "${task}"; done
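Before resubmitting, it can help to inspect each task. The sketch below lists the crab_* task directories under a submission folder and, where the crab client is set up, runs crab status -d on each of them. The list_crab_tasks helper is hypothetical, and the folder name is the one assumed in the example above:

```shell
# Print every crab_* task directory found directly under the given folder.
list_crab_tasks() {
    for task in "$1"/crab_*/; do
        [ -d "${task}" ] && printf '%s\n' "${task%/}"
    done
    return 0
}

# Show the status of each task (only where the crab client is available).
if command -v crab >/dev/null 2>&1; then
    list_crab_tasks crab3_Data_UL16_April2024 | while read -r task; do
        crab status -d "${task}"
    done
fi
```

Matching directories with a shell glob avoids depending on the `ll` alias and on the column layout of `ls -l`.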

EnrichedMiniAOD to LLRntuples (old instructions)

  1. testAllDatasetOnCrab.py: set the path to the folder created by the CRAB submission (crab3_<tag>)
  2. python testAllDatasetOnCrab.py: prints the list of published dataset names. Names reported as missing at the end of the list typically correspond to failed CRAB submissions
  3. datasets_Enriched.txt: paste the list from the previous step and give it a block name, written as === <whatever> ===
  4. tools/makeAllFileLists.py: define PROCESS and tag as before
  5. cmsenv ; source /cvmfs/cms.cern.ch/crab3/crab.sh ; cd tools; python makeAllFileLists.py: creates the file list for each published sample under inputFilesEMiniAOD<tag>. NOTE: this can take some time, depending on how fast the CRAB server responds, and might need a few retries. If it really does not work, do this by hand from the DAS interface.
  6. TO BE FINISHED