Automatic stageout of tape data
The idea is to allow users to access data on tape. Once the user submits a task, we put it in the TAPERECALL state and create a rule in Rucio to get the input data on disk. When the rule is OK, the task is set to state NEW again and goes through the data discovery step once more, this time finding the data on disk and proceeding to submission.
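In Rucio terms this amounts to creating a replication rule for the input data and polling it until its state is OK. Below is a minimal sketch using the standard Rucio Python client, with the rule attributes described in the list that follows; the RSE expression, lifetime, helper names and the exact way the owning account and approval flag are set are illustrative assumptions, not the actual TaskWorker code.

```python
# Minimal sketch of the recall flow with the standard Rucio Python client.
# Helper names (recall_dataset, rule_is_ok) are hypothetical; the real logic
# lives in the CRAB TaskWorker, which also handles retries and task state.
from rucio.client import Client

def recall_dataset(dataset, username, lifetime_days=30):
    """Create a replication rule that stages a dataset from tape to disk."""
    rucio = Client(account='crab_tape_recall')          # rules are created by this account
    rule_id = rucio.add_replication_rule(
        dids=[{'scope': 'cms', 'name': dataset}],
        copies=1,
        rse_expression='rse_type=DISK',                  # illustrative disk RSE expression
        lifetime=lifetime_days * 24 * 3600,              # seconds
        account=username,                                # charged to the user's Rucio account
        activity='Analysis TapeRecall',
        ask_approval=True,                               # relies on Rucio automatic approval
        comment='CRAB automatic tape recall',            # (exact flag usage is an assumption)
    )[0]
    return rule_id

def rule_is_ok(rule_id):
    """True once Rucio reports the rule as fully satisfied (data on disk)."""
    rucio = Client(account='crab_tape_recall')
    return rucio.get_replication_rule(rule_id)['state'] == 'OK'
```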
- recall rules are created by the `crab_tape_recall` Rucio account
- recall rules have activity `Analysis TapeRecall` and are charged to the Rucio account corresponding to the username; in this way Rucio can keep track of how much data each user is recalling and we can set limits
- recall rules are submitted with "AutomaticApproval"; Rucio only enforces a global limit on the `crab_tape_recall` user, enforcement of limits on each task is done by CRAB
- CRAB enforces policies in the `executeTapeRecallPolicy` method of `TaskWorker/Actions/DBSDataDiscovery` (see the sketch after this list):
  - Limit to how much data a user can have in recall at any time: `maxRecallPerUserTB`
  - Limit to how large a dataset can be recalled, depending on data tier:
    - if the datatier is in the list `tiersToRecall`: no limit
    - all other tiers: `maxAnyTierRecallSizeTB`
  - When the dataset is too large, users have the option to give a list of blocks to recall
    - Limit on recall size when providing a list of blocks: `maxTierToBlockRecallSizeTB`
  - When a user request is above the limits, they are told to "contact Data Transfer team via https://its.cern.ch/jira/browse/CMSTRANSF"
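The checks above amount to roughly the following logic, shown here as a hand-written Python sketch; the parameter names match the TaskWorker configuration, but the function name, its inputs and the error messages are illustrative and not copied from `executeTapeRecallPolicy`.

```python
# Illustrative sketch of the recall policy described above; only the parameter
# names (tiersToRecall, max*TB) come from the wiki, the rest is hypothetical.
TB = 1e12  # policy limits are expressed in TB; sizes below are in bytes

def check_tape_recall_policy(datatier, dataset_size, block_list_size, user_recall_in_flight, cfg):
    """Raise with a user-facing message if the requested recall violates the CRAB policy."""
    contact = 'contact Data Transfer team via https://its.cern.ch/jira/browse/CMSTRANSF'
    requested = block_list_size if block_list_size is not None else dataset_size

    # per-user quota: total data a user may have in recall at any time
    if user_recall_in_flight + requested > cfg.maxRecallPerUserTB * TB:
        raise RuntimeError(f'too much data already in recall for this user, {contact}')

    # datasets in the allowed tiers can be recalled regardless of size
    if datatier in cfg.tiersToRecall:
        return

    if block_list_size is None:
        # full-dataset recall of any other tier is capped
        if dataset_size > cfg.maxAnyTierRecallSizeTB * TB:
            raise RuntimeError(f'dataset too large, provide a list of blocks or {contact}')
    else:
        # recall of a user-supplied block list is capped separately
        if block_list_size > cfg.maxTierToBlockRecallSizeTB * TB:
            raise RuntimeError(f'requested blocks too large, {contact}')
```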
The above parameters are set in the TaskWorkerConfig.py file and can be modified via e.g. the puppet template.
| Parameter | Current Value (July 2024) |
|---|---|
| `tiersToRecall` | `['AOD', 'AODSIM', 'MINIAOD', 'MINIAODSIM', 'NANOAOD', 'NANOAODSIM']` |
| `maxAnyTierRecallSizeTB` | 50 |
| `maxTierToBlockRecallSizeTB` | 50 |
| `maxRecallPerUserTB` | 100 |
Be aware that this table may be obsolete; to know the current parameter values, look at current/TaskWorkerConfig.py in the current production TaskWorker container.
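For orientation, the corresponding stanza in TaskWorkerConfig.py would look roughly like the snippet below. This is a hand-written illustration of the values in the table above, not a copy of the production file; the configuration class and the `TaskWorker` section name are assumptions.

```python
# Illustration only: the configuration object and section name are assumptions,
# only the parameter names and the July 2024 values come from the table above.
from WMCore.Configuration import Configuration

config = Configuration()
config.section_('TaskWorker')
config.TaskWorker.tiersToRecall = ['AOD', 'AODSIM', 'MINIAOD', 'MINIAODSIM',
                                   'NANOAOD', 'NANOAODSIM']
config.TaskWorker.maxAnyTierRecallSizeTB = 50
config.TaskWorker.maxTierToBlockRecallSizeTB = 50
config.TaskWorker.maxRecallPerUserTB = 100
```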