Dataset Invalidations not propagated to files #113

Open
amanrique1 opened this issue May 30, 2024 · 4 comments

Comments

@amanrique1

While analyzing Rucio and DBS inconsistencies, the DM team discovered many files that are still marked valid even though their datasets were declared invalid or have the access type deleted.
Is there any reason for this behavior?
Some examples:

| f_logical_file_name | d_dataset | f_is_file_valid | d_is_dataset_valid | d_dataset_access_type_id |
|---|---|---|---|---|
| /store/mc/2007/10/9/CSA07-rs1gg_750GeV_c01-2840/0006/CAF9ADDB-0C8B-DC11-9D12-0019B9E4F84F.root | /rs1gg_750GeV_c01/CMSSW_1_6_0-CSA07-2840/GEN-SIM-DIGI-RAW | 1 | 1 | 81 |
| /store/mc/GEM2019Upg14DR/DYToTauTau_M-20_TuneZ2star_14TeV-pythia6-tauola/GEN-SIM-RECO/Age500CaloPU50_U19_500FB_V1A-v1/60000/D8731F82-1F48-E511-884C-0025B3E05D50.root | /DYToTauTau_M-20_TuneZ2star_14TeV-pythia6-tauola/GEM2019Upg14DR-Age500CaloPU50_U19_500FB_V1A-v1/GEN-SIM-RECO | 1 | 0 | 2 |
| /store/mc/GEM2019Upg14DR/DYToTauTau_M-20_TuneZ2star_14TeV-pythia6-tauola/GEN-SIM-RECO/Age500CaloPU50_U19_500FB_V1A-v1/60000/BE83C7D5-1E48-E511-BAD3-002590A83190.root | /DYToTauTau_M-20_TuneZ2star_14TeV-pythia6-tauola/GEM2019Upg14DR-Age500CaloPU50_U19_500FB_V1A-v1/GEN-SIM-RECO | 1 | 0 | 2 |
| /store/mc/RunIIWinter15wmLHE/TT_13TeV-powheg/LHE/MCRUN2_71_V1_ext4-v1/30000/9A880A75-366D-E511-BFDB-38EAA7A6DBA0.root | /TT_13TeV-powheg/RunIIWinter15wmLHE-MCRUN2_71_V1_ext4-v1/LHE | 1 | 0 | 2 |
| /store/mc/RunIIWinter15wmLHE/TT_13TeV-powheg/LHE/MCRUN2_71_V1_ext4-v1/30000/C211E47B-366D-E511-8A45-001E673D21B9.root | /TT_13TeV-powheg/RunIIWinter15wmLHE-MCRUN2_71_V1_ext4-v1/LHE | 1 | 0 | 2 |

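For reference, the condition used to pick these rows can be sketched roughly as below. This is only an illustration in plain Python, not the query the DM team actually ran. The column names are taken from the table above; the function name, the sample rows, and the access type ID `99` are hypothetical placeholders. Which access type IDs actually mean "deleted" or "invalid" would have to be looked up in the DBS instance.

```python
def find_valid_files_in_inactive_datasets(rows, inactive_access_type_ids):
    """Return rows whose file is valid while its dataset is invalid
    or carries one of the given (assumed) inactive access type IDs."""
    return [
        row
        for row in rows
        if row["f_is_file_valid"] == 1
        and (
            row["d_is_dataset_valid"] == 0
            or row["d_dataset_access_type_id"] in inactive_access_type_ids
        )
    ]


# Hypothetical sample rows, shaped like the table above.
rows = [
    {"f_logical_file_name": "/store/example/a.root", "d_dataset": "/A/B/RAW",
     "f_is_file_valid": 1, "d_is_dataset_valid": 0, "d_dataset_access_type_id": 1},
    {"f_logical_file_name": "/store/example/b.root", "d_dataset": "/C/D/RECO",
     "f_is_file_valid": 1, "d_is_dataset_valid": 1, "d_dataset_access_type_id": 99},
    {"f_logical_file_name": "/store/example/c.root", "d_dataset": "/E/F/LHE",
     "f_is_file_valid": 1, "d_is_dataset_valid": 1, "d_dataset_access_type_id": 1},
    {"f_logical_file_name": "/store/example/d.root", "d_dataset": "/A/B/RAW",
     "f_is_file_valid": 0, "d_is_dataset_valid": 0, "d_dataset_access_type_id": 1},
]

# 99 stands in for whatever ID the instance uses for a deleted access type.
inconsistent = find_valid_files_in_inactive_datasets(rows, inactive_access_type_ids={99})
inconsistent_names = [row["f_logical_file_name"] for row in inconsistent]
```

Files that are already invalid are skipped, so only the mismatched cases are reported.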
@vkuznet
Contributor

vkuznet commented May 30, 2024

You may look at the DNs and timestamps of these datasets and raise your concern with the data-ops team. However, I doubt this issue is relevant to this repository.

@amanrique1
Author

In data-ops, we have noticed many cases where PnR invalidates a dataset [1] and its files stay valid. We can update those files on our side, but it would be more straightforward for DBS to do it directly when the dataset is invalidated, at least for future cases.

[1] https://cmsweb.cern.ch/das/request?input=dataset%3D%2FMuon%2FRun2022D-HcalCalIterativePhiSym-27Jun2023-v1%2FALCARECO&instance=prod/global

@todor-ivanov

todor-ivanov commented Jun 3, 2024

Hi @amanrique1, so far it has always been the responsibility of the people who invalidate the files in the data management system to take the relevant actions in the bookkeeping system as well. In the past this was mostly done by PnR. I agree we may benefit a lot from automating this process and keeping the two systems better in sync. But if this is not done in parallel by the tools the relevant team uses for the invalidation, and the requirement is instead pushed upstream to the server side, then the project is much bigger than just automating an action. It may even involve cross-database checks between Rucio and DBS ... This kind of project has been discussed in the past and is on our radar for sure.

@amanrique1
Author

Hi @todor-ivanov, I think I didn't explain myself well. The idea of this ticket is to get internal DBS consistency: if PnR invalidates a dataset, all its files should become invalid as well.
In Data Ops, we are working on a separate project for the Rucio-DBS sync.
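
To make the request concrete, here is a minimal sketch of the behavior being asked for: invalidating a dataset and its files in one transaction, so the two flags cannot drift apart. It uses an in-memory SQLite database with a made-up two-table schema. The table names, column names, and the `invalidate_dataset` helper are all assumptions for illustration and do not reflect the real DBS schema or server code.

```python
import sqlite3


def invalidate_dataset(conn, dataset_name):
    """Mark a dataset and all of its files invalid in a single transaction.

    Returns the number of file rows touched by the update.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE datasets SET is_dataset_valid = 0 WHERE dataset = ?",
            (dataset_name,),
        )
        cur = conn.execute(
            "UPDATE files SET is_file_valid = 0 "
            "WHERE dataset_id = (SELECT dataset_id FROM datasets WHERE dataset = ?)",
            (dataset_name,),
        )
        return cur.rowcount


# Hypothetical schema and data, only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        dataset_id INTEGER PRIMARY KEY,
        dataset TEXT UNIQUE,
        is_dataset_valid INTEGER
    );
    CREATE TABLE files (
        file_id INTEGER PRIMARY KEY,
        logical_file_name TEXT,
        is_file_valid INTEGER,
        dataset_id INTEGER REFERENCES datasets(dataset_id)
    );
    INSERT INTO datasets VALUES (1, '/A/B/RAW', 1), (2, '/X/Y/RECO', 1);
    INSERT INTO files VALUES
        (1, '/store/example/a.root', 1, 1),
        (2, '/store/example/b.root', 1, 1),
        (3, '/store/example/c.root', 1, 2);
    """
)

updated = invalidate_dataset(conn, "/A/B/RAW")
still_valid = conn.execute(
    "SELECT logical_file_name FROM files WHERE is_file_valid = 1"
).fetchall()
```

Doing both updates inside one transaction means a failure in the file update also undoes the dataset update, which is the internal consistency this ticket is about. Files of other datasets are left untouched.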
