New implementation of CMS_WCHARM_13TEV_WPWM-TOT-UNNORM #2244

achiefa · 2024-12-09T17:06:16Z

This PR implements CMS_WCHARM_13TEV_WPWM-TOT-UNNORM in the new format.

General comments

This dataset delivers the differential distribution as a function of the absolute rapidity of the lepton pair. Each data point is accompanied by a (symmetric) statistical uncertainty and an (asymmetric) systematic uncertainty. The latter is the sum in quadrature of the different sources of uncertainty. The breakdown of these systematic sources is not delivered in the HepData format, but it is given in Table 1 of the paper.
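As an illustration of the sum in quadrature mentioned above, here is a minimal sketch. The numbers are made up for the example and are not taken from Table 1 of the paper, and `quadrature_sum` is a hypothetical helper rather than a function from this PR. Because the systematic uncertainty is asymmetric, the upward and downward variations are combined separately.

```python
import math


def quadrature_sum(sources):
    """Combine independent uncertainty sources in quadrature."""
    return math.sqrt(sum(s * s for s in sources))


# Illustrative (invented) per-source systematics for a single bin.
sys_up = [0.03, 0.04]    # upward variations
sys_down = [0.06, 0.08]  # magnitudes of the downward variations

total_up = quadrature_sum(sys_up)
total_down = quadrature_sum(sys_down)
print(f"+{total_up:.3f} / -{total_down:.3f}")
```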

The legacy version has the variant sys_10, which should not be implemented because it was meant to account for the 3pt prescription.

$(x,Q^2)$-map and data-theory comparison

Legacy: [default],
New: [default w/ shifts], [default w/o shifts]

achiefa · 2024-12-10T11:29:03Z

Hi @RoyStegeman, I think I'm done here. Please see the following comments:

In this dataset, uncertainties must be symmetrised. This means that the central data must be shifted accordingly. However, I suspect that the old implementation didn't shift the data. I produced two comparisons (see description), with and without the shift. The one that does not implement the shift gives the same chi2 as the legacy version. You can also check by eye that the non-shifted data are closer to the legacy data.
I also believe that the old implementation rounded some data; indeed, the central data in HepData do not match the legacy implementation.
I can't reproduce the legacy covmat, but the cause might be one or both of the differences above. Below, you can find the two matrices.

Honestly, I can't judge whether these differences are relevant or not. The difference in chi2 is not negligible if one accounts for the shifts. On the other hand, the difference in the t0 matrices does not really worry me as I was able to reproduce the chi2 of the legacy implementation provided shifts were removed. @RoyStegeman, what do you think? Maybe it is worth asking @enocera.
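To make the effect of the shift on the chi2 concrete, here is a minimal sketch of one commonly used symmetrisation prescription (semi-difference as shift, average plus a correction as symmetric error). The thread does not state the exact formula used in the PR, so the prescription, the function names `symmetrise` and `chi2`, the diagonal covariance matrix, and all numbers are assumptions made for illustration only.

```python
import numpy as np


def symmetrise(delta_plus, delta_minus):
    """Symmetrise asymmetric errors.

    delta_plus and delta_minus are the (non-negative) magnitudes of the
    upward and downward errors. Returns (shift, sigma), where shift is
    added to the central value and sigma is the symmetric error.
    """
    delta_plus = np.asarray(delta_plus, dtype=float)
    delta_minus = np.asarray(delta_minus, dtype=float)
    semi_diff = (delta_plus - delta_minus) / 2.0
    average = (delta_plus + delta_minus) / 2.0
    sigma = np.sqrt(average**2 + 2.0 * semi_diff**2)
    return semi_diff, sigma


def chi2(data, theory, covmat):
    """chi2 = (data - theory)^T covmat^{-1} (data - theory)."""
    diff = np.asarray(data, dtype=float) - np.asarray(theory, dtype=float)
    return float(diff @ np.linalg.solve(covmat, diff))


# Invented two-bin example.
central = np.array([10.0, 20.0])
stat = np.array([1.0, 2.0])
shift, sys_sym = symmetrise([3.0, 1.0], [1.0, 1.0])

# Toy covariance: uncorrelated stat and sys added in quadrature.
covmat = np.diag(stat**2 + sys_sym**2)
theory = np.array([10.0, 20.0])

chi2_no_shift = chi2(central, theory, covmat)
chi2_with_shift = chi2(central + shift, theory, covmat)
print(chi2_no_shift, chi2_with_shift)
```

In this toy setup the covariance matrix is identical in both cases, so the change in chi2 comes entirely from moving the central values.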

RoyStegeman · 2024-12-10T13:07:55Z

Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?

achiefa · 2024-12-10T13:09:09Z

> Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?

No, maybe @enocera does.

RoyStegeman

I'm not so familiar with dataset implementations so this is going to take me some time to figure out...

For now I just have a question regarding the Extractor class. There are also a lot of unused imports, are you using an lsp?

RoyStegeman · 2024-12-10T13:24:49Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/filter.py

+import numpy as np
+import yaml


Suggested change

import numpy as np

import yaml

RoyStegeman · 2024-12-10T13:34:12Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/filter_utils.py

+import os
+
+import numpy as np
+import pandas as pd


Suggested change

import pandas as pd

RoyStegeman · 2024-12-10T13:34:49Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/filter_utils.py

+import os
+
+import numpy as np
+import pandas as pd


Suggested change

import pandas as pd

RoyStegeman · 2024-12-10T13:37:51Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/filter_utils.py

+SYS_UNC_by_bin = [{}]
+
+
+class Extractor:


Some of this stuff seems pretty universal. Would it be worth defining a base class that is shared between datasets?

I've been thinking about making this class more universal. However, I gave up because there are many differences between datasets, even within the same experiment. If we want a more universal extractor, we should first agree on a common set of conventions across datasets. For now, the extractor class is rather specialised to the datasets that I implemented.
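For reference, the kind of shared base class raised in this exchange could look roughly like the sketch below. It is purely hypothetical: `BaseExtractor`, its method names, and the `ToyExtractor` subclass are invented for illustration and do not reflect the `Extractor` class in `filter_utils.py`.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical shared skeleton; dataset-specific parts stay abstract."""

    def __init__(self, observable):
        self.observable = observable

    @abstractmethod
    def read_central_values(self):
        """Return the list of central values for this observable."""

    @abstractmethod
    def read_uncertainties(self):
        """Return one dict of named uncertainties per data point."""

    def build(self):
        """Common step: collect the pieces and check they are consistent."""
        central = self.read_central_values()
        uncertainties = self.read_uncertainties()
        if len(central) != len(uncertainties):
            raise ValueError("One uncertainty entry is needed per data point")
        return {"data_central": central, "uncertainties": uncertainties}


class ToyExtractor(BaseExtractor):
    """Toy subclass with hard-coded (invented) numbers."""

    def read_central_values(self):
        return [1.0, 2.0]

    def read_uncertainties(self):
        return [{"stat": 0.1}, {"stat": 0.2}]


result = ToyExtractor("toy-observable").build()
print(result["data_central"])
```

The trade-off described in the reply still applies: everything that differs between datasets ends up in the abstract methods, so the base class only pays off once the datasets share enough conventions.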

RoyStegeman · 2024-12-10T13:38:39Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/filter_utils.py

@@ -0,0 +1,304 @@
+import logging
+import os


Suggested change

import os

RoyStegeman · 2024-12-10T13:42:46Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/metadata.yaml

-  kinematic_coverage:
-  - k1
-  - k2
-  - k3
+    plot_x: abs_eta
+  kinematic_coverage: [abs_eta, m_W2]


Why do the legacy kinematics have three variables and the new implementation just two? And does the legacy dataset not use this metadata file, which would cause issues if the kinematics don't match?

From the value it seems the removed variable was just supposed to indicate the 13TeV beam energy, which I don't think would be used anywhere. But this question is more about how the code deals with it.

The third variable in the old implementation was the beam energy. However, this is optional in the new commondata parser, as $\sqrt{s}$ is read automatically from the name of the dataset. In other words, specifying sqrts here would be redundant.

The legacy dataset does use this metadata file. However, since the kinematics is the same between the two versions, the legacy file can be removed (and I will).
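The idea of reading $\sqrt{s}$ from the dataset name can be sketched as follows. This is not the actual commondata parser: `sqrts_from_name` is a hypothetical helper, and the regular expression and the choice of GeV as the returned unit are assumptions made for the example.

```python
import re


def sqrts_from_name(dataset_name):
    """Extract the centre-of-mass energy (in GeV) from a name like *_13TEV_*."""
    match = re.search(r"_(\d+)TEV", dataset_name)
    if match is None:
        raise ValueError(f"No energy tag found in {dataset_name!r}")
    return float(match.group(1)) * 1000.0


sqrts = sqrts_from_name("CMS_WCHARM_13TEV_WPWM-TOT-UNNORM")
print(sqrts)
```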

RoyStegeman · 2024-12-10T13:48:22Z

nnpdf_data/nnpdf_data/commondata/CMS_WCHARM_13TEV/sys_uncertainties.py

@@ -0,0 +1,143 @@
+import numpy as np


Suggested change

import numpy as np

achiefa · 2024-12-10T14:05:10Z

> There are also a lot of unused imports, are you using an lsp?

These were copied and pasted. My bad. BTW, what is an lsp?

RoyStegeman · 2024-12-10T14:06:59Z

Language Server Protocol. It's what lets your editor highlight tokens based on their role in the Python syntax, including unused imports.

I'm pretty sure you are using it, but just in case you're not

achiefa · 2024-12-10T14:08:58Z

Oh, then I am. But I just forgot to delete the unused imports.
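For readers without such editor support, unused imports like the ones flagged in this review can also be found with a short script. The sketch below uses only the standard-library `ast` module; `unused_imports` is a hypothetical helper, and it is deliberately simplistic (it ignores `__all__`, star imports, and names used only inside strings), so a real linter remains the better tool.

```python
import ast


def unused_imports(source):
    """Return the sorted names imported in `source` but never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Every bare identifier used anywhere, e.g. the `pd` in `pd.DataFrame`.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


example = (
    "import os\n"
    "import numpy as np\n"
    "import pandas as pd\n"
    "\n"
    "df = pd.DataFrame()\n"
)
unused = unused_imports(example)
print(unused)
```

The source is only parsed, never executed, so the modules named in `example` do not need to be installed.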

enocera · 2024-12-10T15:06:24Z

> Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?
>
> No, maybe @enocera does.

I think that the reason is as follows: W+c data were not included in NNPDF4.0 because, at that time, NNLO corrections to the matrix elements were not known. When the MHOU, QED, and aN3LO determinations were produced, the leitmotiv was to put them on the same footing as NNPDF4.0. Therefore W+c did not go into them. At some point I raised the question of whether we should include it (in the same way as we do, e.g., for LHC data in the N3LO fit). Initially their answer was yes, but then they retracted. So I suspect that Andrea started to compute the FK tables, but then stopped.

achiefa · 2024-12-12T15:08:49Z

According to what ERN said in the last code meeting, this one is also ready for review.

achiefa requested a review from RoyStegeman December 9, 2024 17:06

achiefa marked this pull request as draft December 9, 2024 17:06

achiefa mentioned this pull request Dec 9, 2024

Final revision of the 4.0 dataset #2242

Open

5 tasks

achiefa self-assigned this Dec 9, 2024

achiefa added the data toolchain label Dec 9, 2024

achiefa marked this pull request as ready for review December 10, 2024 11:43

RoyStegeman reviewed Dec 10, 2024

achiefa added 9 commits December 12, 2024 15:07

New implementation of CMS_WCHARM_13TEV_WPWM-TOT-UNNORM

fa9afc9

Implementation in the new format - WIP

ce22f26

Correct metadata

933b287

Correct bug in filter

6e1a6ba

Correct percentage for lumi description

cc42824

Comment out unused bin

a17cb77

Remove sys_10 variant

013d37f

Correct bug in data generation

0dd371c

Correct order for shifts

342e1f5

achiefa force-pushed the new_CMS_WCHARM_13TEV_WPWM-TOT-UNNORM branch from f054206 to 342e1f5 Compare December 12, 2024 15:07

achiefa added 2 commits December 18, 2024 11:53

Clean code + remove unused code

8c2a235

Add docstring in sys_uncertainties.py

5ce9c42
