Polish analysis of a multiclosure test #1950

andreab1997 · 2024-02-20T14:56:24Z

Here we collect new functions and templates that allow the analysis of multiclosure tests.
@comane @giovannidecrescenzo @mariaubiali

scarlehoff · 2024-02-20T15:06:23Z

If you can do this as a branch off the new data reader that would be immensely helpful :__

and it should also work for everything other than ttbar and jet/dijets ...

andreab1997 · 2024-02-20T15:12:19Z

If you can do this as a branch off the new data reader that would be immensely helpful :__

and it should also work for everything other than ttbar and jet/dijets ...

I would do it but we actually need jets and ttbar

comane · 2024-02-22T13:28:02Z

validphys2/src/validphys/kinematics.py

+    import matplotlib.pyplot as plt
+    import matplotlib.colors


I think we should use validphys.subplots instead

comane · 2024-03-04T15:27:54Z

validphys2/src/validphys/closuretest/inconsistent_closuretest/multiclosure_inconsistent.py

+
+import numpy as np
+from sklearn.decomposition import PCA
+from sklearn import preprocessing


Suggested change

-from sklearn import preprocessing

Seems to be unused

comane · 2024-03-04T15:30:43Z

validphys2/src/validphys/closuretest/inconsistent_closuretest/multiclosure_inconsistent.py

+    # fits_dataset_predictions = [
+    #     ThPredictionsResult.from_convolution(pdf, dataset) for pdf in fits_pdf
+    # ]
+
+    # dimensions here are (Nfits, Ndat, Nrep)
+    # reps = np.asarray([th.error_members for th in fits_dataset_predictions])
+
+    # reshape so as to get PCs from all the samples
+    # reps = reps.reshape(reps.shape[1],-1)
+


Suggested change

-    # fits_dataset_predictions = [
-    #     ThPredictionsResult.from_convolution(pdf, dataset) for pdf in fits_pdf
-    # ]
-
-    # dimensions here are (Nfits, Ndat, Nrep)
-    # reps = np.asarray([th.error_members for th in fits_dataset_predictions])
-
-    # reshape so as to get PCs from all the samples
-    # reps = reps.reshape(reps.shape[1],-1)

comane · 2024-03-04T15:34:31Z

validphys2/src/validphys/closuretest/inconsistent_closuretest/multiclosure_inconsistent.py

+    # rescale feature matrix
+    reps_scaled = reps  # preprocessing.scale(reps)
+
+    # choose number of principal components (PCs) based on explained variance ratio
+    n_comp = 1
+    for _ in range(reps.shape[0]):
+        pca = PCA(n_comp).fit(reps_scaled.T)
+        if np.sum(pca.explained_variance_ratio_) >= explained_variance_ratio:
+            break
+        n_comp += 1


Suggested change

-    # rescale feature matrix
-    reps_scaled = reps  # preprocessing.scale(reps)
-
-    # choose number of principal components (PCs) based on explained variance ratio
-    n_comp = 1
-    for _ in range(reps.shape[0]):
-        pca = PCA(n_comp).fit(reps_scaled.T)
-        if np.sum(pca.explained_variance_ratio_) >= explained_variance_ratio:
-            break
-        n_comp += 1
+    # choose number of principal components (PCs) based on explained variance ratio
+    n_comp = 1
+    for _ in range(reps.shape[0]):
+        pca = PCA(n_comp).fit(reps.T)
+        if np.sum(pca.explained_variance_ratio_) >= explained_variance_ratio:
+            break
+        n_comp += 1

I think we can remove the rescaling, or otherwise make preprocessing an option. The idea was to rescale the feature matrix so that it is unit-free. However, I am not sure we really want it, since we compute the PCA for each dataset separately.
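To make the "preprocessing as an option" idea concrete, here is a minimal sketch of the component selection with an optional rescaling switch. It is not the code in this PR: the function name and the `scale` flag are hypothetical, and it uses a plain NumPy SVD instead of `sklearn.decomposition.PCA`, assuming `reps` has shape (Ndat, Nsamples) as in the diff.

```python
import numpy as np


def n_components_for_variance(reps, explained_variance_ratio, scale=False):
    """Smallest number of PCs whose cumulative explained variance ratio
    reaches the target.

    Hypothetical helper: `reps` is assumed to have shape (Ndat, Nsamples),
    and `scale` is the optional unit-free rescaling discussed above.
    """
    samples = np.asarray(reps, dtype=float).T  # (Nsamples, Ndat)
    centred = samples - samples.mean(axis=0)
    if scale:
        std = centred.std(axis=0)
        std[std == 0.0] = 1.0  # leave constant features untouched
        centred = centred / std

    # Squared singular values are proportional to the variance along each PC
    variances = np.linalg.svd(centred, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0.0:
        return 1  # no variance at all: a single component is enough

    cumulative = np.cumsum(variances) / total
    # Index of the first entry with cumulative >= target, converted to a count
    n_comp = int(np.searchsorted(cumulative, explained_variance_ratio, side="left")) + 1
    return min(n_comp, cumulative.size)
```

This does a single decomposition instead of refitting for every `n_comp`. scikit-learn's `PCA` also accepts a float `n_components` in (0, 1) together with `svd_solver="full"`, which picks the number of components from the explained variance in one fit; the threshold comparison there may differ slightly from the `>=` used in the diff.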

comane · 2024-03-04T15:37:17Z

validphys2/src/validphys/closuretest/inconsistent_closuretest/multiclosure_inconsistent.py

+    if n_comp <=1:
+        return None, None, n_comp


This is probably a bit ugly, but it's the way I found to deal with datasets that either have only one datapoint or reduce to a single component for the given explained variance ratio.
I would be happy to discuss this.
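As one possible starting point for that discussion, the sketch below wraps the three return values in a small named container so that callers can test an explicit flag instead of checking for `None`. All names here (`PCAProjection`, `is_degenerate`, `usable_datasets`) are hypothetical and do not exist in validphys.

```python
from typing import NamedTuple, Optional, Sequence


class PCAProjection(NamedTuple):
    """Hypothetical container; the field names are illustrative only."""

    components: Optional[Sequence[Sequence[float]]]
    projected: Optional[Sequence[Sequence[float]]]
    n_comp: int

    @property
    def is_degenerate(self) -> bool:
        # Mirrors the `n_comp <= 1` early return in the diff
        return self.n_comp <= 1


def usable_datasets(results):
    """Keep only the datasets whose PCA retained more than one component."""
    return {name: res for name, res in results.items() if not res.is_degenerate}
```

Because a `NamedTuple` still unpacks as a plain tuple, existing callers that expect three return values would keep working under this assumption.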

comane · 2024-03-04T15:45:37Z

Hi @andreab1997, is it fine with you if I commit to this branch?
This way I can add some of the modifications I am suggesting plus other stuff (such as docstrings and further plotting runcards)

andreab1997 · 2024-03-04T15:52:18Z

Hi @andreab1997, is it fine for you if I commit to this branch? This way I can add some of the modifications I am suggesting plus other stuff (such as docstrings and further plotting runcards)

Yes, feel free. Maybe we should first rebase this on the new commondata reader, as @scarlehoff was suggesting.

comane · 2024-03-04T17:33:48Z

@andreab1997, @scarlehoff I am trying to rebase on master but I get a series of conflicts that I don't fully understand.
In particular, I get a conflict with multiclosure_inconsistent_output.py, which should not even be in master.

comane · 2024-03-05T11:59:56Z

If you can do this as a branch off the new data reader that would be immensely helpful :__
and it should also work for everything other than ttbar and jet/dijets ...

I would do it but we actually need jets and ttbar

@scarlehoff Is this still an issue?

@andreab1997
I rebased this branch onto master by cherry-picking every relevant commit from here (a plain rebase was too messy, I think).
The new branch name is 240305_multict_analysis

If the one above is not an issue please feel free to close this PR and open one for this new branch.

scarlehoff · 2024-03-05T13:10:53Z

@scarlehoff Is this still an issue?

No, @andreab1997 fixed it. And in the process of fixing it we found out that some of the values were wrong... (I think for dijet, the value of Q had an extra square or an extra square root, I don't remember right now). It should now be fixed in master.

comane · 2024-03-05T16:15:26Z

@scarlehoff Is this still an issue?

No, @andreab1997 fixed it. And in the process of fixing it we found out that some of the values were wrong... (I think for dijet, the value of Q had an extra square or an extra square root, I don't remember right now). It should now be fixed in master.

Ok, great, thanks for that!
Then I think it's better if we close this PR and use the new branch instead as that one is rebased on master.

Init analysis

91b1ee5

andreab1997 added 10 commits February 20, 2024 16:15

Add other functions

ab5d6d1

Add other functions again

f9c5ade

Add parse_variancepdf

813f019

Init analysis

7f8229f

Add other functions

36863c3

Add other functions again

273e16b

Add parse_variancepdf

3c1f113

Merge branch 'multict_analysis' of github.com:NNPDF/nnpdf into multict_analysis

3eb0a85

Remove ipdbs and make sure multiclosure_inconsistent work

af4a08f

Add multiclosure comparefits

0b1e1e4

comane reviewed Feb 22, 2024

andreab1997 added 4 commits February 26, 2024 10:48

Add sklearn to pyproject

1337476

Scikit added to conda

dfa13a2

Correct conda recipe

7b1183f

Correct conda recipe

a4eabd8

comane reviewed Mar 4, 2024

comane force-pushed the multict_analysis branch from 20ee788 to a4eabd8 on March 4, 2024 17:24

comane closed this Mar 5, 2024

comane deleted the multict_analysis branch March 5, 2024 16:17
