Polish analysis of a multiclosure test #1950
Conversation
If you can do this as a branch off the new data reader that would be immensely helpful, and it should also work for everything other than ttbar and jets/dijets ...
I would do it, but we actually need jets and ttbar.
import matplotlib.pyplot as plt
import matplotlib.colors
I think we should use validphys.subplots instead
import numpy as np
from sklearn.decomposition import PCA
from sklearn import preprocessing
from sklearn import preprocessing

Seems to be unused.
# fits_dataset_predictions = [
#     ThPredictionsResult.from_convolution(pdf, dataset) for pdf in fits_pdf
# ]

# dimensions here are (Nfits, Ndat, Nrep)
# reps = np.asarray([th.error_members for th in fits_dataset_predictions])

# reshape so as to get PCs from all the samples
# reps = reps.reshape(reps.shape[1], -1)
Suggested change: delete the commented-out block above.
# rescale feature matrix
reps_scaled = reps  # preprocessing.scale(reps)

# choose number of principal components (PCs) based on explained variance ratio
n_comp = 1
for _ in range(reps.shape[0]):
    pca = PCA(n_comp).fit(reps_scaled.T)
    if np.sum(pca.explained_variance_ratio_) >= explained_variance_ratio:
        break
    n_comp += 1
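For context, the loop increments `n_comp` until the retained PCs explain at least `explained_variance_ratio` of the total variance. A minimal, self-contained sketch of the same selection rule (numpy only; `choose_n_components` and the data shapes are illustrative, not part of the PR):

```python
import numpy as np

def choose_n_components(reps, explained_variance_ratio=0.99):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches the requested threshold
    (numpy stand-in for the sklearn PCA loop above)."""
    # as in PCA(...).fit(reps.T): rows of reps.T are samples, columns are features
    X = reps.T - reps.T.mean(axis=0)
    # squared singular values of the centered matrix are the PC variances
    variances = np.linalg.svd(X, compute_uv=False) ** 2
    cumulative = np.cumsum(variances / variances.sum())
    return int(np.searchsorted(cumulative, explained_variance_ratio) + 1)

rng = np.random.default_rng(0)
reps = rng.normal(size=(5, 100))  # e.g. (Ndat, Nfits * Nrep)
n_comp = choose_n_components(reps, explained_variance_ratio=0.90)
```

Fitting a fresh `PCA(n_comp)` per iteration works but repeats the decomposition; computing the full spectrum once and reading off the cumulative ratio, as above, gives the same answer in one pass.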
Suggested change (drop the rescaling step):

# choose number of principal components (PCs) based on explained variance ratio
n_comp = 1
for _ in range(reps.shape[0]):
    pca = PCA(n_comp).fit(reps.T)
    if np.sum(pca.explained_variance_ratio_) >= explained_variance_ratio:
        break
    n_comp += 1
I think that we can remove this, or otherwise set preprocessing as an option. The idea was to rescale the feature matrix so as to make it unit free. I am not sure, however, whether we really want this, since we compute the PCA for each dataset separately.
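For reference, `preprocessing.scale` standardizes each column to zero mean and unit variance, which is what would make the feature matrix unit free. A numpy sketch of that effect (`scale_columns` is a hypothetical stand-in for the sklearn function):

```python
import numpy as np

def scale_columns(X):
    """Standardize columns to zero mean and unit variance, mimicking
    sklearn.preprocessing.scale (assumes no constant columns)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=10.0, size=(50, 4))  # features in arbitrary units
Xs = scale_columns(X)
# after scaling, every column has mean ~0 and standard deviation ~1,
# so no single feature dominates the PCA through its units alone
```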
if n_comp <= 1:
    return None, None, n_comp
This is probably a bit ugly, but it's the way I found to deal with datasets that have only one datapoint, or with those that, for the given explained variance ratio, shrink to a single component.
I would be happy to discuss this.
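A sketch of how that guard plays out: with a single datapoint the feature matrix has one column, so at most one PC survives and the function returns early (`pca_reduce` below is a hypothetical helper written for illustration, not the PR's actual function):

```python
import numpy as np

def pca_reduce(reps, explained_variance_ratio=0.99):
    """Project onto the leading PCs; return (None, None, n_comp) when
    only one component survives (illustrative helper, numpy only)."""
    X = reps.T - reps.T.mean(axis=0)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    n_comp = int(np.searchsorted(cumulative, explained_variance_ratio) + 1)
    if n_comp <= 1:
        # degenerate case: one datapoint, or spectrum shrunk to one PC
        return None, None, n_comp
    return vt[:n_comp], X @ vt[:n_comp].T, n_comp

rng = np.random.default_rng(2)
one_point = rng.normal(size=(1, 20))  # a dataset with a single datapoint
components, projected, n_comp = pca_reduce(one_point)
```

An alternative to the sentinel `(None, None, n_comp)` return would be to let the caller handle `n_comp == 1` uniformly, which may read less "ugly" but changes the call sites.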
Hi @andreab1997, is it fine for you if I commit to this branch?
Yes, feel free. Maybe first we want to rebase this on the new commondata reader, as @scarlehoff was suggesting.
@andreab1997, @scarlehoff I am trying to rebase on master but I get a series of conflicts that I don't fully understand.
@scarlehoff Is this still an issue? @andreab1997 If the one above is not an issue, please feel free to close this PR and open one for this new branch.
No, @andreab1997 fixed it. And in the process of fixing it we found out that some of the values were wrong... (I think for dijet the value of Q had an extra square or an extra square root, I don't remember right now). It should now be fixed in master.
Ok, great, thanks for that!
Here we collect new functions and templates that allow the analysis of multiclosure tests.
@comane @giovannidecrescenzo @mariaubiali