Implement reordering function to assess feature significance? #130
Python implementation of the Xia et al. reordering function:

```python
import numpy as np


def cor_matrix(X, Y):
    """Pearson correlations between the columns of X and the columns of Y."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return np.dot(X.T, Y) / X.shape[0]


def reorderCCA(v_org, v_res, u_org, u_res, cors_org, cors_res):
    # compute the absolute correlation matrix between the original and resampled v weights
    cor_matrix_abs = np.abs(cor_matrix(v_org, v_res))
    # for each original variate: best-matching resampled variate and its |correlation|
    res_match = np.argmax(cor_matrix_abs, axis=1)
    res_cor = np.amax(cor_matrix_abs, axis=1)
    res_match_count = len(np.unique(res_match))

    # if you can't exclusively assign one resampled v to one original v, return NaNs
    if res_match_count < u_org.shape[1]:
        k = u_org.shape[1]
        return [np.full(u_org.shape, np.nan), np.full(v_org.shape, np.nan),
                np.full(k, np.nan), np.full(k, np.nan), np.full(k, np.nan)]

    # reorder the u and v weights from the resampled CCA based on the correlation matches
    u_res_ordered = u_res[:, res_match]
    v_res_ordered = v_res[:, res_match]
    cors_final = cors_res[res_match]

    # for each v variate get the 'mean' sign for both original and resampled v
    signs_org = np.sign(np.mean(np.sign(v_org), axis=0))
    signs_res = np.sign(np.mean(np.sign(v_res_ordered), axis=0))
    # element-wise product: -1 where the resampled variate points the other way
    signs_prod = signs_org * signs_res
    # a tie (mean sign of exactly 0) would otherwise zero out the weights
    signs_prod[signs_prod == 0] = 1

    # change the signs of u and v of the resampled CCA (broadcast over columns)
    u_final = u_res_ordered * signs_prod
    v_final = v_res_ordered * signs_prod
    return [u_final, v_final, cors_final, res_match, res_cor]
```
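One consequence of the argmax matching in `reorderCCA` is that a whole resample is discarded (all NaNs) whenever two original variates pick the same resampled variate. A minimal sketch of an alternative, assuming we always want an exclusive one-to-one assignment: choose the permutation that maximizes the total absolute correlation, then flip each matched variate so that its correlation with the original is positive. This is a different rule from the Xia et al. code (signs come from the matched correlation rather than from the mean sign of the weights), `match_variates` is a hypothetical name, and the exhaustive search over permutations only scales to a handful of variates; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently.

```python
import itertools

import numpy as np


def match_variates(V_orig, V_res):
    """Hypothetical helper: one-to-one match of resampled to original variates.

    Rows are features, columns are variates. Returns the reordered and
    sign-aligned resampled weights, the permutation and the signs used.
    """
    n, k = V_orig.shape
    # Pearson correlation between every original and every resampled column
    Zo = (V_orig - V_orig.mean(axis=0)) / V_orig.std(axis=0)
    Zr = (V_res - V_res.mean(axis=0)) / V_res.std(axis=0)
    C = Zo.T @ Zr / n  # C[i, j] = corr(original i, resampled j)

    # exhaustive search over permutations: only feasible for small k
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(abs(C[i, p[i]]) for i in range(k)))
    perm = np.array(best)

    # flip each matched variate so it correlates positively with the original
    signs = np.sign(C[np.arange(k), perm])
    signs[signs == 0] = 1
    return V_res[:, perm] * signs, perm, signs
```

The returned `perm` and `signs` would be applied in the same way to the resampled u weights, and `perm` alone to the resampled canonical correlations.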
I am not sure if this function makes sense. Xia et al. cite this paper (Mišić et al.), but there Procrustes rotations are used to match resampled to original variates. I am not sure whether the function above from Xia et al. amounts to a Procrustes rotation.
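For comparison, the argmax matching in the function above only permutes and sign-flips columns, whereas a Procrustes approach finds the orthogonal matrix that best rotates the resampled weights onto the original ones (a permutation with sign flips is one special case of such a matrix). A minimal numpy sketch of the orthogonal Procrustes solution, assuming rows are features and columns are variates; `procrustes_align` is a hypothetical helper, not the pyls or Mišić et al. code.

```python
import numpy as np


def procrustes_align(V_orig, V_res):
    """Hypothetical helper: rotate V_res onto V_orig (orthogonal Procrustes).

    Finds the orthogonal R minimising ||V_res @ R - V_orig||_F and returns
    (V_res @ R, R). Rows are features, columns are variates.
    """
    # SVD of the cross-product matrix; the optimal rotation is U @ Wt
    U, _, Wt = np.linalg.svd(V_res.T @ V_orig)
    R = U @ Wt
    return V_res @ R, R
```

Because the rotation may mix variates, the aligned weights are in general not just a reordering of the resampled ones; the same `R` could be applied to the resampled u weights.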
I guess the whole issue could also be renamed to "Implement Procrustes Rotation for resampling purposes (bootstrapping, permutation)". For completeness, here's the implementation from pyls: This package might help: Maybe also this:
Any update on this? My colleagues and I might start working on this :) I also posted a question about this on CrossValidated.
Apologies for never having responded! Must have caught me at a busy time. I haven't done anything on this myself; as ever, contributions that are of practical use to people are welcome :)
I spent some time making sure everything works with scikit-learn's permutation-testing utilities, so that might be relevant/helpful.
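A minimal numpy sketch of a permutation test for the overall significance of the first canonical correlation, independent of the cca-zoo and scikit-learn APIs (`first_canonical_corr` and `permutation_test` are hypothetical helpers, and a small ridge `reg` is assumed for numerical stability). Because only the largest canonical correlation is compared against its null distribution, no variate matching is needed here, unlike when resampling the weights themselves.

```python
import numpy as np


def first_canonical_corr(X, Y, reg=1e-6):
    """Largest canonical correlation via whitening + SVD."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    def inv_sqrt(C):
        # inverse matrix square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx = inv_sqrt(X.T @ X / (n - 1) + reg * np.eye(X.shape[1]))
    Wy = inv_sqrt(Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1]))
    # singular values of Cxx^-1/2 Cxy Cyy^-1/2 are the canonical correlations
    return np.linalg.svd(Wx @ (X.T @ Y / (n - 1)) @ Wy, compute_uv=False)[0]


def permutation_test(X, Y, n_perm=199, seed=0):
    """Permute the rows of Y to break the X-Y pairing; return (observed, p-value)."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_corr(X, Y)
    null = np.array([first_canonical_corr(X, Y[rng.permutation(Y.shape[0])])
                     for _ in range(n_perm)])
    # add-one correction so the p-value is never exactly zero
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```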
In a nutshell: if we only consider one variate, we can simply take the absolute values of the loadings/weights. But once you have more variates, you may end up in situations where the variates come out in a different order but with similar weights (i.e. a similar relationship between the variables when each variate is considered in isolation). In this case, Procrustes rotation may help. I am not sure, however, what statisticians would say about this, because a different order of variates might itself be an indicator of poor reliability.
In their 2018 paper, Xia et al. implemented a method to match canonical variates from resampled data sets to those from the original data set, in order to compute p-values for their canonical weights.
Page 12 (Methods Section):
Here's the code they implemented to achieve this:
https://github.com/cedricx/sCCA/blob/d5a2f4cb071bddd3f7d805e02ff27828b8494c66/sCCA/code/final/cca_functions.R#L191
Would it make sense to implement this method for cca-zoo? I am not even sure it is a 'good' method, given this issue. But if I understand correctly, assessing the overall significance of the canonical variates themselves is one thing, and assessing the significance of the feature weights on those variates is another?
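To illustrate the second half of that distinction: for feature weights, a common approach is to bootstrap the data and summarize how stable each weight is across resamples. A minimal sketch under several assumptions: a plain whitening/SVD CCA (the hypothetical helper `cca`, not the cca-zoo API), sign alignment by the dot product with the original weights (which handles sign flips only, so it is shown for `k=1`), and a bootstrap-ratio style summary (original weight divided by its bootstrap standard deviation) of the kind often reported for PLS. For `k > 1`, the variates would also need reordering or rotating as discussed above.

```python
import numpy as np


def cca(X, Y, k, reg=1e-6):
    """Hypothetical helper: CCA via whitening + SVD with a small ridge."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx = inv_sqrt(X.T @ X / (n - 1) + reg * np.eye(X.shape[1]))
    Wy = inv_sqrt(Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1]))
    U, s, Vt = np.linalg.svd(Wx @ (X.T @ Y / (n - 1)) @ Wy)
    # X weights, Y weights, canonical correlations
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]


def bootstrap_ratios(X, Y, k=1, n_boot=500, seed=0):
    """Original X weights divided by their bootstrap standard deviation.

    Raw CCA weights depend on feature scaling, so X is best standardized first.
    """
    rng = np.random.default_rng(seed)
    A0, _, _ = cca(X, Y, k)
    n = X.shape[0]
    boots = np.empty((n_boot,) + A0.shape)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        A, _, _ = cca(X[idx], Y[idx], k)
        # flip each resampled variate to point the same way as the original
        signs = np.sign(np.sum(A * A0, axis=0))
        signs[signs == 0] = 1
        boots[b] = A * signs
    return A0 / boots.std(axis=0, ddof=1), A0
```

Weights with large absolute ratios are stable across resamples; turning the ratios into p-values relies on a normal approximation, which is itself an assumption.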