
Implement reordering function to assess feature significance? #130

Open
JohannesWiesner opened this issue Jan 18, 2022 · 7 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@JohannesWiesner
Contributor

In their 2018 paper, Xia et al. implemented a method to match canonical variates from resampled data sets to those from the original data set, in order to compute p-values for the canonical weights.

Page 12 (Methods Section):

As permutation could induce arbitrary axis rotation, which changes the order of canonical variates, or axis reflection, which causes a sign change for the weights, we matched the canonical variates resulting from permuted data matrices to the ones derived from the original data matrix by comparing the clinical loadings (v) (75. Mišić, B. et al. Network-level structure-function relationships in human neocortex. Cereb. Cortex 26, 3285–96 (2016).)

Here's the code they implemented to achieve this:
https://github.com/cedricx/sCCA/blob/d5a2f4cb071bddd3f7d805e02ff27828b8494c66/sCCA/code/final/cca_functions.R#L191

Would it make sense to implement this method for cca-zoo? I am not even sure whether this is a 'good' method, having this issue in mind. But if I understand correctly, assessing the overall significance of the canonical variates themselves is one thing, and assessing the significance of the feature weights on those variates is another.

@JohannesWiesner
Contributor Author

Implementation in Python:

import numpy as np

# compute the correlation matrix between the columns of the original and resampled
# v weights (the absolute values are taken in reorderCCA below)
def cor_matrix(X, Y):
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return np.dot(X.T, Y) / X.shape[0]

def reorderCCA(v_org, v_res, u_org, u_res, cors_org, cors_res):
    # u_* / v_* are the x/y weight matrices (features x components) of the original
    # ("org") and resampled ("res") CCA; cors_* are the canonical correlations.
    # Note: cors_org is currently unused.

    cor_matrix_abs = np.abs(cor_matrix(v_org, v_res))
    # for each original variate, find the resampled variate whose v weights have
    # the highest absolute correlation with it
    res_match = np.argmax(cor_matrix_abs, axis=1)
    res_cor = np.amax(cor_matrix_abs, axis=1)
    res_match_count = len(np.unique(res_match))
    
    # now reorder the u and v weights from the resampled CCA based on the correlation matches
    u_res_ordered = u_res[:,res_match]
    v_res_ordered = v_res[:,res_match]
    cors_final = cors_res[res_match]
    
    # for each v variate get the 'mean' sign for both original and resampled v
    signs_org = np.sign(np.mean(np.sign(v_org),axis=0))
    signs_res = np.sign(np.mean(np.sign(v_res_ordered),axis=0))
    
    # compute element wise product
    signs_prod = (signs_org * signs_res).reshape(-1,1)
    
    # change the signs of u and v of the resampled cca
    u_final = (u_res_ordered.T * signs_prod).T
    v_final = (v_res_ordered.T * signs_prod).T
    
    # if you can't exclusively assign one resampled v to one original v, return
    # nans
    if res_match_count < u_org.shape[1]:
        u_na = np.empty(u_org.shape)
        u_na[:] = np.nan
        v_na = np.empty(v_org.shape)
        v_na[:] = np.nan    
        cors_na = np.repeat(np.nan,u_org.shape[1])
        res_match_na = np.repeat(np.nan,u_org.shape[1])
        res_cor_na = np.repeat(np.nan,u_org.shape[1])
        res_one_reorder = [u_na,v_na,cors_na,res_match_na,res_cor_na]
    else: 
        res_one_reorder = [u_final,v_final,cors_final,res_match,res_cor]

    return res_one_reorder
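
For illustration, reorderCCA could be used in a permutation loop roughly like this (a sketch only, not part of Xia et al.'s code: scikit-learn's CCA, the random data, the fit_cca helper and the number of permutations are all placeholders standing in for whichever implementation and data one actually uses):

from sklearn.cross_decomposition import CCA

def fit_cca(X, Y, n_comp):
    # placeholder fit returning (u, v, canonical correlations)
    cca = CCA(n_components=n_comp).fit(X, Y)
    Xs, Ys = cca.transform(X, Y)
    cors = np.array([np.corrcoef(Xs[:, k], Ys[:, k])[0, 1] for k in range(n_comp)])
    return cca.x_weights_, cca.y_weights_, cors

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = rng.normal(size=(100, 8))
n_comp = 3

u_org, v_org, cors_org = fit_cca(X, Y, n_comp)

n_perms = 500
null_v = np.full((n_perms,) + v_org.shape, np.nan)
for p in range(n_perms):
    Y_perm = Y[rng.permutation(len(Y))]             # permute rows of one view
    u_res, v_res, cors_res = fit_cca(X, Y_perm, n_comp)
    u_p, v_p, *_ = reorderCCA(v_org, v_res, u_org, u_res, cors_org, cors_res)
    null_v[p] = v_p                                  # stays NaN if matching was ambiguous

# p-value per weight, counting only permutations that could be matched
valid = ~np.isnan(null_v[:, 0, 0])
p_vals = (np.abs(null_v[valid]) >= np.abs(v_org)).mean(axis=0)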

@JohannesWiesner
Contributor Author

I am not sure if this function makes sense. Xia et al. refer to this paper (Mišić et al.):

Matching Decompositions From Original and Resampled Matrices

The decompositions of both permuted and bootstrap-resampled data matrices are not guaranteed to produce a set of LVs comparable with the ones derived from the original data matrix, because both types of resampling could induce arbitrary axis rotation (a change in the order of LVs) or axis reflection (a sign change for the weights). As a result, Procrustes rotations are used to match the resampled LVs to the original LVs (McIntosh and Lobaugh 2004).

But here, Procrustes rotations are used to match the resampled variates to the original ones. I am not sure whether the function above from Xia et al. is actually a Procrustes rotation; it rather matches each original variate to the resampled variate with the highest absolute loading correlation and then flips signs.
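
For comparison, a Procrustes-based alignment could look roughly like this (just a sketch using scipy.linalg.orthogonal_procrustes, not the pyls implementation): a full orthogonal rotation is estimated from the v weights and then applied to both u and v.

import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_align(v_org, v_res, u_res):
    # find the orthogonal matrix R minimising ||v_res @ R - v_org||_F
    R, _ = orthogonal_procrustes(v_res, v_org)
    # the rotation handles reordering and sign flips in one step, but it also
    # mixes the variates, unlike the pure matching-and-flipping above
    return u_res @ R, v_res @ R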

@jameschapman19 added the enhancement (New feature or request), help wanted (Extra attention is needed) and good first issue (Good for newcomers) labels on May 8, 2022
@JohannesWiesner
Contributor Author

I guess the whole issue could also be renamed to: "Implement Procrustes Rotation for resampling purposes (bootstrapping, permutation)".

For completeness:

Here's the implementation from pyls:
https://github.com/netneurolab/pypyls/blob/d8a19d564cc5804249527b68c937df3a5fd8c7cc/pyls/compute.py#L240

This package might help:
https://procrustes.readthedocs.io/en/latest/

Maybe also this:
https://github.com/MaxHalford/prince#generalized-procrustes-analysis-gpa

@JohannesWiesner
Contributor Author

Any update on this? I might start working on this with my colleagues :)

I also posted a question about this on CrossValidated.

@jameschapman19
Owner

Apologies for never having responded! It must have caught me at a busy time. I haven't done anything on this myself; as ever, I welcome any contributions that are of practical use to people :)

@jameschapman19
Owner

I've spent some time making sure everything works with scikit-learn's permutation testing machinery, so that might be relevant/helpful.

@JohannesWiesner
Contributor Author

In a nutshell: if we are only considering one variate, we can simply take the absolute values of the loadings/weights. But once you have more variates, you may end up in situations where the order of the variates differs but the weights are still similar (a similar relationship between the variables when each variate is considered in isolation). In this case, Procrustes rotation may help. I am not sure, however, what statisticians would say about this, because a different order of variates might in itself be an indicator of poor reliability.
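
To illustrate the single-variate case (a toy example, not from any of the papers above): with one variate there is no ordering ambiguity, only a possible sign flip, so the resampled weights can be sign-aligned via their correlation with the original weights.

import numpy as np

rng = np.random.default_rng(0)
w_org = rng.normal(size=50)                        # original weight vector
w_res = -w_org + rng.normal(scale=0.1, size=50)    # resampled: same pattern, flipped sign

sign = np.sign(w_org @ w_res)                      # +1 or -1
w_res_aligned = sign * w_res                       # now directly comparable to w_org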
