Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Hierarchical clustering of the correlation matrix #19

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

celprov
Copy link
Collaborator

@celprov celprov commented Mar 24, 2023

Implement a new feature to perform hierarchical clustering on the correlation matrix before plotting it.

The hierarchical clustering can be activated by the flag sort in plot_corrmat(), which is by default set to False, such that the default behavior of this function remains unchanged.

fix : correct the alignment of the labels on the horizontal axis

@celprov celprov requested a review from oesteban March 24, 2023 10:02
@celprov celprov force-pushed the enh/clustering_corrmat branch from c66f057 to a373e37 Compare March 24, 2023 10:08
Copy link
Member

@oesteban oesteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. main caveat is whether pandas is not an overkill

mriqc_learn/viz/metrics.py Outdated Show resolved Hide resolved
mriqc_learn/viz/metrics.py Outdated Show resolved Hide resolved
mriqc_learn/viz/metrics.py Outdated Show resolved Hide resolved
Comment on lines 231 to 238
# Build a new dataframe with the sorted columns
for idx, i in enumerate(data.columns[labels_order]):
if idx == 0:
clustered = pd.DataFrame(data[i])
else:
df_to_append = pd.DataFrame(data[i])
clustered = pd.concat([clustered, df_to_append], axis=1)
data = clustered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy should be sufficient to reorder, something like

Suggested change
# Build a new dataframe with the sorted columns
for idx, i in enumerate(data.columns[labels_order]):
if idx == 0:
clustered = pd.DataFrame(data[i])
else:
df_to_append = pd.DataFrame(data[i])
clustered = pd.concat([clustered, df_to_append], axis=1)
data = clustered
data = np.take(data, labels_order, axis=0)

Q2 - don't you want to also sort the rows?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow very fast, thanks.
The panda implementation reorder both the rows and the columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think np.take will then work for you with something like (labels_order, labels_order) or zip((labels_order, labels_order)) for the indexes and no axis argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, none of the suggestions work and with a quick search on internet, I couldn't figure out how to reorder both rows and columns in a np.array. I thus suggest we keep the panda implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is easier than you think:

reordered_idx = (0, 1, 2, 4, 5, 3, 6, 7, 8, 9)
data.take(indices=reordered_idx, axis=0).take(indices=reordered_idx, axis=1)

The only caveat is that you need to do the reordering on the full correlation matrix, and only after the reordering drop the upper triangle (if you want to do so).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, it works with this suggestion. I really could not figure out how to do the reordering on np.array.
It indeed greatly simplifies the code.
Can I merge the PR now?

@celprov celprov force-pushed the enh/clustering_corrmat branch from 97fd28b to 3f10199 Compare March 24, 2023 13:10
@celprov celprov requested a review from oesteban October 25, 2023 21:15
@celprov celprov force-pushed the enh/clustering_corrmat branch from 33e999f to 81867f0 Compare October 25, 2023 21:19
@celprov
Copy link
Collaborator Author

celprov commented Oct 25, 2023

@oesteban time to revive this. I think it is ready to just merge it, as I could run the code on the IQMs from the IXI dataset and since we already worked on reviews long time back.

Here is a resulting correlation matrix plot
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants