ENH: Hierarchical clustering of the correlation matrix #19
…he correlation matrix before plotting it
c66f057
to
a373e37
Compare
Looking good. The main caveat is whether pandas is overkill here.
mriqc_learn/viz/metrics.py
Outdated
# Build a new dataframe with the sorted columns
for idx, i in enumerate(data.columns[labels_order]):
    if idx == 0:
        clustered = pd.DataFrame(data[i])
    else:
        df_to_append = pd.DataFrame(data[i])
        clustered = pd.concat([clustered, df_to_append], axis=1)
data = clustered
NumPy should be sufficient to reorder, something like:
- # Build a new dataframe with the sorted columns
- for idx, i in enumerate(data.columns[labels_order]):
-     if idx == 0:
-         clustered = pd.DataFrame(data[i])
-     else:
-         df_to_append = pd.DataFrame(data[i])
-         clustered = pd.concat([clustered, df_to_append], axis=1)
- data = clustered
+ data = np.take(data, labels_order, axis=0)
Q2: don't you also want to sort the rows?
Wow, very fast, thanks.
The pandas implementation reorders both the rows and the columns.
Okay, I think np.take will then work for you with something like (labels_order, labels_order) or zip((labels_order, labels_order)) for the indexes and no axis argument.
Unfortunately, none of the suggestions work, and after a quick search online I couldn't figure out how to reorder both the rows and the columns of an np.array. I therefore suggest we keep the pandas implementation.
I think this is easier than you think:
reordered_idx = (0, 1, 2, 4, 5, 3, 6, 7, 8, 9)
data.take(indices=reordered_idx, axis=0).take(indices=reordered_idx, axis=1)
The only caveat is that you need to do the reordering on the full correlation matrix, and only after the reordering drop the upper triangle (if you want to do so).
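To make the caveat concrete, here is a minimal sketch on made-up data (the random samples and the variable names other than reordered_idx are assumptions, not code from this PR). It chains the two take calls on the full matrix, checks the result against the equivalent np.ix_ indexing, and only then blanks the upper triangle:

```python
import numpy as np

# Hypothetical data: 50 observations of 10 metrics (not from the PR)
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 10))
corr = np.corrcoef(samples, rowvar=False)  # full 10 x 10 correlation matrix

reordered_idx = (0, 1, 2, 4, 5, 3, 6, 7, 8, 9)

# Reorder the rows first, then the columns, on the FULL matrix
reordered = corr.take(indices=reordered_idx, axis=0).take(indices=reordered_idx, axis=1)

# Equivalent one-step formulation using an open mesh of indices
same = corr[np.ix_(reordered_idx, reordered_idx)]

# Only after reordering, hide the upper triangle (diagonal kept)
upper = np.triu(np.ones_like(reordered, dtype=bool), k=1)
lower_only = np.where(upper, np.nan, reordered)
```

Masking before reordering would scatter the NaN entries across both triangles, which is why the order of the two steps matters.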
Thanks a lot, it works with this suggestion. I really could not figure out how to do the reordering on an np.array.
It does indeed greatly simplify the code.
Can I merge the PR now?
Co-authored-by: Oscar Esteban <[email protected]>
d273c1e
to
97fd28b
Compare
97fd28b
to
3f10199
Compare
33e999f
to
81867f0
Compare
@oesteban time to revive this. I think it is ready to merge: I was able to run the code on the IQMs from the IXI dataset, and we already went through the reviews a long time back.
Implement a new feature to perform hierarchical clustering on the correlation matrix before plotting it.
Hierarchical clustering is activated with the sort flag of plot_corrmat(), which defaults to False so that the function's default behavior remains unchanged.
fix: correct the alignment of the labels on the horizontal axis
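For readers unfamiliar with the idea, the following is a rough sketch of how a clustering-based ordering could be derived and applied. It is not the code merged in this PR: the helpers cluster_order and reorder_corrmat are hypothetical, and the choice of 1 - correlation as the distance with average linkage is an assumption made for illustration. Only the sort flag and its False default come from the description above.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def cluster_order(corr):
    """Leaf order from hierarchical clustering (hypothetical helper)."""
    # Assumption: use 1 - correlation as the dissimilarity between metrics
    dist = 1.0 - np.asarray(corr, dtype=float)
    np.fill_diagonal(dist, 0.0)  # remove floating-point noise on the diagonal
    condensed = squareform(dist, checks=False)  # condensed form expected by linkage
    return leaves_list(linkage(condensed, method="average"))


def reorder_corrmat(corr, sort=False):
    """Return corr unchanged, or with rows and columns reordered if sort is True."""
    corr = np.asarray(corr, dtype=float)
    if not sort:
        return corr
    order = cluster_order(corr)
    return corr.take(order, axis=0).take(order, axis=1)


# Hypothetical data: 50 observations of 10 metrics
rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(50, 10)), rowvar=False)
order = cluster_order(corr)
sorted_corr = reorder_corrmat(corr, sort=True)
```

With sort=False the matrix is returned as-is, mirroring the unchanged default behavior described above.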