What is meant by this?: "NOTE: though the clusters match the R output, the cluster names are shuffled" #6

Open
jolespin opened this issue Sep 28, 2018 · 1 comment

jolespin commented Sep 28, 2018

First off, thank you so much for creating this package. I've been needing something like this and wrote a wrapper with rpy2 but will use this instead from now on.

I am noticing some inconsistencies:
(1) The clustering looks the same but the colors from WGCNA seem to repeat themselves. Is this a WGCNA issue or an artifact of your note:

dynamicTreeCut contains methods for detection of clusters in hierarchical clustering dendrograms. "NOTE: though the clusters match the R output, the cluster names are shuffled"

(2) Can you elaborate on what you mean by the quote above? I keep getting clusters that overlap when I plot them ...

Here is the link to the dataframe:
https://drive.google.com/file/d/1vp_jx8CfD90bvFcS6sbWN59U-_DQa48L/view?usp=sharing

Here's my R code:

library(dynamicTreeCut)
library(fastcluster)
library(WGCNA)

# Read in dataframe
read_dataframe = function(path, sep="\t") {
  df = read.table(path, sep=sep, row.names=1, header = TRUE, check.names=FALSE)
  return(df)
}
df_adj = read_dataframe("~/adj.tsv")

# Convert to dissimilarity
df_dism = 1 - df_adj

# Compute hierarchical clustering linkage
Z = hclust(as.dist(df_dism), method="ward.D2")

# Cut the dendrogram
treecut_output = cutreeDynamic(
  dendro=Z,
  method="hybrid",
  distM=as.matrix(df_dism),  # pass a plain matrix rather than a data frame
  minClusterSize=10,
  deepSplit=2
)

# Plot dendrogram
plotDendroAndColors(
  dendro=Z,
  colors=treecut_output
)

[Image: dendrogram and cluster colors produced by the R code above]

Here is my python representation using the same parameters:
[Image: dendrogram and cluster colors produced in Python]
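
One reading of the quoted note is that the R and Python outputs group the same leaves together but give the groups different integer names. A minimal sketch of how that could be checked, assuming both label vectors are in the same leaf order; the function name `same_partition` and the example labels are hypothetical, not part of either package:

```python
def same_partition(labels_a, labels_b):
    """Return True if both labelings group the items identically,
    regardless of which integer name each group carries."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing seen for each label;
        # a later conflicting pairing means the mapping is not one-to-one.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical labels for six leaves (not taken from the data in this issue)
r_labels = [1, 1, 2, 2, 3, 3]
py_labels = [2, 2, 3, 3, 1, 1]
print(same_partition(r_labels, py_labels))
```

If this returns True for the real outputs, only the names differ. If it returns False, the two clusterings themselves disagree and shuffled names cannot explain the plots.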


Linvill commented Dec 23, 2019

First of all, from my side as well, thank you very much for putting this package together! I intend to use it for clustering pico-earthquakes based on waveform similarity.

To get to know the package, I am currently trying to replicate your example here, @jolespin. Assuming that your "R-generated solution" above is correct, I am a little confused, because your first (black) cluster appears to differ from the one in the "Python-generated solution". In my own replication using this package, the black cluster also does not appear as it does in your "R-generated solution" (see below).
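
A possible explanation for both the repeating colors and the differing black cluster, offered as an assumption rather than a confirmed diagnosis: the R code passes the raw integer labels as colors, and base R graphics resolve integer colors by index into `palette()`, wrapping around when the index exceeds the palette length. The sketch below mimics that lookup in Python, assuming the eight-entry default palette of R versions before 4.0; the function name is hypothetical.

```python
# Assumption: default palette() of R versions before 4.0 (eight entries).
R_PALETTE = ["black", "red", "green3", "blue",
             "cyan", "magenta", "yellow", "gray"]


def numeric_label_to_color(label):
    """Mimic base R's handling of an integer colour: 0 means the
    background colour, positive indices wrap around the palette."""
    if label == 0:
        return "background"
    return R_PALETTE[(label - 1) % len(R_PALETTE)]


# Labels 1 and 9 resolve to the same colour, as do 2 and 10.
for label in (0, 1, 2, 9, 10):
    print(label, numeric_label_to_color(label))
```

Under that assumption, several distinct clusters would be drawn in black in the R plot. WGCNA provides `labels2colors` to turn numeric labels into distinct color names before plotting, which would avoid the collisions.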

Trying to figure out what could cause this, the only thing I can think of is the linkage method used in R ("ward.D2") compared to scipy ("ward"). Ultimately, I believe that is not the cause, because the absolute linkage values (y-axis in the above plots) do match. Would anybody have an idea what causes this mismatch?

One further, more general question: where are the leaves that are not assigned to any cluster because "minClusterSize = 10" is too large? Are these the black (R-generated solution) or red (Python-generated solution) leaves?
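
For reference, R's `cutreeDynamic` documents that unassigned objects receive the label 0. Whether the Python port follows the same convention is an assumption here. A small sketch for locating such leaves in a label vector, using a hypothetical helper and made-up labels:

```python
from collections import Counter

# Label 0 marks unassigned objects in R's cutreeDynamic;
# assumed (not verified) to carry over to the Python port.
UNASSIGNED = 0


def summarize_labels(labels):
    """Return cluster sizes and the positions of unassigned leaves."""
    sizes = Counter(labels)
    unassigned = [i for i, lab in enumerate(labels) if lab == UNASSIGNED]
    return sizes, unassigned


# Hypothetical label vector, not taken from the data in this issue
labels = [1, 1, 0, 2, 2, 1, 0]
sizes, unassigned = summarize_labels(labels)
print(dict(sizes))
print(unassigned)
```

If the list of unassigned positions is empty for the real output, every leaf was placed in a cluster and no color in either plot stands for unassigned leaves.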

[Image: Test_example_deepsplit2]

Thanks!
