Question about clustering results #25

Open
gcasamat opened this issue Dec 1, 2023 · 3 comments

Comments


gcasamat commented Dec 1, 2023

Hi,

Thanks for this fantastic package!
I have an issue related to the output of the clustering algorithm.
I ran the following code in Python:

import os

# Run the networkanalysis clustering CLI; adjacent string literals are concatenated.
os.system("java -cp /Applications/networkanalysis/networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering"
          " -n AssociationStrength -r 1 -m 20 --sorted-edge-list"
          " -o clusters.txt"
          " data_net.txt")

The output message announces 829 clusters and 760 clusters after removing clusters consisting of fewer than 20 nodes.
However when I open the file clusters.txt:

  • the number of clusters is 681 (the clusters are numbered from 0 to 759, with gaps).
  • many clusters contain fewer than 20 nodes.
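
A check like the one described above can be sketched in a few lines of Python. This is only an illustration, and it assumes the clustering file holds one whitespace-separated `node cluster` pair per line; the helper names `cluster_sizes` and `summarize` are hypothetical and not part of networkanalysis.

```python
from collections import Counter


def cluster_sizes(lines):
    """Count nodes per cluster from lines of the form '<node> <cluster>'."""
    sizes = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            sizes[int(fields[1])] += 1
        except ValueError:
            continue  # skip lines whose cluster field is not an integer
    return sizes


def summarize(sizes, min_size):
    """Return (number of clusters, clusters below min_size, unused labels)."""
    small = sum(1 for n in sizes.values() if n < min_size)
    # Labels between 0 and the largest label that never occur in the file.
    missing = sorted(set(range(max(sizes) + 1)) - set(sizes)) if sizes else []
    return len(sizes), small, missing


# Hypothetical usage on the file from this report:
# with open("clusters.txt") as f:
#     n_clusters, n_small, gaps = summarize(cluster_sizes(f), min_size=20)
```

Comparing `n_clusters` with the count printed by the tool, and inspecting `gaps`, separates the numbering question from the cluster-size question.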

Many thanks in advance for your explanation.

Contributor

vtraag commented Dec 14, 2023

Could you please provide a minimal reproducible example? Then we might be able to debug any problem. Without being able to replicate the problem, we also cannot solve it.

@gcasamat
Author

You can find below some input and output txt files for replicating the issue.

The command I execute is:

java -cp /Applications/networkanalysis/networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering \
    -n AssociationStrength -r 50 -m 50 --sorted-edge-list \
    -o net_clusters_res50.txt data_net.txt

The output message from networkanalysis is:

Quality function: CPM
Normalization method: AssociationStrength
Resolution parameter: 50.0
Minimum cluster size: 50
Number of random starts: 1
Number of iterations: 10
Randomness parameter: 0.01
Random number generator seed: random
Running algorithm took 0s.
Quality function equals 0.9256850092525903.
Clustering consists of 1354 clusters.
Removing clusters consisting of fewer than 50 nodes.
Final clustering consists of 1353 clusters.

However, I count 1018 clusters in the file net_clusters_res50.txt, many of which contain fewer than 50 nodes.

Thanks for your help.

data_net.txt
net_clusters_res50.txt

Contributor

vtraag commented Dec 19, 2023

There are two separate issues here:

  1. Communities are not consecutively numbered.
  2. Clusters may have fewer nodes than indicated by the threshold.

The first item should be solved, I've opened a PR in #27 for this.
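
Until such a fix is available, non-consecutive labels can be remapped after the fact. The sketch below is a generic relabeling, not necessarily what the PR does; `relabel_consecutively` is a hypothetical helper name, and it assumes the cluster labels have already been read into a list.

```python
def relabel_consecutively(clusters):
    """Map cluster labels to 0..k-1, keeping the relative order of the old labels."""
    mapping = {old: new for new, old in enumerate(sorted(set(clusters)))}
    return [mapping[c] for c in clusters]
```

For example, the labels `[0, 3, 3, 7, 0]` become `[0, 1, 1, 2, 0]`: node membership is unchanged and only the numbering is compacted.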

The second item cannot be solved in this case. That is, your network contains several components (1006, to be precise). The algorithm will never create clusters larger than the individual components. This will not be changed.
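
To see how this plays out on a given network, the component sizes can be computed directly from the edge list. The following is a minimal union-find sketch, assuming the edges have been parsed into `(u, v)` pairs (for instance from the first two columns of a tab-separated edge list); `component_sizes` and `components_below` are hypothetical names, and nodes that never appear in an edge are not counted.

```python
from collections import Counter


def component_sizes(edges):
    """Return connected-component sizes (largest first) for an undirected edge list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    roots = Counter(find(x) for x in list(parent))
    return sorted(roots.values(), reverse=True)


def components_below(edges, min_size):
    """Count components that are too small to ever yield a cluster of min_size nodes."""
    return sum(1 for size in component_sizes(edges) if size < min_size)


# Hypothetical usage, assuming a tab-separated edge list:
# with open("data_net.txt") as f:
#     edges = [tuple(line.split("\t")[:2]) for line in f if line.strip()]
# print(components_below(edges, min_size=50))
```

Every component smaller than the threshold necessarily produces at least one cluster below the minimum size, whatever the resolution parameter.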

One possibility would be to check the connected components and issue a warning if any of them is smaller than the minimum desired community size. However, that check takes extra time, so there should at least be an option to turn it off. What do you think, @neesjanvaneck?
