Add solutions for clique percolation #2

jamesjiang52 · 2021-12-15T21:35:02Z

This commit is for the clique percolation benchmark (https://github.com/igraph/usability-benchmarks/blob/master/tasks/tasks.md#clique-percolation).

I'm working with @dkhl65, @yanchenm, and @ChenyanWang on this issue.

…te Club graph

dkhl65 · 2021-12-15T21:36:43Z

This pull request is for igraph/python-igraph#469.

ntamas · 2021-12-15T22:06:14Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+            # If the two cliques share at least min_common_vertices,
+            # they are part of the same community
+            if c1 != c2 and len(cliques[c1].intersection(cliques[c2])) >= min_common_vertices:
+                cg_edges.append((c1, c2))


This double for loop can be simplified to a one-liner using itertools.combinations() and Python list comprehensions.
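A minimal sketch of that simplification, assuming `cliques` is a list of vertex sets addressed by index, as in the code under review. The example data is made up for illustration.

```python
import itertools

# Hypothetical example data: each clique is a set of vertex ids.
cliques = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {5, 6, 7}]
min_common_vertices = 2

# One edge per unordered pair of cliques that share enough vertices.
# combinations() never pairs an index with itself, so no c1 != c2 check is needed.
cg_edges = [
    (c1, c2)
    for c1, c2 in itertools.combinations(range(len(cliques)), 2)
    if len(cliques[c1] & cliques[c2]) >= min_common_vertices
]
print(cg_edges)  # [(0, 1), (1, 2)]
```

Each unordered pair appears exactly once, which is all an undirected clique graph needs.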

ntamas · 2021-12-15T22:07:13Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+        # Remove all connected vertices
+        for v in vertices:
+            to_merge.remove(v)
+        clique_communities.append(vertices)


I think that instead of this while loop, you can simply call clique_graph.clusters(), which gives you a VertexClustering object that has all that you need.

Basically you will need to iterate over the VertexClustering to get the connected components of the clique graph.
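To illustrate what iterating over the components gives you, here is a sketch that does not depend on igraph: a small plain-Python traversal stands in for `clique_graph.clusters()`. The helper name `connected_components` and the example data are hypothetical. Each component is a list of clique indices, and a community is the union of the vertices of those cliques.

```python
def connected_components(n, edges):
    """Group vertices 0..n-1 into connected components (stand-in for clusters())."""
    adjacency = {v: set() for v in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacency[v] - seen:
                seen.add(w)
                stack.append(w)
        components.append(sorted(component))
    return components

# Hypothetical cliques and the clique-graph edges between them.
cliques = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {5, 6, 7}]
cg_edges = [(0, 1), (1, 2)]

# A community is the union of all cliques in one component of the clique graph.
communities = [
    set().union(*(cliques[i] for i in component))
    for component in connected_components(len(cliques), cg_edges)
]
print(communities)  # [{0, 1, 2, 3, 4}, {5, 6, 7}]
```

With igraph, only the source of the components would change; the union step over each component stays the same.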

ntamas · 2021-12-15T22:08:51Z

Thanks a lot for your work! I have identified two places where the code can be simplified; it's good that you did it your way first because it highlights that some features already provided by igraph are not as accessible / discoverable as they should be.

ntamas

Dummy comment to make @vtraag's spurious review request go away :)

ntamas · 2021-12-16T10:09:23Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+        raise ValueError("Error: min_common_vertices=%d must be greater than k=%d" % (min_common_vertices, k))
+
+    # Each clique can be identified by its index in 'cliques'
+    cliques = list(map(set, k_cliques(g, k)))


cliques = [set(clique) for clique in k_cliques(g, k)] is probably more readable

ntamas · 2021-12-16T10:12:40Z

@vtraag @szhorvat What's the policy for these usability benchmarks in terms of efficiency? For instance, the current version enumerates all k-cliques, creates the clique graph, and merges them, sticking to the original definition of the clique percolation problem to the letter. However, one may easily speed it up by recognizing that one can search for cliques of size at least k, because larger cliques consist of highly overlapping k-cliques anyway, so they will eventually be merged in the merging step. This would also remove the need to define a separate k_cliques function for searching for cliques of exactly size k, because g.cliques(min_size=k) would do the job just fine.

vtraag · 2021-12-16T14:43:19Z

I think the usability benchmarks should focus on how things can be achieved most easily, not necessarily focusing on performance. Having said that, if there are things that can trivially be done in a more performant way, that should have priority. The task is simply to emulate the original clique percolation, so I think the implementation should just follow that idea.

vtraag · 2021-12-16T14:45:40Z

What I think should be improved in this case is to simplify the code as much as possible. For example, I would not include code as a __main__ function. Also the separate k_cliques function is not necessary I think.

vtraag · 2021-12-16T14:47:21Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+    cg_edges = [(c1, c2) for c1, c2 in itertools.combinations(range(num_cliques), 2) if len(cliques[c1].intersection(cliques[c2])) >= min_common_vertices]
+
+    # Add edges for clique graph
+    clique_graph.add_edges(cg_edges)


I think you would also be able to simply construct a graph directly here:

Suggested change

-    clique_graph.add_edges(cg_edges)
+    clique_graph = ig.Graph(cg_edges)

That way, you don't first need to construct a graph, and call add_vertices.

vtraag

Great, I believe we are almost there! We probably can't simplify it much further. There are a few minor remarks that I still have.

vtraag · 2021-12-16T16:04:41Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+
+EXAMPLE_K = 3
+
+def k_communities(g, k, min_common_vertices):


I would really call this clique_percolation to be consistent with the usual name.

vtraag · 2021-12-16T16:05:56Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+    Given a graph and size k, return a list of communities found through clique
+    percolation with at least min_common_vertices shared.
+    A community is the connected component of cliques where two cliques overlap
+    in at least k-1 vertices.


Suggested change

-    in at least k-1 vertices.
+    in at least min_common_vertices.

vtraag · 2021-12-16T16:06:39Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+    percolation with at least min_common_vertices shared.
+    A community is the connected component of cliques where two cliques overlap
+    in at least k-1 vertices.
+    Each community is a set of vertices in that community


I think lines 13-16 can be removed, because they don't add much in terms of explanation of the function.

vtraag · 2021-12-16T16:07:27Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+    connect vertices if the cliques share min_common_vertices
+    """
+    if (min_common_vertices > k):
+        raise ValueError("Error: min_common_vertices=%d must be greater than k=%d" % (min_common_vertices, k))


Suggested change

-        raise ValueError("Error: min_common_vertices=%d must be greater than k=%d" % (min_common_vertices, k))
+        raise ValueError(f"Error: min_common_vertices={min_common_vertices} must be greater than k={k}")
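One thing worth double-checking in either form of the message: the guard fires when `min_common_vertices > k`, so the text arguably should say that the value must not be greater than `k`. A sketch of a guard whose message matches its condition, using a hypothetical helper name `check_overlap` that is not part of the PR:

```python
def check_overlap(k, min_common_vertices):
    """Raise if the requested overlap exceeds the clique size (hypothetical helper)."""
    if min_common_vertices > k:
        raise ValueError(
            f"min_common_vertices={min_common_vertices} must not be greater than k={k}"
        )

check_overlap(3, 2)  # a valid combination passes silently
```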

vtraag · 2021-12-16T16:10:21Z

solutions/python-igraph/clique_percolation/clique_percolation.py

+    # Merge cliques accordingly into communities
+    cg_clusters = clique_graph.clusters()
+
+    # Generate the list of clique communities as a list of set of vertices


It might be better to define communities = [] only here, because it is used only here.

szhorvat · 2021-12-16T22:17:15Z

@vtraag @szhorvat What's the policy for these usability benchmarks in terms of efficiency?

Answering the general question:

It is sometimes worth having more than one solution for the same task, each with comments about pros/cons.

The goal is to gauge the usability of igraph and compare it to other libraries. In practical use, I might go with the easiest-to-write solution first. Such a solution is worth writing down. But soon I might find that this solution does not scale, and need to rewrite it completely to something much faster, but also more complex. This solution is also worth writing down. There would be a couple of comments by the author explaining why it's worth keeping both solutions and what the tradeoff is. The comment is all we need—this is not about performance benchmarking so two solutions make sense only when there is an obvious tradeoff.

Perhaps another library has simple solutions, but no fast ones. Or it has fast solutions, but no simple ones. Or maybe the simplest solution happens to be the fastest one too. In order to make a full comparison, we will need to have more than one solution to some tasks.

For example, with the recent clique visualization task, using VertexCover was a nice choice, and it's worth recording. But IMO it's not suitable for every use case because the cover, as drawn, does not make it clear which vertex is part of the clique and which isn't. Sometimes a single shaded region on the plot seems to have five vertices within, even though we are visualizing 4-cliques.

If the VertexCover solution is easy to use, and allows for compact code or for a quick-and-dirty visualization, it should be shown. But it's clearly not suitable for all use cases, so a more complicated solution that colours vertices that are part of the clique should also be shown.

I will certainly be showing multiple solutions with Mathematica because the performance / simplicity tradeoff is common. In fact, some of the functions I provide in IGraph/M make it possible to have both fast and simple code (simple was already possible with built-in Mathematica functions, but it traded off performance). This is what really illustrates one of the new possibilities that IGraph/M brings to plain Mathematica.

For instance, the current version enumerates all k-cliques, creates the clique graph, and merges them, sticking to the original definition of the clique percolation problem to the letter. However, one may easily speed it up by recognizing that one can search for cliques of size at least k, because larger cliques consist of highly overlapping k-cliques anyway, so they will eventually be merged in the merging step. This would also remove the need to define a separate k_cliques function for searching for cliques of exactly size k, because g.cliques(min_size=k) would do the job just fine.

I didn't look at the code in this PR, but if we are looking for k-cliques, why don't we set both the min and max size to k to only obtain the correct sizes?

ntamas · 2021-12-16T22:26:24Z

I didn't look at the code in this PR, but if we are looking for k-cliques, why don't we set both the min and max size to k to only obtain the correct sizes?

I wasn't being precise because I forgot to add that we should actually be looking for maximal cliques of at least size k. The point of the clique percolation problem is that you will eventually merge all those k-cliques into larger cliques in order to identify overlapping communities. So, if you have a 24-clique in your graph, and you are examining k-clique percolation with k=4, you will end up enumerating and listing all the 4-subcliques of that 24-clique only to merge them later. It is much more efficient to look for maximal cliques of size at least k because then you effectively do the merging "in advance". Sorry if this wasn't clear; I forgot to mention the "maximal" part in my comment.
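To put a number on that example: every 4-vertex subset of a 24-clique is itself a 4-clique, so the exact-k approach enumerates all of them and then has to compare them pairwise before merging. A quick check with the standard library (Python 3.8+ for `math.comb`):

```python
import math

# Number of 4-cliques contained in a single 24-clique.
k_subcliques = math.comb(24, 4)
print(k_subcliques)  # 10626

# Number of clique pairs a naive pairwise overlap test would have to consider.
pairs = math.comb(k_subcliques, 2)
print(pairs)  # 56450625
```

A search for maximal cliques of size at least 4 would report the same structure as one clique.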

szhorvat · 2021-12-16T22:37:32Z

Addendum to my long comment above: in case it wasn't obvious, I'm a Tim Toady guy, but I realize that Python is not that kind of language :-)

jamesjiang52 and others added 3 commits December 15, 2021 15:06

Add function to find k-cliques

f73dfbd

Add k-communities function and changed example to load Zachary's Kara…

e57f362

…te Club graph

Added visualization for communities

bfcec7a

ntamas requested changes Dec 15, 2021

vtraag requested a review from ntamas December 15, 2021 22:08

ntamas requested changes Dec 15, 2021

Cleanups to code

5b02396

jamesjiang52 requested a review from ntamas December 15, 2021 23:20

ntamas reviewed Dec 16, 2021

vtraag reviewed Dec 16, 2021

Clean up code readability

3b5f0dc

jamesjiang52 requested review from ntamas and vtraag December 16, 2021 15:50

vtraag reviewed Dec 16, 2021

Cleanup comments and function names

ba280c0

jamesjiang52 requested a review from vtraag December 16, 2021 16:32
