A collection of Python scripts that implement various graph clustering algorithms, specifically for identifying protein complexes from protein-protein interaction networks.

zhiweiyi11/python-graph-clustering

 
 

Graph Clustering in Python

This is a collection of Python scripts that implement graph clustering algorithms. The project is geared towards discovering protein complexes in protein-protein interaction (PPI) networks, although the code can be applied to any graph. Weighted and unweighted versions of the following algorithms are implemented:

  • Clique Percolation (k=3 and k=4)
  • MCODE
  • DPClus
  • IPCA
  • COACH
  • Graph Entropy Clustering
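To give a feel for what the first method in the list computes, here is a minimal, unweighted sketch of clique percolation for k=3. It is not the code shipped in this repository, and the function name `clique_percolation_k3` is made up for illustration. The idea: every triangle (3-clique) is a building block, two triangles are adjacent when they share an edge (k-1 = 2 nodes), and each community is the union of a connected group of adjacent triangles. Nodes may belong to more than one community.

```python
from itertools import combinations


def clique_percolation_k3(edges):
    """Illustrative k=3 clique percolation on an unweighted, undirected graph.

    `edges` is an iterable of (node, node) pairs. Returns a sorted list of
    communities, each a sorted list of nodes. Hypothetical helper, not the
    repository's implementation.
    """
    # Build adjacency sets, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every triangle exactly once (as a frozenset of 3 nodes).
    triangles = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(frozenset((u, v, w)))
    triangles = list(triangles)

    # Union-find over triangle indices.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge triangles that share an edge (two common nodes).
    edge_owner = {}
    for idx, tri in enumerate(triangles):
        for pair in combinations(sorted(tri), 2):
            if pair in edge_owner:
                parent[find(idx)] = find(edge_owner[pair])
            else:
                edge_owner[pair] = idx

    # Each union-find component becomes one community.
    communities = {}
    for idx, tri in enumerate(triangles):
        communities.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(c) for c in communities.values())


# Toy graph: triangles a-b-c and b-c-d share edge b-c; triangle d-e-f
# touches them only at node d; node g is in no triangle.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("f", "g")]
toy_communities = clique_percolation_k3(toy_edges)
```

On the toy graph, node d ends up in both communities and node g in none, which shows the overlapping nature of the method. Enumerating all triangles is also why this family of algorithms can be slow on dense graphs.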

The code is free for academic use. If you find this project useful, please consider citing:

True Price, Francisco I Peña III, Young-Rae Cho. "Survey: Enhancing protein complex prediction in PPI networks with GO similarity weighting." Interdisciplinary Sciences: Computational Life Sciences, 2013. [pdf]

Requirements

NumPy and SciPy are required. NetworkX is additionally required for the clique percolation methods.

Usage

All scripts have been tested with Python 2.7. To run one, use

python <script.py> <graph file>

where <graph file> lists one edge per line, each entry naming the two nodes that the edge connects:

<node 1> <node 2> [edge weight]

The edge weight entry is optional for the unweighted methods. Example datasets for PPI networks (weighted with similarity metrics on Gene Ontology entries) are provided in the "data" folder.
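As a rough illustration of the input format, the following sketch reads such an edge list into a nested dictionary. It is not taken from the scripts themselves: the function name `read_edge_list`, the whitespace-separated fields, the default weight of 1.0 for edges without a weight, and the treatment of edges as undirected are all assumptions made for this example, and the node names are invented.

```python
import io


def read_edge_list(lines, default_weight=1.0):
    """Parse '<node 1> <node 2> [edge weight]' lines into {node: {neighbor: weight}}.

    Hypothetical helper: assumes whitespace-separated fields and an
    undirected graph; edges with no weight get `default_weight`.
    """
    graph = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        u, v = fields[0], fields[1]
        weight = float(fields[2]) if len(fields) > 2 else default_weight
        # Store the edge in both directions.
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


# Invented three-edge example; the last edge omits its weight.
sample_file = io.StringIO(u"P1 P2 0.8\nP2 P3 0.4\nP1 P3\n")
sample_graph = read_edge_list(sample_file)
```

With a real file, the same function could be called as `read_edge_list(open(path))`, where `path` points to one of the files in the "data" folder.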

The output of the scripts is a collection of discovered graph clusters, one per line. Some methods also print progress to stderr.
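Because each cluster is printed on its own line, the output can be captured (for example by redirecting stdout to a file) and post-processed. The sketch below assumes that the node names on each line are separated by whitespace, which is not stated above and may differ between scripts; `parse_clusters` is a hypothetical helper and the sample text is invented.

```python
def parse_clusters(text):
    """Split captured script output into a list of clusters.

    Assumes one cluster per line with whitespace-separated node names
    (an assumption; check the actual output of the script you run).
    """
    return [line.split() for line in text.splitlines() if line.strip()]


# Invented output with two clusters and a blank line between them.
captured = "P1 P2 P3\n\nP4 P5\n"
clusters = parse_clusters(captured)
cluster_sizes = [len(c) for c in clusters]
```

Since progress messages go to stderr, redirecting only stdout keeps them out of the captured cluster list.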

Finally, these algorithms are provided with no warranty, and not all implementations are guaranteed to scale well to very large datasets. The clique percolation implementations in particular may take some time to run.
