This tool can process PDB format protein 3D structure files and generate dendrograms to show the structural relationships between proteins.

mrqfliu/PyStructProtCluster

PyStructProtCluster



Overview

 PyStructProtCluster is a Python tool for structure-based protein alignment and clustering. It processes protein three-dimensional structure files in PDB format and builds a dendrogram with the UPGMA method to display the structural relationships between proteins. The open-source US-align tool (USalign) generates pairwise similarity values for the proteins; the resulting similarity matrix is then converted into a .nwk (Newick) dendrogram file that contains branch names and branch lengths. The dendrogram can help in understanding protein function and in exploring evolutionary relationships.
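To make the matrix-to-tree step concrete, here is a minimal, standard-library-only sketch of UPGMA clustering that emits Newick text. It is not the project's generate_tree.py: the function name upgma_newick is hypothetical, and the conversion distance = 1 - similarity is an assumption that only makes sense for scores in the range 0 to 1.

```python
from itertools import combinations


def upgma_newick(names, similarity):
    """Build a UPGMA tree from a symmetric similarity matrix; return Newick text."""
    # Assumption: scores lie in [0, 1], so 1 - score can serve as a distance.
    n = len(names)
    # Each cluster maps id -> (Newick fragment, leaf count, height above the leaves).
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    dist = {
        frozenset((i, j)): 1.0 - similarity[i][j]
        for i, j in combinations(range(n), 2)
    }
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist[pair] / 2.0
        (sub_a, size_a, h_a), (sub_b, size_b, h_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({sub_a}:{height - h_a:.4f},{sub_b}:{height - h_b:.4f})"
        # Distance to every remaining cluster is the size-weighted average.
        for k in clusters:
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            dist[frozenset((next_id, k))] = (
                size_a * d_ak + size_b * d_bk
            ) / (size_a + size_b)
        del dist[pair]
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    newick = next(iter(clusters.values()))[0]
    return newick + ";"
```

Calling upgma_newick(["A", "B", "C"], matrix) with a 3x3 symmetric matrix returns a single Newick string that can be saved as a .nwk file.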

Key Features



  • PDB File Processing: Analyzes protein structures in PDB format.
  • Dendrogram Generation: Creates visual representations of structural relationships.
  • Parallel Computation: Efficiently computes pairwise alignments in parallel.
  • Customizable Parallelism: Adjust the number of parallel processes as needed.
  • Evolutionary and Functional Insights: Supports studies in protein evolution and function prediction. 
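The parallel pairwise computation can be pictured with the following sketch. It is an illustration only: the project drives parallelism from a shell script, whereas this version uses Python threads, and both score_all_pairs and the score_fn callback (a stand-in for one aligner invocation per pair) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def score_all_pairs(names, score_fn, workers=4):
    """Apply score_fn to every unordered pair of names using `workers` threads."""
    # Each unordered pair is scored once; the matrix is assumed to be symmetric.
    pairs = list(combinations(names, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so scores line up with pairs.
        scores = list(pool.map(lambda pair: score_fn(*pair), pairs))
    return dict(zip(pairs, scores))
```

Threads are a reasonable choice here if each task mostly waits on an external aligner process; changing workers adjusts the degree of parallelism.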

Prerequisites



  • Python (compatible versions)
  • USalign: An open-source tool for protein structure alignment (https://github.com/pylelab/USalign).

Directory Structure



  • pdbs/: Directory for storing the PDB files of proteins to be analyzed.
  • list_pdb{i}{n}.txt: Intermediate files generated automatically during the pairwise comparison process; they can be deleted before a new run.
  • test/af_output/: Directory containing the similarity matrices and related output/error files. 

Installation



  1. Clone the repository and ensure all dependencies are installed:
git clone https://github.com/mrqfliu/PyStructProtCluster.git
cd PyStructProtCluster
  2. Clone the US-align repository:
git clone https://github.com/pylelab/USalign
  3. Navigate to the USalign directory and compile:
cd USalign
make
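After compiling, it can be useful to confirm that the aligner binary is actually present before starting a long run. The check below is a small sketch; it assumes that make leaves an executable named USalign inside the cloned USalign directory, and find_usalign is a hypothetical helper, not part of this repository.

```python
import os
from pathlib import Path


def find_usalign(repo_dir="USalign", binary_name="USalign"):
    """Return the absolute path of the compiled binary, or None if it is missing."""
    # Assumption: the build places the executable directly in repo_dir.
    candidate = Path(repo_dir) / binary_name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate.resolve()
    return None
```

If find_usalign() returns None, the build most likely did not finish or the binary lives in a different directory.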

Usage



  • Prepare Your PDB Files: Place PDB files into the pdbs/ directory.
  • Configure the Environment Path in get_align.sh.
  • Run the complete workflow:
bash get_align.sh

This command runs the full process, from the PDB files to the generated tree.nwk file. Alternatively, you can run the steps individually for debugging or detailed analysis:

  1. Generate pairwise comparison sublists: PdbFileListGenerator.py
  2. Compute pairwise similarities in parallel: get_align2.sh (edit parallel_count to set the number of parallel processes)
  3. Stack alignment outputs: get_align.py
  4. Generate similarity matrix and save to CSV: similarity_matrix.py
  5. Output dendrogram file: generate_tree.py 
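The matrix-building step (step 4 above) could look roughly like the sketch below. The real format of the combined alignment output is not documented here, so the input is assumed to be (name_a, name_b, score) tuples; build_matrix and write_matrix_csv are hypothetical names, and the diagonal value of 1.0 assumes a score where a structure compared with itself is a perfect match.

```python
import csv


def build_matrix(records):
    """Turn (name_a, name_b, score) tuples into a symmetric similarity matrix."""
    names = sorted({name for a, b, _ in records for name in (a, b)})
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    # Diagonal is 1.0 (self-similarity); pairs with no record stay at 0.0.
    matrix = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for a, b, score in records:
        matrix[index[a]][index[b]] = score
        matrix[index[b]][index[a]] = score
    return names, matrix


def write_matrix_csv(names, matrix, handle):
    """Write the matrix to an open text handle, with names as header row and column."""
    writer = csv.writer(handle)
    writer.writerow([""] + names)
    for name, row in zip(names, matrix):
        writer.writerow([name] + row)
```

When writing to disk, open the file with newline="" (for example open("similarity.csv", "w", newline="")) so the csv module controls line endings; the file name is only an example.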

Scripts



  • PdbFileListGenerator.py: Generates sublists for pairwise structure comparisons.
  • get_align2.sh: Executes pairwise alignment in parallel.
  • get_align.py: Combines alignment results into combined_similarity.txt.
  • similarity_matrix.py: Creates a similarity matrix from alignment results.
  • generate_tree.py: Produces a tree.nwk file for dendrogram visualization.

Visualization

 Visualize the tree.nwk file with the iTOL web tool (https://itol.embl.de) to get an interactive dendrogram.

License

PyStructProtCluster is open-source software released under the MIT License. The full text of the license can be found in the LICENSE file in the repository.

Disclaimer

This software is provided "as is", without any warranty of any kind, express or implied. No guarantees are made regarding its fitness for any particular purpose or non-infringement of any intellectual property. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability arising from, out of, or in connection with the software or its use. 

Contributing

Contributions to the PyStructProtCluster project are welcome! I am open to feedback, bug reports, and enhancements that can help improve the tool for the community. Here's how you can contribute:

  • Bug Reports: If you find any issues or bugs, please submit an issue with detailed information.
  • Feature Requests: If you have ideas for new features, feel free to open a new issue to discuss your ideas.
  • Pull Requests: If you have developed a fix or a new feature, submit a pull request. Make sure to follow the project's coding standards and include tests when applicable.

Support

For support, please contact the maintainer at [email protected].
