PyStructProtCluster is a Python tool for structure-based protein alignment and clustering. It processes three-dimensional protein structure files in PDB format and generates a dendrogram with the UPGMA method to display the structural relationships between proteins. The open-source US-align tool computes pairwise similarity values for the proteins, and the resulting similarity matrix is used to generate a .nwk dendrogram file that includes branch names and the distances between branches. The dendrogram can help in understanding protein function and revealing evolutionary relationships.
- PDB File Processing: Analyzes protein structures in PDB format.
- Dendrogram Generation: Creates visual representations of structural relationships.
- Parallel Computation: Efficiently computes pairwise alignments in parallel.
- Customizable Parallelism: Adjust the number of parallel processes as needed.
- Evolutionary and Functional Insights: Supports studies in protein evolution and function prediction.
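To make the UPGMA step concrete, here is a minimal pure-Python sketch that turns a small similarity matrix into a Newick string. It is not the repository's generate_tree.py; the function name upgma_newick is hypothetical, and converting similarity to distance as 1 - similarity is an assumption made for illustration.

```python
def upgma_newick(names, sim):
    """Build a Newick string by UPGMA from a symmetric similarity matrix.

    Assumption: each similarity s lies in [0, 1] and is turned into a
    distance as 1 - s. The real tool may use a different conversion.
    """
    if not names:
        raise ValueError("at least one structure name is required")
    n = len(names)
    # Active clusters: id -> (newick_label, number_of_leaves, node_height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller_id, larger_id)
    dist = {(i, j): 1.0 - sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        label_a, size_a, h_a = clusters.pop(a)
        label_b, size_b, h_b = clusters.pop(b)
        height = d / 2.0  # UPGMA places the new node halfway between the pair
        label = f"({label_a}:{height - h_a:.4f},{label_b}:{height - h_b:.4f})"
        # Size-weighted average distance from the merged cluster to the rest
        merged = {}
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop distances involving the two clusters that were merged
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, value in merged.items():
            dist[(c, next_id)] = value  # next_id is always the larger id
        clusters[next_id] = (label, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    demo_names = ["A", "B", "C"]
    demo_sim = [[1.0, 0.9, 0.3], [0.9, 1.0, 0.4], [0.3, 0.4, 1.0]]
    print(upgma_newick(demo_names, demo_sim))
```

Branch lengths are the differences between node heights, so every leaf ends up at the same distance from the root, which is the defining property of a UPGMA tree.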
- Python (compatible versions)
- USalign: An open-source tool for protein structure alignment. GitHub repository: https://github.com/pylelab/USalign
- pdbs/: Directory for storing the PDB files of the proteins to be analyzed.
- list_pdb{i}{n}.txt: Intermediate files generated during the pairwise comparison process. They are auto-generated and can be deleted before use.
- test/af_output/: Directory containing the similarity matrices and related output/error files.
- Clone the repository and ensure all dependencies are installed:
git clone https://github.com/mrqfliu/PyStructProtCluster.git
cd PyStructProtCluster
- Clone the US-align Repository:
git clone https://github.com/pylelab/USalign
- Navigate to the Repository Directory and Compile:
cd USalign
make
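Once the binary is built, a single pairwise call can be wrapped from Python. The sketch below is illustrative and not part of this repository: the function names are hypothetical, and it assumes US-align's tabular output mode (-outfmt 2) with tab-separated columns beginning with the two structure names, the two TM-scores, and the RMSD. Check USalign -h for your build before relying on this layout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_usalign_tabular(text):
    """Parse tab-separated US-align output (assumed -outfmt 2 layout)."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        fields = line.split("\t")
        rows.append({
            "name1": fields[0],
            "name2": fields[1],
            "tm1": float(fields[2]),   # TM-score normalized by structure 1
            "tm2": float(fields[3]),   # TM-score normalized by structure 2
            "rmsd": float(fields[4]),
        })
    return rows


def run_usalign(binary, pdb1, pdb2):
    """Align one pair; `binary` is the path to the compiled USalign executable."""
    result = subprocess.run(
        [binary, pdb1, pdb2, "-outfmt", "2"],
        capture_output=True, text=True, check=True,
    )
    return parse_usalign_tabular(result.stdout)


def align_pairs(binary, pairs, workers=4):
    """Run several alignments concurrently; threads suffice because the
    heavy work happens in the external USalign processes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda pair: run_usalign(binary, *pair), pairs)
        return [row for rows in results for row in rows]
```

For example, align_pairs("./USalign/USalign", [("pdbs/a.pdb", "pdbs/b.pdb")]) would return one dictionary per aligned pair; the paths here are placeholders.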
- Prepare Your PDB Files: Place PDB files into the pdbs/ directory.
- Configure the Environment Path in get_align.sh.
- Run the complete workflow:
bash get_align.sh
This command executes the full process, from the PDB files to the generation of the tree.nwk file.
Alternatively, you can run the steps individually for debugging or detailed analysis:
- Generate pairwise comparison sublists: PdbFileListGenerator.py
- Compute the similarity matrix in parallel: get_align2.sh (edit parallel_count to set the number of parallel processes)
- Stack alignment outputs: get_align.py
- Generate the similarity matrix and save it to CSV: similarity_matrix.py
- Output the dendrogram file: generate_tree.py
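The first two steps split the all-against-all comparison into sublists that can be processed in parallel. A rough sketch of that idea follows; it is not the logic of PdbFileListGenerator.py, and the function name and the round-robin split are assumptions chosen for illustration.

```python
from itertools import combinations


def make_pair_sublists(pdb_names, n_chunks):
    """Enumerate every unordered pair once and spread the pairs over chunks.

    Hypothetical helper: the repository's own sublist layout may differ.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Each pair appears once, since alignment of (a, b) also covers (b, a)
    pairs = list(combinations(sorted(pdb_names), 2))
    # Round-robin split keeps the chunks within one pair of each other in size
    chunks = [pairs[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    for index, chunk in enumerate(make_pair_sublists(["a.pdb", "b.pdb", "c.pdb", "d.pdb"], 2)):
        print(index, chunk)
```

With N structures there are N(N-1)/2 pairs, so the number of chunks corresponds naturally to the parallel process count set in get_align2.sh.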
- PdbFileListGenerator.py: Generates sublists for pairwise structure comparisons.
- get_align2.sh: Executes pairwise alignment in parallel.
- get_align.py: Combines alignment results into combined_similarity.txt.
- similarity_matrix.py: Creates a similarity matrix from alignment results.
- generate_tree.py: Produces a tree.nwk file for dendrogram visualization.
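As an illustration of the matrix-building step, the sketch below assembles a symmetric similarity matrix from pairwise scores and renders it as CSV. It does not reproduce similarity_matrix.py: the function names are hypothetical, and the input format (one whitespace-separated "name1 name2 score" record per line) is an assumption, since the actual layout of combined_similarity.txt is not described here.

```python
import csv
import io


def build_similarity_matrix(lines):
    """Return (names, matrix) from 'name1 name2 score' records (assumed format)."""
    scores = {}
    names = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        first, second, score = line.split()[:3]
        names.update((first, second))
        # Store both orientations so the matrix is symmetric
        scores[(first, second)] = scores[(second, first)] = float(score)
    ordered = sorted(names)
    # Diagonal is 1.0 (self-similarity); missing pairs default to 0.0 here,
    # which is an illustrative choice rather than the tool's behavior
    matrix = [
        [1.0 if row == col else scores.get((row, col), 0.0) for col in ordered]
        for row in ordered
    ]
    return ordered, matrix


def matrix_to_csv(names, matrix):
    """Render the matrix as CSV text with a header row and a name column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + names)
    for name, row in zip(names, matrix):
        writer.writerow([name] + row)
    return buffer.getvalue()


if __name__ == "__main__":
    demo = ["A B 0.9", "A C 0.3", "B C 0.4"]
    print(matrix_to_csv(*build_similarity_matrix(demo)))
```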
Visualize the tree.nwk file using the iTOL web tool (https://itol.embl.de) for an interactive dendrogram.
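Before uploading, it can be useful to confirm that the tree contains the structures you expect. The following sketch lists the leaf names in a Newick string; the helper name is hypothetical, and it assumes simple unquoted leaf names without spaces or Newick comments.

```python
import re


def newick_leaf_names(newick_text):
    """Return leaf names in order of appearance (unquoted names assumed)."""
    # A leaf name directly follows "(" or ","; labels after ")" are internal
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick_text)


if __name__ == "__main__":
    print(newick_leaf_names("(C:0.3250,(A:0.0500,B:0.0500):0.2750);"))
```

Comparing the returned list against the file names in pdbs/ is a quick way to spot a structure that dropped out during alignment.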
PyStructProtCluster is open-source software released under the MIT License. The full text of the license can be found in the LICENSE file in the repository.
This software is provided "as is", without any warranty of any kind, express or implied. No guarantees are made regarding its fitness for any particular purpose or non-infringement of any intellectual property. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability arising from, out of, or in connection with the software or its use.
Contributions to the PyStructProtCluster project are welcome! I am open to feedback, bug reports, and enhancements that can help improve the tool for the community. Here's how you can contribute:
- Bug Reports: If you find any issues or bugs, please submit an issue with detailed information.
- Feature Requests: If you have ideas for new features, feel free to open a new issue to discuss your ideas.
- Pull Requests: If you have developed a fix or a new feature, submit a pull request. Make sure to follow the project's coding standards and include tests when applicable.
For support, please contact the maintainer at [email protected].