This is a Python program consisting of a single .py file. To run all of its functionality, at least one .pdb file for a protein is needed; these files are available from the Protein Data Bank (https://www.rcsb.org/). One functionality also needs a protein database file in text format, which can be downloaded from the National Center for Biotechnology Information (NCBI) website at https://ftp.ncbi.nlm.nih.gov/blast/db/
The files needed to run the program should be placed in the same directory the Python program is run from. This includes the .pdb file and the database for the local BLASTP search.
When it starts, the program asks the user for the .pdb file name and the sequence id. The sequence id corresponds to the protein name, which is usually the file name without the .pdb extension unless the file has been renamed. Both are needed for the program to run properly.
The program offers four distinct functionalities, plus the ability to change the protein file being analysed.
The first functionality produces a text file with important information about the protein being analysed. The second is a simple atom-to-atom distance calculator. The third writes a PyMOL script to help with protein visualisation. The fourth runs a local BLASTP search. The user can also exit the program by typing 0. After the user chooses which file to analyse and enters the protein name or id, the program prints on the command line whether any chains are discontinuous and where. The user is then asked which task they would like to complete.
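How the program detects discontinuous chains is not described here. One plausible approach, shown below as a minimal sketch, is to look for jumps in residue numbering within each chain. The function name find_chain_breaks and the example residue numbers are hypothetical and are not taken from the program.

```python
def find_chain_breaks(residue_numbers):
    """Return (previous, next) pairs where residue numbering jumps by more than 1."""
    breaks = []
    for prev, nxt in zip(residue_numbers, residue_numbers[1:]):
        if nxt - prev > 1:
            breaks.append((prev, nxt))
    return breaks


# Hypothetical residue numbers per chain, as they might be read from a .pdb file.
chains = {"A": [1, 2, 3, 7, 8], "B": [10, 11, 12]}

for chain_id, numbers in chains.items():
    gaps = find_chain_breaks(numbers)
    if not gaps:
        print(f"Chain {chain_id} is continuous")
    for prev, nxt in gaps:
        print(f"Chain {chain_id} is discontinuous between residues {prev} and {nxt}")
```

A gap in numbering is only a hint: some .pdb files use non-sequential numbering on purpose, so a distance check between consecutive backbone atoms would be a stricter test.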
The first task is intended for users who struggle to read the .pdb file to obtain important information. It can help people who are new to .pdb files and protein analysis, and it could also be useful for people with dyslexia or anyone who finds .pdb files hard to read, since the information in them is usually densely packed. The output text file should be easier to read because the information is more widely spaced than in the .pdb file. Not everything in the .pdb file is reported in the task 1 output. The text file contains the protein sequence in one-letter code and the total number of amino acids in the sequence. It then lists, for each amino acid, the atoms present together with their atom numbers. The atoms are reported in order: the first carbon is the alpha carbon, followed by the beta carbon, and so on. This format carries less information than the .pdb file but is easier to read and understand. The atom numbers are needed for the second task, so completing the first task is recommended if the user does not already have them.
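The kind of report described above can be sketched as follows. The program itself uses Biopython for this task; the sketch deliberately swaps in plain fixed-column parsing of ATOM records so that it runs with the standard library alone. The sample records, the function names and the exact layout of the report are assumptions made for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Hypothetical excerpt of a .pdb file (fixed-column ATOM records).
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
ATOM      3  CB  MET A   1      25.112  24.880   3.649
ATOM      4  N   GLY A   2      26.850  26.700   3.100
ATOM      5  CA  GLY A   2      26.100  27.900   3.400
"""


def group_by_residue(pdb_text):
    """Collect (atom name, atom number) pairs for each residue, in file order."""
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        serial = int(line[6:11])          # atom number
        atom_name = line[12:16].strip()   # e.g. N, CA, CB
        res_name = line[17:20].strip()    # three-letter residue name
        chain = line[21]
        res_number = int(line[22:26])
        key = (chain, res_number)
        if not residues or residues[-1]["key"] != key:
            residues.append({"key": key, "chain": chain, "number": res_number,
                             "name": res_name, "atoms": []})
        residues[-1]["atoms"].append((atom_name, serial))
    return residues


def build_report(residues):
    """Build a widely spaced, human-readable report as a single string."""
    # Unknown residue names are written as X.
    sequence = "".join(THREE_TO_ONE.get(r["name"], "X") for r in residues)
    lines = [f"Sequence: {sequence}", f"Total amino acids: {len(residues)}", ""]
    for r in residues:
        lines.append(f"{r['name']} {r['number']} (chain {r['chain']})")
        for atom_name, serial in r["atoms"]:
            lines.append(f"    {atom_name:<4} atom number {serial}")
        lines.append("")
    return "\n".join(lines)


print(build_report(group_by_residue(SAMPLE_PDB)))
```

The atoms appear in the order they occur in the file, which for standard .pdb files puts the alpha carbon (CA) before the beta carbon (CB).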
To complete the second task, the user needs to know the atom numbers of the two atoms whose distance is to be calculated. Like task one, this task uses Biopython modules. The program checks that the user input is a positive number made up of digits. The user can also choose the number of significant digits displayed.
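The three steps of this task (validating the input, computing the distance, and formatting the result) can be sketched as below. This is an illustration using only the standard library, not the program's Biopython-based code; the function names and the coordinates are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math


def parse_positive_int(text):
    """Return the input as an int if it is a positive whole number, else None."""
    text = text.strip()
    if text.isascii() and text.isdigit() and int(text) > 0:
        return int(text)
    return None


def atom_distance(coord_a, coord_b):
    """Euclidean distance between two (x, y, z) coordinates, in the same units."""
    return math.dist(coord_a, coord_b)


def format_significant(value, digits):
    """Format a number with the requested count of significant digits.

    Note that the 'g' format drops trailing zeros.
    """
    return f"{value:.{digits}g}"


# Hypothetical coordinates keyed by atom number.
coordinates = {1: (27.340, 24.430, 2.614), 2: (26.266, 25.413, 2.842)}

first = parse_positive_int("1")
second = parse_positive_int("2")
if first in coordinates and second in coordinates:
    d = atom_distance(coordinates[first], coordinates[second])
    print("Distance:", format_significant(d, 4))
```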
The third option produces a text file that can be used as a PyMOL script. PyMOL scripts are run from within PyMOL with the command @filepath/filename. The program also uses a command shell module to print the path of the file it creates, which makes the script easier to use. Strictly speaking, the name of the protein being analysed is irrelevant for the third task, as it simply produces a generic PyMOL text script.
The user can choose the colour of the secondary structures and one of three ways to visualise the molecule: as a putty cartoon, as balls and sticks, or as a normal cartoon.
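A script of this kind could be generated roughly as follows. The exact commands the program writes are not given here, so the PyMOL commands, setting values, function names and style keys below are assumptions chosen for illustration.

```python
import os

# Assumed PyMOL commands for each of the three visualisation styles.
STYLE_COMMANDS = {
    "cartoon": ["show cartoon"],
    "putty": ["show cartoon", "cartoon putty"],
    "ball_and_stick": [
        "show sticks",
        "show spheres",
        "set stick_radius, 0.15",
        "set sphere_scale, 0.25",
    ],
}


def build_pymol_script(helix_colour, sheet_colour, loop_colour, style):
    """Return the text of a PyMOL script for the chosen colours and style."""
    if style not in STYLE_COMMANDS:
        raise ValueError(f"unknown style: {style}")
    lines = ["hide everything"]
    lines += STYLE_COMMANDS[style]
    lines += [
        f"color {helix_colour}, ss h",     # helices
        f"color {sheet_colour}, ss s",     # sheets
        f"color {loop_colour}, ss l+''",   # loops and unassigned residues
    ]
    return "\n".join(lines) + "\n"


def write_pymol_script(script_text, filename):
    """Write the script to disk and return its absolute path."""
    with open(filename, "w") as handle:
        handle.write(script_text)
    return os.path.abspath(filename)


print(build_pymol_script("red", "yellow", "green", "putty"))
```

Calling write_pymol_script(script_text, "view.txt") would return the absolute path, which can then be used in PyMOL after the @ sign. The file name view.txt is only an example.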
The last functionality is a BLASTP search against a local database. An example database is provided, but more pertinent sequences can be downloaded from the NCBI website. These databases are usually downloaded as compressed files, and the file must be decompressed (for example with the gzip command) before running this part of the program. This task also requires the BLAST software to be installed. BLAST can be obtained in several ways, one of which is through the NCBI website. The program does not download BLAST itself, so a local BLAST installation must already be in place for every part of the program to run.
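The preparation steps above can be sketched in Python as follows. How the program actually invokes BLAST is not stated, so this is only one possible arrangement: it assumes the blastp executable from BLAST+ is on the PATH and that the database has already been formatted for BLAST. The file names in the example and the function names are hypothetical.

```python
import gzip
import shutil
import subprocess


def decompress_gzip(source_path, destination_path):
    """Decompress a .gz file, equivalent to running gzip -d on a copy."""
    with gzip.open(source_path, "rb") as compressed, \
            open(destination_path, "wb") as output:
        shutil.copyfileobj(compressed, output)


def build_blastp_command(query_fasta, database, output_file, evalue=0.001):
    """Build the argument list for a local blastp run (tabular output)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-out", output_file,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def run_blastp(command):
    """Run blastp, failing early with a clear message if it is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("blastp was not found; install BLAST first")
    subprocess.run(command, check=True)


# Hypothetical file names; nothing is executed here, the command is only shown.
command = build_blastp_command("query.fasta", "example_db", "results.txt")
print(" ".join(command))
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems when file names contain spaces.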