This repository contains all the documents related to my master's thesis.
Objective and Study Plan
- Can PIP adapt to long indels?
- Comparative study of ProPIP, PRANK and MAFFT
- Datasets: long-indel data (simulated with INDELible, and real data) and single-residue indel data (simulated under PIP)
- Study of the supplementary features of ProPIP
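To make the contrast between the two data regimes concrete, the sketch below samples indel lengths for both: PIP-style events, which always involve a single character, and a truncated power-law length distribution of the kind INDELible can use to simulate long indels. The function names and the parameters `a = 1.7` and `max_len = 50` are illustrative assumptions, not the settings used for the thesis datasets.

```python
import random
from typing import List


def pip_indel_lengths(n: int) -> List[int]:
    """Under PIP every insertion or deletion event affects exactly one character."""
    return [1] * n


def power_law_indel_lengths(n: int, a: float = 1.7, max_len: int = 50,
                            seed: int = 0) -> List[int]:
    """Hypothetical long-indel regime: P(L) proportional to L**(-a), L = 1..max_len."""
    rng = random.Random(seed)
    support = list(range(1, max_len + 1))
    weights = [length ** (-a) for length in support]
    return rng.choices(support, weights=weights, k=n)
```

Most sampled power-law lengths are still 1, but the heavy tail produces occasional long events that a single-residue model like PIP can only represent as many separate indels.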
ProPIP
- Built on an explicit model of the indel process: PIP (Poisson Indel Process)
- Basic PIP equations:
Asymptotic expected sequence length: E = λ/μ
Indel intensity: I = λ · μ
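The two quantities can be checked numerically. The sketch below treats the length of one sequence evolving along a long branch as a birth-death process (single-character insertions at total rate λ, each existing character deleted at rate μ) and simulates it with the Gillespie algorithm, so the average simulated length should approach E = λ/μ. The indel intensity is computed exactly as defined above. The function names and the values λ = 10, μ = 0.1 are hypothetical.

```python
import random


def expected_length(lam: float, mu: float) -> float:
    """Asymptotic expected sequence length, E = lambda / mu."""
    return lam / mu


def indel_intensity(lam: float, mu: float) -> float:
    """Indel intensity as defined in the text, I = lambda * mu."""
    return lam * mu


def simulate_length(lam: float, mu: float, t_end: float, n0: int = 0, seed: int = 0) -> int:
    """Gillespie simulation of one sequence's length along a branch of length t_end.

    Single-character insertions happen at total rate lam; each of the n
    existing characters is deleted independently at rate mu.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    while True:
        total = lam + mu * n          # total event rate in the current state
        t += rng.expovariate(total)   # waiting time until the next event
        if t > t_end:
            return n
        if rng.random() < lam / total:
            n += 1                    # insertion
        else:
            n -= 1                    # deletion of one existing character
```

For example, `sum(simulate_length(10, 0.1, 100.0, seed=s) for s in range(100)) / 100` should land close to `expected_length(10, 0.1)`, i.e. about 100.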
Progressive dynamic programming approach: the marginal likelihood is calculated at each internal node of the guide tree, and the alignment is selected by maximum likelihood.
ProPIP requirements: input sequences, a guide tree, the substitution rate matrix Q, and the insertion and deletion rates λ and μ.
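The progressive scheme can be sketched as a post-order traversal of the guide tree that merges the two child sub-alignments at every internal node with a dynamic-programming pass. ProPIP scores each merge with the PIP marginal likelihood; purely for illustration, this sketch swaps in a plain sum-of-pairs column score (match +1, mismatch -1, residue against gap -1), so it does not reproduce ProPIP's alignments. The nested-tuple tree format and all names are assumptions.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # leaf name or (left, right) pair
MATCH, MISMATCH, GAP = 1, -1, -1          # stand-in scores, NOT the PIP likelihood


def column_score(col_a: str, col_b: str) -> int:
    """Sum-of-pairs score of placing column col_a next to column col_b."""
    score = 0
    for x in col_a:
        for y in col_b:
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                score += GAP
            else:
                score += MATCH if x == y else MISMATCH
    return score


def align_profiles(a: List[str], b: List[str]) -> List[str]:
    """Global DP over the columns of two sub-alignments, then one optimal traceback."""
    cols_a = ["".join(row[i] for row in a) for i in range(len(a[0]))]
    cols_b = ["".join(row[j] for row in b) for j in range(len(b[0]))]
    gap_a, gap_b = "-" * len(a), "-" * len(b)
    n, m = len(cols_a), len(cols_b)

    def diag(i: int, j: int) -> int:    # column of a aligned with column of b
        return dp[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1])

    def up(i: int, j: int) -> int:      # column of a against gaps in b
        return dp[i - 1][j] + column_score(cols_a[i - 1], gap_b)

    def left(i: int, j: int) -> int:    # gaps in a against column of b
        return dp[i][j - 1] + column_score(gap_a, cols_b[j - 1])

    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = up(i, 0)
    for j in range(1, m + 1):
        dp[0][j] = left(0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(diag(i, j), up(i, j), left(i, j))

    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == diag(i, j):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == up(i, j):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    # Transpose the merged columns back into rows.
    return ["".join(col[k] for col in merged) for k in range(len(a) + len(b))]


def progressive_align(tree: Tree, seqs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Post-order traversal: merge the two child sub-alignments at each internal node."""
    if isinstance(tree, str):
        return [tree], [seqs[tree]]
    left_names, left_msa = progressive_align(tree[0], seqs)
    right_names, right_msa = progressive_align(tree[1], seqs)
    return left_names + right_names, align_profiles(left_msa, right_msa)
```

With the guide tree `(("A", "B"), "C")` and sequences `ACGT`, `ACGT`, `AGT`, the sketch returns the rows `["ACGT", "ACGT", "A-GT"]`.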
SBDP and STFT
Stochastic Backtracking DP algorithm (SBDP)
i. Keeps an ensemble of sub-optimal solutions at each internal node
ii. Randomly selects among the sub-optimal MSAs using the stochastic backtracking (SB) algorithm
iii. Distortion parameter T
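One way to picture stochastic backtracking is on a single pairwise DP matrix: instead of always following the best predecessor during traceback, each candidate move is drawn with probability proportional to exp(value / T), and repeating the traceback yields an ensemble of optimal and sub-optimal alignments. The Boltzmann-style weighting, the pairwise setting and the scoring scheme are assumptions made for this sketch; ProPIP's exact sampling rule and its use of T may differ. A small T concentrates the samples on the optimum, while a larger T spreads them out.

```python
import math
import random
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # illustrative scores


def sub(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def fill(a: str, b: str) -> List[List[int]]:
    """Global-alignment DP matrix (Needleman-Wunsch with linear gap cost)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    return dp


def stochastic_traceback(a: str, b: str, dp: List[List[int]], T: float,
                         rng: random.Random) -> Tuple[str, str]:
    """Trace back from (n, m), drawing each move with weight exp(value / T), T > 0."""
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        moves = []  # (best prefix score through this move, di, dj)
        if i > 0 and j > 0:
            moves.append((dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]), 1, 1))
        if i > 0:
            moves.append((dp[i - 1][j] + GAP, 1, 0))
        if j > 0:
            moves.append((dp[i][j - 1] + GAP, 0, 1))
        best = max(v for v, _, _ in moves)
        # Subtracting the best value keeps exp() from overflowing.
        weights = [math.exp((v - best) / T) for v, _, _ in moves]
        _, di, dj = rng.choices(moves, weights=weights, k=1)[0]
        out_a.append(a[i - 1] if di else "-")
        out_b.append(b[j - 1] if dj else "-")
        i, j = i - di, j - dj
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def score(x: str, y: str) -> int:
    """Score of a finished pairwise alignment under the same scheme."""
    return sum(GAP if "-" in (p, q) else sub(p, q) for p, q in zip(x, y))


def sample_ensemble(a: str, b: str, T: float = 0.5, n_samples: int = 20,
                    seed: int = 0) -> List[Tuple[str, str]]:
    """Distinct sampled alignments, best-scoring first."""
    rng = random.Random(seed)
    dp = fill(a, b)
    ensemble = {stochastic_traceback(a, b, dp, T, rng) for _ in range(n_samples)}
    return sorted(ensemble, key=lambda pair: (-score(*pair), pair))
```

For instance, `sample_ensemble("ACGT", "AGT", T=1e-3)` returns only the optimal alignment `("ACGT", "A-GT")`; raising T admits sub-optimal alignments into the ensemble.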
Short-Time Fourier Transform (STFT)
i. Detection of homologous regions using STFT
ii. Used to reduce computational complexity
iii. Choice of window function and window size
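A minimal STFT sketch for nucleotide sequences: each sequence is turned into four 0/1 indicator signals (the Voss encoding), a window of `size` positions slides along it with step `hop`, and the power spectra of the windowed signals are summed per window. Window pairs from two sequences whose spectra lie within a threshold are reported as candidate homologous regions, which a DP aligner could then focus on. The encoding, the plain DFT, the Euclidean distance and the threshold are all assumptions for illustration, not ProPIP's implementation; because power spectra discard some positional information, the candidates are only a pre-filter.

```python
import cmath
import math
from typing import Dict, List, Tuple


def window(name: str, size: int) -> List[float]:
    """Window function of the given length (size >= 2)."""
    if name == "rect":
        return [1.0] * size
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * k / (size - 1)) for k in range(size)]
    raise ValueError(f"unknown window: {name}")


def voss(seq: str) -> Dict[str, List[float]]:
    """Voss encoding: one 0/1 indicator signal per nucleotide."""
    return {nt: [1.0 if c == nt else 0.0 for c in seq.upper()] for nt in "ACGT"}


def power_spectrum(x: List[float]) -> List[float]:
    """|DFT|^2 of a short real signal (plain O(n^2) DFT, fine for small windows)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
            for f in range(n)]


def stft(seq: str, size: int = 8, hop: int = 4, win: str = "hann") -> List[List[float]]:
    """One summed power spectrum (over A, C, G, T) per window position."""
    w = window(win, size)
    signals = voss(seq)
    frames = []
    for start in range(0, len(seq) - size + 1, hop):
        total = [0.0] * size
        for sig in signals.values():
            seg = [sig[start + k] * w[k] for k in range(size)]
            total = [s + p for s, p in zip(total, power_spectrum(seg))]
        frames.append(total)
    return frames


def candidate_homologous_windows(s1: str, s2: str, size: int = 8, hop: int = 4,
                                 win: str = "hann",
                                 threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Start positions (in s1, s2) of window pairs whose spectra are within threshold."""
    f1, f2 = stft(s1, size, hop, win), stft(s2, size, hop, win)
    return [(i * hop, j * hop)
            for i, a in enumerate(f1) for j, b in enumerate(f2)
            if math.dist(a, b) < threshold]
```

The window function and `size` trade off resolution: a larger window gives finer frequency detail but coarser localisation of the candidate regions, and the Hann window reduces spectral leakage compared with the rectangular one.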
Datasets
MSA Evaluation
Results and Discussion
Conclusions and Future Work
THE CONCLUSION
- MAFFT and PRANK outperformed ProPIP on INDELible and real data (long-indel data).
- ProPIP fits the PIP-simulated data best and outperforms the traditional aligners MAFFT and PRANK on it.
- ProPIP's performance on long-indel data can be improved using its additional features.
- Parameter α alone is not suitable for INDELible data; however, combined with k it can improve alignment quality. With k = 0.05 and α = 0.05 we observed an improvement. For relatively lower α, nIndels increased.
- For relatively lower k, nIndels decreased and the Q score increased; the same pattern held for all data types.
- For PIP data the same pattern holds, but a good fit was obtained at k = 2.
- SBDP: for relatively lower T, nIndels increased and the maximum indel length decreased.
- STFT
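The metrics used above can be computed directly from the alignment rows. In this sketch, nIndels is taken to be the number of maximal gap runs summed over all rows and the maximum indel length the longest such run; these definitions are assumptions, since the exact counting rule is not stated here. The Q score follows its usual definition: the fraction of residue pairs aligned in the reference MSA that are also aligned in the test MSA, with rows in the same order in both.

```python
import re
from itertools import combinations
from typing import List, Set, Tuple


def gap_runs(msa: List[str]) -> List[int]:
    """Lengths of every maximal run of '-' in every row."""
    return [len(m.group()) for row in msa for m in re.finditer(r"-+", row)]


def n_indels(msa: List[str]) -> int:
    return len(gap_runs(msa))


def max_indel_length(msa: List[str]) -> int:
    runs = gap_runs(msa)
    return max(runs) if runs else 0


def aligned_pairs(msa: List[str]) -> Set[Tuple[int, int, int, int]]:
    """(row_a, residue_a, row_b, residue_b) for every pair of residues sharing a column."""
    counters = [0] * len(msa)  # next residue index (ungapped position) per row
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for r, row in enumerate(msa):
            if row[col] != "-":
                present.append((r, counters[r]))
                counters[r] += 1
        for (ra, ia), (rb, ib) in combinations(present, 2):
            pairs.add((ra, ia, rb, ib))
    return pairs


def q_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 1.0
```

For example, `n_indels(["AC--GT", "A-CG-T"])` is 3 and `max_indel_length` of the same MSA is 2, while `q_score(["ACGT", "AG-T"], ["ACGT", "A-GT"])` is 2/3 because the test alignment misplaces the G.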
THE FUTURE
- We observed that the PIP model could be tuned to adapt to long-indel data.
- k depends on prior knowledge of the MSA length.
- Should k be smaller or larger?
- Should k be used independently or together with α? Does the improved performance of k combined with Gamma still depend on k?
- SBDP and STFT need further testing to confirm their poor performance.
Visualisation Tools: SuiteMSA
SuiteMSA: Visual tools for multiple sequence alignment comparison and molecular sequence simulation