EldhosePoulose/Progressive-Poisson-Indel-Process-Analysis

A Study of Dynamics of Indels using ProPIP, MAFFT and PRANK

This repository provides all the documents related to my master's thesis.

Objective and Study Plan

  1. Can PIP adapt to long indels?
  2. Comparative study using ProPIP, PRANK and MAFFT
  3. Datasets: long indel data (INDELible, real data) and single indel data (PIP)
  4. Study using supplementary features of ProPIP.

ProPIP

  1. Built under an explicit indel process, PIP (Poisson Indel Process)
  2. Basic PIP equations:
    Asymptotic expected sequence length, E = λ/μ
    Indel intensity, I = λ · μ
    Progressive dynamic programming approach.
    Marginal likelihood calculation at each internal node of the guide tree
    Alignment selection using Maximum Likelihood
    ProPIP requirements: input sequences, trees, rate matrix Q, λ and μ
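
The two basic PIP relations above can be checked numerically. The following is a minimal sketch, not part of ProPIP: the function names `pip_summary` and `pip_rates` are hypothetical, and the inverse formulas simply follow from solving E = λ/μ and I = λ · μ for λ and μ.

```python
import math


def pip_summary(lam: float, mu: float) -> tuple[float, float]:
    """Return (E, I) with E = lam / mu and I = lam * mu."""
    if lam <= 0 or mu <= 0:
        raise ValueError("lam and mu must be positive")
    return lam / mu, lam * mu


def pip_rates(expected_len: float, intensity: float) -> tuple[float, float]:
    """Invert the two relations: lam = sqrt(E * I), mu = sqrt(I / E)."""
    if expected_len <= 0 or intensity <= 0:
        raise ValueError("expected_len and intensity must be positive")
    lam = math.sqrt(expected_len * intensity)
    mu = math.sqrt(intensity / expected_len)
    return lam, mu


if __name__ == "__main__":
    E, I = pip_summary(lam=20.0, mu=0.5)
    print(f"E = {E}, I = {I}")
    print("recovered (lam, mu):", pip_rates(E, I))
```

Choosing a target sequence length and indel intensity first, then deriving λ and μ from them, can be a convenient way to set the two rates that ProPIP requires as input.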

SBDP and STFT

Stochastic Backtracking DP algorithm (SBDP)
i. Ensemble of sub-optimal solutions at each internal node
ii. Randomise the sub-optimal MSAs selected using the SB algorithm
iii. The distortion parameter, T
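
To illustrate how a parameter such as T can control the randomness of a traceback, here is a minimal sketch that assumes a Boltzmann-style weighting of the candidate moves at one DP cell. This weighting, and the names `move_probabilities` and `sample_move`, are assumptions made for illustration; the exact way ProPIP applies T may differ.

```python
import math
import random


def move_probabilities(log_scores, temperature):
    """Turn log-scores of candidate traceback moves into probabilities.

    Assumed scheme: p_i is proportional to exp(score_i / T).
    A small T concentrates mass on the best move, a large T flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp((s - best) / temperature) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_move(log_scores, temperature, rng):
    """Sample the index of one move (e.g. 0=match, 1=gap in X, 2=gap in Y)."""
    probs = move_probabilities(log_scores, temperature)
    return rng.choices(range(len(log_scores)), weights=probs, k=1)[0]


if __name__ == "__main__":
    scores = [-1.0, -2.0, -4.0]  # hypothetical log-likelihoods of three moves
    rng = random.Random(0)
    for T in (0.1, 1.0, 10.0):
        print(T, [round(p, 3) for p in move_probabilities(scores, T)])
    print("sampled move:", sample_move(scores, 1.0, rng))
```

Repeating the sampling step at every cell of the traceback would yield an ensemble of sub-optimal alignments rather than a single optimal one.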

Short-Time Fourier Transform (STFT)
i. Detection of homologous regions using STFT
ii. Reduces computational complexity
iii. Window functions and their sizes
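
The role of the window function and its size can be seen in a small, dependency-free STFT sketch. It assumes a simple indicator encoding of the sequence (1.0 where a chosen base occurs, 0.0 elsewhere) and a Hann window; both choices, and the names `indicator`, `hann` and `stft_magnitudes`, are illustrative assumptions rather than ProPIP's actual implementation.

```python
import cmath
import math


def indicator(seq, base):
    """Encode a sequence as 1.0 where `base` occurs and 0.0 elsewhere."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def hann(size):
    """Symmetric Hann window of the given size (size >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]


def stft_magnitudes(signal, window_size, hop):
    """Return one magnitude spectrum (bins 0..window_size//2) per frame."""
    if window_size < 2 or hop < 1:
        raise ValueError("window_size must be >= 2 and hop >= 1")
    win = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        seg = [signal[start + i] * win[i] for i in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            # Naive DFT of the windowed segment at frequency bin k
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    signal = indicator("ACGTACGTACGTTTTTACGTACGT", "A")
    for frame in stft_magnitudes(signal, window_size=8, hop=4):
        print([round(m, 2) for m in frame])
```

A larger window gives finer frequency resolution but coarser localisation along the sequence, and the hop size controls how many frames are computed; both trade accuracy against cost.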

Datasets

MSA Evaluation

Results and Discussions

Conclusions and Future Work

THE CONCLUSION

  1. MAFFT and PRANK outperformed ProPIP on INDELible and real data (long indel data).
  2. ProPIP best fits the PIP data and performs better than the traditional aligners MAFFT and PRANK on it.
  3. ProPIP's performance on long indel data can be improved using its additional features.
  4. Parameter α alone is not suitable for INDELible data; however, combined with k it can improve the alignment quality. We observed an improvement when k = 0.05 and α = 0.05. For relatively lower α, nIndels increased.
  5. For relatively lower k, nIndels decreased and the Q score increased; the same pattern held for all data types.
  6. For PIP data the same pattern held, but the fit was obtained at k = 2.
  7. SBDP: for relatively lower T, nIndels increased and the maximum indel length decreased.
  8. STFT
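
The quantities referred to in the conclusions above can be sketched in code. This is a minimal illustration under stated assumptions: the Q score is taken to be the usual sum-of-pairs definition (fraction of residue pairs aligned in the reference that are also aligned in the test MSA), and gap runs are used as a simple stand-in for nIndels and maximum indel length. The exact definitions used in the thesis may differ, and the function names are hypothetical.

```python
import re
from itertools import combinations


def aligned_pairs(msa):
    """Set of residue pairs ((row, index), (row, index)) sharing a column."""
    counters = [0] * len(msa)  # next ungapped residue index per row
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for row, seq in enumerate(msa):
            if seq[col] != "-":
                present.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(present, 2))
    return pairs


def q_score(test_msa, ref_msa):
    """Fraction of reference residue pairs recovered by the test MSA."""
    ref = aligned_pairs(ref_msa)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref & aligned_pairs(test_msa)) / len(ref)


def gap_runs(msa):
    """Return (number of gap runs, longest gap run) over all rows."""
    lengths = [len(run) for seq in msa for run in re.findall(r"-+", seq)]
    return len(lengths), max(lengths, default=0)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    candidate = ["ACG-T", "ACAGT"]
    print("Q score:", q_score(candidate, reference))
    print("gap runs (count, longest):", gap_runs(candidate))
```

Both alignments must contain the same sequences in the same row order, and every row of an MSA must have the same length, for the comparison to be meaningful.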

THE FUTURE

  1. We observed that the PIP model can possibly be tuned to adapt to long indel data.
  2. k depends on prior knowledge of the MSA length.
  3. Should k be smaller or larger?
  4. Should k be used independently or together with α? Does the improved performance of k combined with Gamma still depend on k?
  5. SBDP and STFT need more tests to verify their poor performance.

Visualisation Tools: SuiteMSA

SuiteMSA: Visual tools for multiple sequence alignment comparison and molecular sequence simulation
