---
title: "Exercise sheet 4: Pair-HMM"
---
# Exercise 1
You are given the basic pair-HMM for sequence alignment between two sequences:
```{r, echo=FALSE, out.width="50%", fig.align='center'}
knitr::include_graphics("figures/sheet-4/HMM.png")
```
Let $\delta = 0.02$ and $\epsilon=0.79$. The initial probability distribution of the states is given by
$\pi(M)=0.6$, $\pi(I_x) = 0.2$ and $\pi(I_y) = 0.2$. Furthermore, let all $p(x_i,y_j)$ and $q(x_i)$ (and $q(y_j)$)
be given in matrix $p$ and vector $q$, respectively:
```{r, echo=FALSE, out.width="70%", fig.align='center'}
knitr::include_graphics("figures/sheet-4/matrix_exercise01.png")
```
Identify the probabilities of the following alignments between sequences x=AGCGG and y=ACAGGGG.
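For reference, each alignment fixes exactly one state path $s_1 \dots s_L$ (one state per alignment column), so both factors are plain products. The notation below is introduced here only for illustration and is not taken from the figure: $a_{s,t}$ denotes a transition probability and $e_s(o_k)$ the emission probability of column $o_k$ in state $s$. The value $a_{M,I_x} = \delta$ is assumed by symmetry with $a_{M,I_y}$.

$$
Prob(path) = \pi(s_1) \prod_{k=2}^{L} a_{s_{k-1},s_k},
\qquad
Prob(O \mid path) = \prod_{k=1}^{L} e_{s_k}(o_k)
$$

$$
a_{M,M} = 1-2\delta, \quad a_{M,I_x} = a_{M,I_y} = \delta, \quad
a_{I_x,I_x} = a_{I_y,I_y} = \epsilon, \quad a_{I_x,M} = a_{I_y,M} = 1-\epsilon
$$

$$
e_M(x_i,y_j) = p(x_i,y_j), \qquad e_{I_x}(x_i) = q(x_i), \qquad e_{I_y}(y_j) = q(y_j)
$$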
### 1a)
```
x: AGCGG----
     :::
y: --ACAGGGG
```
#### {.tabset}
##### Hide
##### Formulae
::: {.answer data-latex=""}
$Prob(path=I_xI_xMMMI_yI_yI_yI_y)=\pi(I_x) \cdot \epsilon \cdot (1-\epsilon) \cdot (1-2\delta)^2 \cdot \delta \cdot \epsilon^3$
$Prob(O | path)=q(A) \cdot q(G) \cdot p(C,A) \cdot p(G,C) \cdot p(G,A) \cdot q(G)^4$
$Prob(path, O) = Prob(O | path ) \times Prob(path)$
:::
##### Solution
::: {.answer data-latex=""}
$Prob(path=I_xI_xMMMI_yI_yI_yI_y) = 0.2 \cdot 0.79 \cdot 0.21 \cdot 0.96^2 \cdot 0.02 \cdot 0.79^3 = 0.00030153... \approx 3.0 \cdot 10^{-4}$
$Prob(O | path) = 0.3 \cdot 0.2 \cdot \frac{3}{80} \cdot \frac{3}{40} \cdot \frac{1}{80} \cdot 0.2^4 = 3.375 \cdot 10^{-9} \approx 3.4 \cdot 10^{-9}$
$Prob(path,O) \approx 1.0 \cdot 10^{-12}$
:::
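
As a cross-check, both factors can be recomputed directly from the gapped alignment. The following is a minimal Python sketch and not part of the original sheet: the names `state_path`, `path_prob` and `emission_prob` are hypothetical, the transition $M \to I_x = \delta$ is assumed by symmetry with $M \to I_y$, and `Q` and `P` contain only the emission entries that appear in the solutions of this sheet (the full matrix is shown in the figure of exercise 1).

```python
DELTA, EPS = 0.02, 0.79
PI = {"M": 0.6, "X": 0.2, "Y": 0.2}  # X stands for I_x, Y for I_y

# Transitions of the basic pair-HMM; M -> X is assumed symmetric to M -> Y.
TRANS = {
    ("M", "M"): 1 - 2 * DELTA, ("M", "X"): DELTA, ("M", "Y"): DELTA,
    ("X", "X"): EPS, ("X", "M"): 1 - EPS,
    ("Y", "Y"): EPS, ("Y", "M"): 1 - EPS,
}

# Only the emission entries used in the solutions of this sheet.
Q = {"A": 0.3, "C": 0.2, "G": 0.2}
P = {frozenset("AC"): 3 / 80, frozenset("CG"): 3 / 40,
     frozenset("AG"): 1 / 80, frozenset("G"): 1 / 8}


def state_path(x_row, y_row):
    """Map each alignment column to M, X (gap in y) or Y (gap in x)."""
    return ["X" if b == "-" else "Y" if a == "-" else "M"
            for a, b in zip(x_row, y_row)]


def path_prob(path):
    """Initial probability times the product of all transitions."""
    prob = PI[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= TRANS.get((prev, cur), 0.0)  # missing edge -> probability 0
    return prob


def emission_prob(x_row, y_row):
    """Product of q for gapped columns and p for aligned columns."""
    prob = 1.0
    for a, b in zip(x_row, y_row):
        if b == "-":
            prob *= Q[a]
        elif a == "-":
            prob *= Q[b]
        else:
            prob *= P[frozenset((a, b))]
    return prob


x_row, y_row = "AGCGG----", "--ACAGGGG"
path = state_path(x_row, y_row)
joint = path_prob(path) * emission_prob(x_row, y_row)
print("".join(path), path_prob(path), emission_prob(x_row, y_row), joint)
```

Under these assumptions the printed values should agree with the rounded results above (about $3.0 \cdot 10^{-4}$, $3.4 \cdot 10^{-9}$ and $1.0 \cdot 10^{-12}$). Passing the gapped rows of 1b or 1c instead checks the other two alignments in the same way.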
#### {-}
### 1b)
```
x: -AGCGG-
    :::||
y: ACAGGGG
```
#### {.tabset}
##### Hide
##### Formulae
::: {.answer data-latex=""}
$Prob(path=I_yMMMMMI_y)=\pi(I_y) \cdot (1-\epsilon) \cdot (1-2\delta)^4 \cdot \delta$
$Prob(O | path)=q(A) \cdot p(A,C) \cdot p(G,A) \cdot p(C,G) \cdot p(G,G) \cdot p(G,G) \cdot q(G)$
$Prob(path, O) = Prob(O | path ) \times Prob(path)$
:::
##### Solution
::: {.answer data-latex=""}
$Prob(path=I_yMMMMMI_y) = 0.2 \cdot 0.21 \cdot 0.96^4 \cdot 0.02 = 0.0007134... \approx 7.1 \cdot 10^{-4}$
$Prob(O | path) = 0.3 \cdot \frac{3}{80} \cdot \frac{1}{80} \cdot \frac{3}{40} \cdot \left(\frac{1}{8}\right)^2 \cdot 0.2 \approx 3.3 \cdot 10^{-8}$
$Prob(path,O) \approx 2.4 \cdot 10^{-11}$
:::
#### {-}
### 1c)
```
x: AGCGG------
       :
y: ----ACAGGGG
```
#### {.tabset}
##### Hide
##### Formulae
::: {.answer data-latex=""}
$Prob(path=I_xI_xI_xI_xMI_yI_yI_yI_yI_yI_y)=\pi(I_x) \cdot \epsilon^3 \cdot (1-\epsilon) \cdot \delta \cdot \epsilon^5$
$Prob(O | path)=q(A) \cdot q(G) \cdot q(C) \cdot q(G) \cdot p(G,A) \cdot q(C) \cdot q(A) \cdot q(G) \cdot q(G) \cdot q(G) \cdot q(G)$
$Prob(path, O) = Prob(O | path ) \times Prob(path)$
:::
##### Solution
::: {.answer data-latex=""}
$Prob(path=I_xI_xI_xI_xMI_yI_yI_yI_yI_yI_y) = 0.2 \cdot 0.79^3 \cdot 0.21 \cdot 0.02 \cdot 0.79^5 = 0.000127...\approx 1.3 \cdot 10^{-4}$
$Prob(O | path) = 0.3 \cdot 0.2^3 \cdot \frac{1}{80} \cdot 0.2 \cdot 0.3 \cdot 0.2^4 = 2.88 \cdot 10^{-9} \approx 2.9 \cdot 10^{-9}$
$Prob(path,O) \approx 3.7 \cdot 10^{-13}$
:::
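
Putting the three results side by side shows how strongly the model prefers the mostly ungapped alignment 1b. The ratios below are a small worked comparison added for illustration; they use the unrounded products from 1a, 1b and 1c, and the subscripts on $Prob$ are only labels for the three exercises.

$$
\frac{Prob_{1b}(path,O)}{Prob_{1a}(path,O)} \approx \frac{2.35 \cdot 10^{-11}}{1.02 \cdot 10^{-12}} \approx 23,
\qquad
\frac{Prob_{1b}(path,O)}{Prob_{1c}(path,O)} \approx \frac{2.35 \cdot 10^{-11}}{3.67 \cdot 10^{-13}} \approx 64
$$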
#### {-}
-------------------------------------------
# Exercise 2
The following alignment of sequences a=AACTT and b=AACAT is not included in the set of alignments
represented by the pair-HMM of exercise 1.
```
a: AACT-T
   |||  |
b: AAC-AT
```
### 2a)
Could you explain why?
#### {.tabset}
##### Hide
##### Solution
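The column `T`/`-` is emitted by $I_x$ and the following column `-`/`A` by $I_y$, so this alignment requires a direct transition from $I_x$ to $I_y$. The pair-HMM has no edge between $I_x$ and $I_y$, i.e. the probability of moving from $I_x$ to $I_y$ is zero, so the alignment has probability zero under the model.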
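
The same argument can be checked mechanically by translating the alignment into its state path and looking for transitions that have no edge. This is a minimal Python sketch, not part of the original sheet: `state_path` and `missing_edges` are hypothetical helper names, and the edge $M \to I_x$ is assumed to exist by symmetry with $M \to I_y$.

```python
# Edges of the basic pair-HMM from exercise 1 (X stands for I_x, Y for I_y).
# M -> X is assumed by symmetry with M -> Y.
ALLOWED = {("M", "M"), ("M", "X"), ("M", "Y"),
           ("X", "X"), ("X", "M"), ("Y", "Y"), ("Y", "M")}


def state_path(a_row, b_row):
    """Map each alignment column to M, X (gap in b) or Y (gap in a)."""
    return ["X" if b == "-" else "Y" if a == "-" else "M"
            for a, b in zip(a_row, b_row)]


def missing_edges(path):
    """Return (column index, from state, to state) for every transition without an edge."""
    return [(i, prev, cur)
            for i, (prev, cur) in enumerate(zip(path, path[1:]))
            if (prev, cur) not in ALLOWED]


path = state_path("AACT-T", "AAC-AT")
print("".join(path))        # MMMXYM
print(missing_edges(path))  # [(3, 'X', 'Y')]
```

The index is the 0-based column of the state that the forbidden transition leaves, here the `T`/`-` column.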
#### {-}
-------------------------------------------
# Exercise 3
As you have seen, the given pair-HMM, which emits alignments of two sequences, assigns quite small
probabilities to any particular alignment. These probabilities are
often compared to the probabilities generated by a random model.
### 3a)
::: {.answer data-latex=""}
Design an HMM that generates two random sequences with the frequencies $q_i$ given in exercise 1.
Use the parameters $\eta$ and $1-\eta$ to describe the transition probabilities.
:::
#### {.tabset}
##### Hide
##### Hint
::: {.answer data-latex=""}
The proposed solution includes two main states, X and Y, each of which emits one of the two sequences independently of the other. Each has a loop
back onto itself with probability $1-\eta$. In addition to the Begin and End states, the proposed solution includes a silent state between X and Y, which gathers the incoming transitions from both the X and Begin states.
:::
##### Solution
```{r, echo=FALSE, out.width="70%", fig.align='center'}
knitr::include_graphics("figures/sheet-4/exercise_03_HMM_graph.png")
```
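
To compare an alignment probability against this random model, one needs the probability that the random model emits both sequences. The sketch below is a hedged Python illustration, not part of the original sheet. It assumes that every transition into or along X and Y has probability $1-\eta$ and every transition leaving them (or skipping them via Begin and the silent state) has probability $\eta$, so that sequences of lengths $n$ and $m$ get probability $\eta^2 (1-\eta)^{n+m} \prod_i q(x_i) \prod_j q(y_j)$. The function name `random_model_prob` and the value $\eta = 0.1$ are hypothetical, and `Q` lists only the frequencies that appear in the solutions of exercise 1.

```python
from math import prod

# Frequencies used in the solutions of exercise 1; q(T) is given in the figure there.
Q = {"A": 0.3, "C": 0.2, "G": 0.2}


def random_model_prob(x, y, eta):
    """Probability that the random model emits x and then y, independently."""
    emit = prod(Q[c] for c in x) * prod(Q[c] for c in y)
    # One factor (1 - eta) per emitted symbol, one factor eta per sequence end.
    return eta ** 2 * (1 - eta) ** (len(x) + len(y)) * emit


# eta = 0.1 is an arbitrary illustration value, not given on the sheet.
print(random_model_prob("AGCGG", "ACAGGGG", eta=0.1))
```

Dividing $Prob(path, O)$ from exercise 1 by this value gives the kind of comparison against a random model that the exercise text refers to.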
#### {-}
-------------------------------------------