fixed timetable and order
domonik committed Apr 23, 2024
1 parent a4469f3 commit fec3637
Showing 3 changed files with 184 additions and 187 deletions.
23 changes: 10 additions & 13 deletions assets/timetable.json
@@ -1,16 +1,13 @@
{
"exercise-sheet-1": "2024-04-17T09:00:00",
"exercise-sheet-2": "2024-12-30T09:00:00",
"exercise-sheet-2": "2024-12-30T09:00:00",
"exercise-sheet-3": "2024-12-30T09:00:00",
"exercise-sheet-4": "2024-12-30T09:00:00",
"exercise-sheet-5": "2024-12-30T09:00:00",
"exercise-sheet-6": "2024-12-30T09:00:00",
"exercise-sheet-7": "2024-12-30T09:00:00",
"exercise-sheet-8": "2024-12-30T09:00:00",
"exercise-sheet-9": "2024-12-30T09:00:00",
"exercise-sheet-10": "2024-12-30T09:00:00",
"exercise-sheet-11": "2024-12-30T09:00:00",
"exercise-sheet-12": "2024-12-30T09:00:00"

"exercise-sheet-2": "2024-04-30T09:00:00",
"exercise-sheet-3": "2024-05-07T09:00:00",
"exercise-sheet-4": "2024-05-14T09:00:00",
"exercise-sheet-5": "2024-05-28T09:00:00",
"exercise-sheet-6": "2024-06-04T09:00:00",
"exercise-sheet-7": "2024-06-11T09:00:00",
"exercise-sheet-8": "2024-06-18T09:00:00",
"exercise-sheet-9": "2024-06-25T09:00:00",
"exercise-sheet-10": "2024-07-02T09:00:00",
"exercise-sheet-11": "2024-07-09T09:00:00"
}
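The old version of the timetable listed `exercise-sheet-2` twice, which most JSON parsers accept silently by keeping the last value. A minimal sketch of how such a file could be checked before use, assuming Python's standard `json` module; the helper name `load_timetable` is hypothetical and not part of this repository:

```python
import json
from datetime import datetime


def load_timetable(text):
    """Parse a timetable JSON string, rejecting duplicate sheet names.

    Hypothetical helper for illustration; returns a dict that maps each
    sheet name to its release time as a datetime.
    """
    def reject_duplicates(pairs):
        # json.loads passes every object's key/value pairs here in order,
        # so repeated keys are still visible at this point.
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key: {key}")
            seen[key] = datetime.fromisoformat(value)
        return seen

    return json.loads(text, object_pairs_hook=reject_duplicates)
```

Called on the contents of `assets/timetable.json`, this would raise a `ValueError` for the old file and return the parsed release dates for the corrected one.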
174 changes: 59 additions & 115 deletions exercise-sheet-8.Rmd
@@ -6,22 +6,27 @@ library(officer)
```

---
title: "Exercise sheet 8: Suffix-Trees"
title: "Exercise sheet 9: Data Driven Life Sciences"
---

---------------------------------

# Exercise 1

You are given the text T=`CAGTAGTAGC`.

### 1a)
::: {.question data-latex=""}
Arrange the following terms into their correct order in the Illumina sequencing method and describe each of them briefly:

- bridge amplification

### 1a)
- deblocking

::: {.question data-latex=""}
- library preparation

- annealing of template strands to flow cell

Draw the corresponding suffix tree!
- fluorescence detection
:::

#### {.tabset}
@@ -31,125 +36,79 @@ Draw the corresponding suffix tree!
##### Solution
::: {.answer data-latex=""}

```{r, echo=FALSE, out.width="100%", fig.align='center'}
knitr::include_graphics("figures/sheet-8/suffix_tree_1.png")
```
:::
**1. Library preparation:**

#### {-}
A sequencing *library* gets *prepared* from a sample by fragmenting the original DNA and adding Illumina-specific adapter sequences to both ends of the fragments. The *library* is what gets read during sequencing.

**2. Template strand annealing**

### 1b)
::: {.question data-latex=""}
The single-stranded library fragments are used as *template strands* in the sequencing and are *annealed* to primer sequences, which are bound to the *flow cell* and are complementary to the adapter sequences of the fragments.

Describe the steps of a counting query for $P =$ `TAG`.
:::
**3. Bridge amplification**

#### {.tabset}
After complementary strands have been synthesized and the templates have been washed off, the now flow cell-bound fragments are *amplified* in several cycles of so-called *bridge amplification* to form fragment colonies, or *clusters*, on the flow cell. This guarantees a detectable fluorescence signal during sequencing.

##### Hide
**4. Fluorescence detection**

##### Solution
::: {.answer data-latex=""}
Illumina sequencing is a form of *sequencing-by-synthesis* in which the nucleotides incorporated into the growing strand are detected via attached *fluorophores*. After the first three steps, the following steps are iterated to sequence the entire read:

* start at root node
* locate outgoing edge that starts with $T$
* match subsequent characters of the pattern
* in the subtree rooted at TAG count the number of leaves $\Rightarrow 2$
:::
#### {-}
Modified nucleotides, each containing a fluorescent group, are used to extend the strand; their blocking groups are cleaved from their 3'-OH groups.

**5. Deblocking**

*Deblocking* is the removal of the fluorophore (blocking group). It is necessary before a new round of elongation by one nucleotide can begin.

### 1c)
::: {.question data-latex=""}

Describe the steps of a reporting query for $P =$ `AG`.
:::

#### {.tabset}

##### Hide

##### Solution
::: {.answer data-latex=""}

* start at root node
* locate outgoing edge that starts with $A$
* match subsequent characters of the pattern
* in the subtree rooted at AG report the labels of all leaves $\Rightarrow \{2, 5, 8\}$
More information about this topic can be found on the [Illumina Webpage](https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html).
:::
#### {-}

# Exercise 2

```{r, echo=FALSE, out.width="75%", fig.align='center'}
knitr::include_graphics("figures/sheet-9/crossword.png")
```

### 2a)
::: {.question data-latex=""}

Draw a generalized suffix tree for the sequences $A=$`CCATG` and $B=$ `CATG`.
:::
**Solve the crossword puzzle!**

#### {.tabset}
Horizontal:

##### Hide
- 3. Added to DNA fragments during library preparation.

##### Hint 1
::: {.answer data-latex=""}
- 8. Illumina way of determining the order of nucleotides in a DNA strand. (3 words)

Concatenate the two sequences using a unique character for splitting. e.g.
`CCATG#CATG$`.
- 9. ChIP-Seq can be used for sequencing DNA regions that are bound by these.

Don't forget to include suffix links!
:::
##### Formulae
::: {.answer data-latex=""}
- 11. The alphabet of life.

$sl(v) = w$
- 12. Formed by bridge-amplification on Illumina flow-cells.

$\overline{v} = cb$
- 13. Flowcell surface filled with these 2 different DNA molecules.

$\overline{w} = b$
- 15. Measure to assess the quality of the identification of nucleobases generated by automated DNA sequencing. (3 words)

$c: character, b: string$

Vertical:

remember: $\overline{v}$ denotes the concatenation of all path labels from the root to v.
:::
##### Solution
::: {.answer data-latex=""}
- 1. Dideoxynucleosidetriphosphates (abbrev.)

```{r, echo=FALSE, out.width="100%", fig.align='center'}
knitr::include_graphics("figures/sheet-8/suffix_tree_2.png")
```
:::
#### {-}
- 2. Process of determining positions of reads on the reference genome.

### 2b)
::: {.question data-latex=""}
- 4. Gene expression can be measured using this. (abbrev. hyph.)

Find the Maximal Unique Matches of the sequences $A=$`CCATG` and $B=$`CATG` using
the tree from A).
:::

#### {.tabset}
- 5. The process of making many copies of a piece of DNA.

##### Hide
- 6. Found in pairs in DNA.

##### Solution
::: {.answer data-latex=""}
- 7. Chemical group attached to nucleotides to monitor incorporation into DNA.

`CATG` is the only MUM as $\overline{v} =$ `CATG` has no suffix links pointing to
it
:::
#### {-}
- 10. File format used to store sequence information.

- 14. Breakthrough sequencing method (abbrev.)

# Exercise 3

### 3a)
::: {.question data-latex=""}

Draw a generalized suffix tree for the sequence $A=$`ACGCACGCG`.
:::

#### {.tabset}
@@ -158,55 +117,40 @@ Draw a generalized suffix tree for the sequence $A=$`ACGCACGCG`.

##### Solution
::: {.answer data-latex=""}

```{r, echo=FALSE, out.width="100%", fig.align='center'}
knitr::include_graphics("figures/sheet-8/suffix_tree_3.png")
```{r, echo=FALSE, out.width="75%", fig.align='center'}
knitr::include_graphics("figures/sheet-9/crossword_solved.png")
```
:::

#### {-}

# Exercise 3

#### {.tabset}

### 3b)
### 3a)
::: {.question data-latex=""}
You want to determine how many reads $N$ are needed to achieve a coverage depth $C$ of 20X when sequencing reads for *Escherichia coli*.

Find all maximal pairs of length at least 2.
The length of the reads $L$ is 30 nt and the *E. coli* genome $G$ is approximately 4.6 million bases long.
:::

#### {.tabset}

##### Hide

##### Solution
##### Formula
::: {.answer data-latex=""}

`ACGC`: $(1,5,4)$

`CG`: $(2,8,2), (6,8,2)$
$$
N = \frac{C\times G}{L}
$$
:::
#### {-}


### 3c)
::: {.question data-latex=""}

Why is `C`: $(2, 8, 1)$ not a maximal pair?

:::

#### {.tabset}

##### Hide

##### Solution
::: {.answer data-latex=""}

It is not right maximal.
This can be seen since `CG`: $(2, 8, 2)$ already includes the indices 2 and 8 with
a longer match.

$$
N = \frac{20\times 4600000}{30} \approx 3066667 \text{ reads}
$$
:::
#### {-}
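The calculation in 3a) can be checked with a short sketch. It assumes plain Python; the function name `reads_needed` is hypothetical. The result is rounded up, because a fractional read cannot be sequenced and rounding down would fall just short of the target coverage.

```python
import math


def reads_needed(coverage, genome_length, read_length):
    """Smallest whole number of reads N such that N * L / G >= C."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * genome_length / read_length)


# Values from 3a): C = 20, G = 4.6 million bases, L = 30 nt
print(reads_needed(20, 4_600_000, 30))  # prints 3066667
```

Rounding up matches the approximate value of 3066667 reads given in the solution.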


