From fec36371a283b9ae0f3c4988e9554b94965ebcf6 Mon Sep 17 00:00:00 2001
From: domonik <schdruzzi@gmail.com>
Date: Tue, 23 Apr 2024 11:52:27 +0200
Subject: [PATCH] fixed timetable and order

---
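Note: the old timetable.json listed "exercise-sheet-2" twice and gave every
later sheet the same date. A minimal sketch of a check that would flag both
problems is below. It assumes that the values are release dates that should
increase with the sheet number; the helper names `load_timetable` and
`release_dates_ascending` are hypothetical and not part of this repository.

```python
import json
from datetime import datetime


def load_timetable(text):
    """Parse timetable JSON and reject duplicate sheet names.

    json.loads silently keeps the last value for a repeated key, so the
    object_pairs_hook is used to see every key/value pair as written.
    """
    def no_duplicates(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key: {key}")
            seen[key] = value
        return seen

    return json.loads(text, object_pairs_hook=no_duplicates)


def release_dates_ascending(timetable):
    """Return True if dates strictly increase with the sheet number.

    Assumption: keys look like "exercise-sheet-<n>" and values are
    ISO 8601 timestamps, as in assets/timetable.json.
    """
    ordered = sorted(timetable, key=lambda name: int(name.rsplit("-", 1)[1]))
    dates = [datetime.fromisoformat(timetable[name]) for name in ordered]
    return all(a < b for a, b in zip(dates, dates[1:]))
```

Sorting by the numeric suffix rather than by the key string keeps
"exercise-sheet-10" after "exercise-sheet-9".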
 assets/timetable.json |  23 +++---
 exercise-sheet-8.Rmd  | 174 ++++++++++++++----------------------------
 exercise-sheet-9.Rmd  | 174 ++++++++++++++++++++++++++++--------------
 3 files changed, 184 insertions(+), 187 deletions(-)
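
Note: the numeric answers in the two sheets can be sanity-checked with a
short script. This is only a sketch: it scans the text naively instead of
walking a suffix tree, so it confirms the expected results of the counting
and reporting queries but not the traversal steps themselves. The function
names `occurrences` and `reads_needed` are hypothetical.

```python
def occurrences(text, pattern):
    """Return the 1-based start positions of pattern in text.

    Naive scan used as a stand-in for a suffix tree query.
    """
    last_start = len(text) - len(pattern)
    return [i + 1 for i in range(last_start + 1) if text.startswith(pattern, i)]


def reads_needed(coverage, genome_length, read_length):
    """Number of reads N = C * G / L for a target coverage depth."""
    return coverage * genome_length / read_length


if __name__ == "__main__":
    text = "CAGTAGTAGC"
    print(len(occurrences(text, "TAG")))           # counting query
    print(occurrences(text, "AG"))                 # reporting query
    print(round(reads_needed(20, 4_600_000, 30)))  # coverage exercise
```

The positions are 1-based to match the leaf labels used in the solutions.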

diff --git a/assets/timetable.json b/assets/timetable.json
index bf183e1..8c9e1c5 100644
--- a/assets/timetable.json
+++ b/assets/timetable.json
@@ -1,16 +1,13 @@
 {
     "exercise-sheet-1": "2024-04-17T09:00:00",
-    "exercise-sheet-2": "2024-12-30T09:00:00",
-    "exercise-sheet-2": "2024-12-30T09:00:00",
-    "exercise-sheet-3": "2024-12-30T09:00:00",
-    "exercise-sheet-4": "2024-12-30T09:00:00",
-    "exercise-sheet-5": "2024-12-30T09:00:00",
-    "exercise-sheet-6": "2024-12-30T09:00:00",
-    "exercise-sheet-7": "2024-12-30T09:00:00",
-    "exercise-sheet-8": "2024-12-30T09:00:00",
-    "exercise-sheet-9": "2024-12-30T09:00:00",
-    "exercise-sheet-10": "2024-12-30T09:00:00",
-    "exercise-sheet-11": "2024-12-30T09:00:00",
-    "exercise-sheet-12": "2024-12-30T09:00:00"
-
+    "exercise-sheet-2": "2024-04-30T09:00:00",
+    "exercise-sheet-3": "2024-05-07T09:00:00",
+    "exercise-sheet-4": "2024-05-14T09:00:00",
+    "exercise-sheet-5": "2024-05-28T09:00:00",
+    "exercise-sheet-6": "2024-06-04T09:00:00",
+    "exercise-sheet-7": "2024-06-11T09:00:00",
+    "exercise-sheet-8": "2024-06-18T09:00:00",
+    "exercise-sheet-9": "2024-06-25T09:00:00",
+    "exercise-sheet-10": "2024-07-02T09:00:00",
+    "exercise-sheet-11": "2024-07-09T09:00:00"
 }
diff --git a/exercise-sheet-8.Rmd b/exercise-sheet-8.Rmd
index 030e589..47c19f3 100644
--- a/exercise-sheet-8.Rmd
+++ b/exercise-sheet-8.Rmd
@@ -6,22 +6,27 @@ library(officer)
 ```
 
 ---
-title: "Exercise sheet 8: Suffix-Trees"
+title: "Exercise sheet 9: Data Driven Life Sciences"
 ---
 
 ---------------------------------
 
 # Exercise 1
 
-You are given the text T=`CAGTAGTAGC`.
 
+### 1a)
+::: {.question data-latex=""}
+Arrange the following terms into their correct order in the Illumina sequencing method and describe each of them briefly:
 
+- bridge amplification
 
-### 1a)
+- deblocking
 
-::: {.question data-latex=""}
+- library preparation
+
+- annealing of template strands to flow cell
 
-Draw the corresponding suffix tree!
+- fluorescence detection
 ::: 
 
 #### {.tabset}
@@ -31,125 +36,79 @@ Draw the corresponding suffix tree!
 ##### Solution
 ::: {.answer data-latex=""}
 
-```{r, echo=FALSE, out.width="100%", fig.align='center'}
-knitr::include_graphics("figures/sheet-8/suffix_tree_1.png")
-```
-::: 
+**1. Library preparation:**
 
-#### {-}
+A sequencing *library* gets *prepared* from a sample by fragmenting the original DNA and adding Illumina-specific adapter sequences to both ends of the fragments. The *library* is what gets read during sequencing.
 
+**2. Template strand annealing**
 
-### 1b)
-::: {.question data-latex=""}
+The single-stranded library fragments are used as *template strands* in the sequencing and are *annealed* to primer sequences, which are bound to the *flow cell* and are complementary to the adapter sequences of the fragments.
 
-Describe the steps of a counting query for $P =$ `TAG`.
-::: 
+**3. Bridge amplification**
 
-#### {.tabset}
+After complementary strands have been synthesized and the templates have been washed off, the now flow cell-bound fragments are *amplified* in several cycles of so-called *bridge amplification* to form fragment colonies, or *clusters*, on the flow cell, which guarantees a detectable fluorescence signal during sequencing.
 
-##### Hide
+**4. Fluorescence detection**
 
-##### Solution
-::: {.answer data-latex=""}
+Illumina-sequencing is a form of *sequencing-by-synthesis* in which the nucleotides incorporated into the growing strand are detected via attached *fluorophores*. After the first $3$ steps, the following steps are iterated to sequence the entire read:
 
-* start at root node
-* locate outgoing edge that starts with $T$
-* match subsequent characters of the pattern
-* in the subtree rooted at TAG count the number of leaves $\Rightarrow 2$
-::: 
-#### {-}
+Modified nucleotides carrying a fluorescent group are used to extend the strand; their blocking groups are then cleaved from their 3'-OH groups.
 
+**5. Deblocking**
 
+*Deblocking* is the removal of the fluorophore (blocking group). It is necessary before a new round of elongation by one nucleotide can begin.
 
-### 1c)
-::: {.question data-latex=""}
 
-Describe the steps of a reporting query for $P =$ `AG`.
-::: 
-
-#### {.tabset}
-
-##### Hide
-
-##### Solution
-::: {.answer data-latex=""}
-
-* start at root node
-* locate outgoing edge that start with $A$
-* match subsequent characters of the pattern
-* in the subtree rooted at AG report the labels of all leaves $\Rightarrow \{2, 5, 8\}$
+More information about this topic can be found on the [Illumina Webpage](https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html).
 ::: 
 #### {-}
 
 # Exercise 2
 
+```{r, echo=FALSE, out.width="75%", fig.align='center'}
+knitr::include_graphics("figures/sheet-9/crossword.png")
+```
+
 ### 2a)
 ::: {.question data-latex=""}
 
-Draw a generalized suffix tree for the sequences $A=$`CCATG` and $B=$ `CATG`.
-::: 
+**Solve the crossword puzzle!**
 
-#### {.tabset}
+Horizontal:
 
-##### Hide
+- 3. Added to DNA fragments during library preparation.
 
-##### Hint 1 
-::: {.answer data-latex=""}
+- 8. Illumina way of determining the order of nucleotides in a DNA strand. (3 words)
 
-Concatenate the two sequences using a unique character for splitting. e.g.
-`CCATG#CATG$`.
+- 9. ChIP-Seq can be used for sequencing DNA regions that are bound by these.
 
-Dont forget to include suffix links!
-::: 
-##### Formulae
-::: {.answer data-latex=""}
+- 11. The alphabet of life.
 
-$sl(v) = w$
+- 12. Formed by bridge-amplification on Illumina flow-cells.
 
-$\overline{v} = cb$
+- 13. Flowcell surface filled with these 2 different DNA molecules.
 
-$\overline{w} = b$
+- 15. Measure to assess the quality of the identification of nucleobases generated by automated DNA sequencing. (3 words)
 
-$c: character, b: string$
 
+Vertical:
 
-remember: $\overline{v}$ denotes the concatenation of all path labels from the root to v.
-::: 
-##### Solution
-::: {.answer data-latex=""}
+- 1. Dideoxynucleosidetriphosphates (abbrev.)
 
-```{r, echo=FALSE, out.width="100%", fig.align='center'}
-knitr::include_graphics("figures/sheet-8/suffix_tree_2.png")
-```
-::: 
-#### {-}
+- 2. Process of determining positions of reads on the reference genome.
 
-### 2b)
-::: {.question data-latex=""}
+- 4. Gene expression can be measured using this. (abbrev. hyph.)
 
-Find the Maximal Unique Matches of the sequences $A=$`CCATG` and $B=$`CATG` using 
-the tree from A).
-::: 
-
-#### {.tabset}
+- 5. The process of making many copies of a piece of DNA.
 
-##### Hide
+- 6. Found in pairs in DNA.
 
-##### Solution
-::: {.answer data-latex=""}
+- 7. Chemical group attached to nucleotides to monitor incorporation into DNA.
 
-`CATG` is the only MUM as $\overline{v} =$ `CATG` has no suffix links pointing to
-it
-::: 
-#### {-}
+- 10. File format used to store sequence information.
 
+- 14. Breakthrough sequencing method (abbrev.)
 
-# Exercise 3
-
-### 3a)
-::: {.question data-latex=""}
-
-Draw a generalized suffix tree for the sequence $A=$`ACGCACGCG`.
 ::: 
 
 #### {.tabset}
@@ -158,55 +117,40 @@ Draw a generalized suffix tree for the sequence $A=$`ACGCACGCG`.
 
 ##### Solution
 ::: {.answer data-latex=""}
-
-```{r, echo=FALSE, out.width="100%", fig.align='center'}
-knitr::include_graphics("figures/sheet-8/suffix_tree_3.png")
+```{r, echo=FALSE, out.width="75%", fig.align='center'}
+knitr::include_graphics("figures/sheet-9/crossword_solved.png")
 ```
 ::: 
-
 #### {-}
 
+# Exercise 3
+
+#### {.tabset}
 
-### 3b)
+### 3a)
 ::: {.question data-latex=""}
+You want to determine how many reads $N$ are needed to achieve a coverage depth $C$ of 20X when sequencing reads for *Escherichia coli*.
 
-Find all maximal pairs of length at least 2.
+The length of the reads $L$ is 30 nt and the *E. coli* genome $G$ is approximately 4.6 million bases long.
 ::: 
 
 #### {.tabset}
 
 ##### Hide
 
-##### Solution
+##### Formula
 ::: {.answer data-latex=""}
-
-`ACGC`: $(1,5,4)$
-
-`CG`: $(2,8,2), (6,8,2)$
+$$
+N = \frac{C\times G}{L}
+$$
 ::: 
-#### {-}
-
-
-### 3c)
-::: {.question data-latex=""}
-
-Why is `C`: $(2, 8, 1)$ not a maximal pair?
-
-::: 
-
-#### {.tabset}
-
-##### Hide
 
 ##### Solution
 ::: {.answer data-latex=""}
-
-It is not right maximal.
-This can be seen since `CG`: $(2, 8, 2)$ already includes the indices 2 and 8 with
-a longer match. 
-
+$$
+N = \frac{20\times 4600000}{30} \approx 3066667 \text{ reads}
+$$
 ::: 
-#### {-}
 
 
 
diff --git a/exercise-sheet-9.Rmd b/exercise-sheet-9.Rmd
index 47c19f3..030e589 100644
--- a/exercise-sheet-9.Rmd
+++ b/exercise-sheet-9.Rmd
@@ -6,27 +6,22 @@ library(officer)
 ```
 
 ---
-title: "Exercise sheet 9: Data Driven Life Sciences"
+title: "Exercise sheet 8: Suffix-Trees"
 ---
 
 ---------------------------------
 
 # Exercise 1
 
+You are given the text T=`CAGTAGTAGC`.
 
-### 1a)
-::: {.question data-latex=""}
-Arrange the following terms into their correct order in the Illumina sequencing method and describe each of them briefly:
-
-- bridge amplification
 
-- deblocking
 
-- library preparation
+### 1a)
 
-- annealing of template strands to flow cell
+::: {.question data-latex=""}
 
-- fluorescence detection
+Draw the corresponding suffix tree!
 ::: 
 
 #### {.tabset}
@@ -36,79 +31,125 @@ Arrange the following terms into their correct order in the Illumina sequencing
 ##### Solution
 ::: {.answer data-latex=""}
 
-**1. Library preparation:**
+```{r, echo=FALSE, out.width="100%", fig.align='center'}
+knitr::include_graphics("figures/sheet-8/suffix_tree_1.png")
+```
+::: 
 
-A sequencing *library* gets *prepared* from a sample by fragmenting the original DNA and adding Illumina-specific adapter sequences to both ends of the fragments. The *library* is what gets read during sequencing.
+#### {-}
 
-**2. Template strand annealing**
 
-The single-stranded library fragments are used as *template strands* in the sequencing and are *annealed* to primer sequences, which are bound to the *flow cell* and are complementary to the adapter sequences of the fragments.
+### 1b)
+::: {.question data-latex=""}
 
-**3. Bridge amplification**
+Describe the steps of a counting query for $P =$ `TAG`.
+::: 
 
-After complementary strands have been synthesized and the templates been washed off, the now flow cell-bound fragments are *amplified* in several cycles of so-called *bridge-amplification* to form fragment colonies, or *clusters* on the flow cell to guarantee a detectable fluorescence signal during sequencing. 
+#### {.tabset}
 
-**4. Fluorescence detection**
+##### Hide
 
-Illumina-sequencing is a form of *sequencing-by-synthesis* in which the nucleotides incorporated into the growing strand are detected via attached *fluorophores*. After the first $3$ steps, the following steps are iterated to sequence the entire read:
+##### Solution
+::: {.answer data-latex=""}
 
-Modified nucleotides, containing a fluorescent group, are used to extend the strand, their blocking groups are cleaved from their 3`-OH groups.
+* start at root node
+* locate outgoing edge that starts with $T$
+* match subsequent characters of the pattern
+* in the subtree rooted at TAG count the number of leaves $\Rightarrow 2$
+::: 
+#### {-}
 
-**5. Deblocking**
 
-*Deblocking* is the removal of the fluorophore (blocking group). It is necessary before a new round of elongation by one nucleotide can begin.
 
+### 1c)
+::: {.question data-latex=""}
 
-More information about this topic can be found on the [Illumina Webpage](https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html).
+Describe the steps of a reporting query for $P =$ `AG`.
+::: 
+
+#### {.tabset}
+
+##### Hide
+
+##### Solution
+::: {.answer data-latex=""}
+
+* start at root node
+* locate outgoing edge that starts with $A$
+* match subsequent characters of the pattern
+* in the subtree rooted at AG report the labels of all leaves $\Rightarrow \{2, 5, 8\}$
 ::: 
 #### {-}
 
 # Exercise 2
 
-```{r, echo=FALSE, out.width="75%", fig.align='center'}
-knitr::include_graphics("figures/sheet-9/crossword.png")
-```
-
 ### 2a)
 ::: {.question data-latex=""}
 
-**Solve the crossword puzzle!**
+Draw a generalized suffix tree for the sequences $A=$`CCATG` and $B=$ `CATG`.
+::: 
 
-Horizontal:
+#### {.tabset}
 
-- 3. Added to DNA fragments during library preparation.
+##### Hide
 
-- 8. Illumina way of determining the order of nucleotides in a DNA strand. (3 words)
+##### Hint 1 
+::: {.answer data-latex=""}
 
-- 9. ChIP-Seq can be used for sequencing DNA regions that are bound by these.
+Concatenate the two sequences using a unique character for splitting, e.g.
+`CCATG#CATG$`.
 
-- 11. The alphabet of life.
+Don't forget to include suffix links!
+::: 
+##### Formulae
+::: {.answer data-latex=""}
 
-- 12. Formed by bridge-amplification on Illumina flow-cells.
+$sl(v) = w$
 
-- 13. Flowcell surface filled with these 2 different DNA molecules.
+$\overline{v} = cb$
 
-- 15. Measure to asses the quality of the identification of nucleobases generated by automated DNA sequencing. (3 words)
+$\overline{w} = b$
 
+$c: character, b: string$
 
-Vertical:
 
-- 1. Dideoxynucleosidetriphosphates (abbrev.)
+Remember: $\overline{v}$ denotes the concatenation of all edge labels on the path from the root to $v$.
+::: 
+##### Solution
+::: {.answer data-latex=""}
 
-- 2. Process of determining positions of reads on the reference genome.
+```{r, echo=FALSE, out.width="100%", fig.align='center'}
+knitr::include_graphics("figures/sheet-8/suffix_tree_2.png")
+```
+::: 
+#### {-}
 
-- 4. Gene expression can be measured using this. (abbrev. hyph.)
+### 2b)
+::: {.question data-latex=""}
 
-- 5. The process of making many copies of a piece of DNA.
+Find the Maximal Unique Matches of the sequences $A=$`CCATG` and $B=$`CATG` using
+the tree from 2a).
+::: 
+
+#### {.tabset}
 
-- 6. Found in pairs in DNA.
+##### Hide
 
-- 7. Chemical group attached to nucleotides to monitor incorporation into DNA.
+##### Solution
+::: {.answer data-latex=""}
 
-- 10. File format used to store sequence information.
+`CATG` is the only MUM as $\overline{v} =$ `CATG` has no suffix links pointing to
+it.
+::: 
+#### {-}
 
-- 14. Breakthrough sequencing method (abbrev.)
 
+# Exercise 3
+
+### 3a)
+::: {.question data-latex=""}
+
+Draw a generalized suffix tree for the sequence $A=$`ACGCACGCG`.
 ::: 
 
 #### {.tabset}
@@ -117,40 +158,55 @@ Vertical:
 
 ##### Solution
 ::: {.answer data-latex=""}
-```{r, echo=FALSE, out.width="75%", fig.align='center'}
-knitr::include_graphics("figures/sheet-9/crossword_solved.png")
+
+```{r, echo=FALSE, out.width="100%", fig.align='center'}
+knitr::include_graphics("figures/sheet-8/suffix_tree_3.png")
 ```
 ::: 
-#### {-}
 
-# Exercise 3
+#### {-}
 
-#### {.tabset}
 
-### 3a)
+### 3b)
 ::: {.question data-latex=""}
-You want to determine how many reads $N$ are needed to achieve a coverage depth $C$ of 20X when sequencing reads for *Escherichia coli*.
 
-The length of the reads $L$ is 30nt and the *E. coli* genome $G$ is approximately 4.6 million bases long.
+Find all maximal pairs of length at least 2.
 ::: 
 
 #### {.tabset}
 
 ##### Hide
 
-##### Formula
+##### Solution
 ::: {.answer data-latex=""}
-$$
-N = \frac{C\times G}{L}
-$$
+
+`ACGC`: $(1,5,4)$
+
+`CG`: $(2,8,2), (6,8,2)$
 ::: 
+#### {-}
+
+
+### 3c)
+::: {.question data-latex=""}
+
+Why is `C`: $(2, 8, 1)$ not a maximal pair?
+
+::: 
+
+#### {.tabset}
+
+##### Hide
 
 ##### Solution
 ::: {.answer data-latex=""}
-$$
-N = \frac{20\times 4600000}{30} \approx 3066667 \text{ reads}
-$$
+
+It is not right-maximal.
+This can be seen since `CG`: $(2, 8, 2)$ already includes the indices 2 and 8 with
+a longer match. 
+
 ::: 
+#### {-}