Add changes for 425843e

biocorecrg · Aug 21, 2024 · 8c11b69
1 parent 684ac43
commit 8c11b69

Showing 11 changed files with 426 additions and 82 deletions.
diff --git a/1- Library_preparation.html b/1- Library_preparation.html
@@ -51,7 +51,11 @@
 <li class="toctree-l1 current"><a class="current reference internal" href="#">1 Library preparation</a><ul>
 <li class="toctree-l2"><a class="reference internal" href="#main-causes-of-poor-quality-data">Main Causes of poor quality data</a></li>
 <li class="toctree-l2"><a class="reference internal" href="#id1">Library preparation</a></li>
-<li class="toctree-l2"><a class="reference internal" href="#library-preparation-bias">Library preparation bias</a></li>
+<li class="toctree-l2"><a class="reference internal" href="#library-preparation-bias">Library preparation bias</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="#dna-library-bias">DNA library bias</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#rna-library-bias">RNA library bias</a></li>
+</ul>
+</li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="2-%20Sequencing_technologies.html">2 Main Sequencing Technologies</a></li>
@@ -98,10 +102,8 @@ <h2>Main Causes of poor quality data<a class="headerlink" href="#main-causes-of-
 <h2>Library preparation<a class="headerlink" href="#id1" title="Link to this heading"></a></h2>
 <p>Selecting a suitable NGS library according to the type of sample (cell type or tissue) and the downstream analysis (WES, WGS, ChIP-seq, RNA-seq, …) is essential to guarantee the quality of the data and to obtain the desired information for our research.
In a nutshell, a library is a collection of nucleic acid (DNA or RNA) fragments of a defined length distribution with adapters attached.</p>
-<a class="reference internal image-reference" href="_images/library_prep_explanation_Van_Djik_2014.jpg"><img alt="_images/library_prep_explanation_Van_Djik_2014.jpg" class="align-center" src="_images/library_prep_explanation_Van_Djik_2014.jpg" style="width: 400px;" /></a>
-<p><em>source: https://doi.org/10.1016/j.yexcr.2014.01.008</em></p>
-<a class="reference internal image-reference" href="_images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png"><img alt="_images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png" class="align-center" src="_images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png" style="width: 400px;" /></a>
-<p><em>source: https://doi.org/10.1016/j.yexcr.2014.01.008</em></p>
+<a class="reference internal image-reference" href="_images/library_prep_explanation_Van_Djik_2014.jpg"><img alt="Source: https://doi.org/10.1016/j.yexcr.2014.01.008" class="align-center" src="_images/library_prep_explanation_Van_Djik_2014.jpg" style="width: 400px;" /></a>
+<a class="reference internal image-reference" href="_images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png"><img alt="Source: https://doi.org/10.1016/j.yexcr.2014.01.008" class="align-center" src="_images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png" style="width: 400px;" /></a>
 <p>Main steps of a Library Preparation Kit:</p>
 <ul class="simple">
 <li><p><strong>Fragmentation</strong></p></li>
@@ -135,14 +137,10 @@ <h2>Library preparation<a class="headerlink" href="#id1" title="Link to this hea
 <p>Check whether the DNA meets the quantity and quality requirements of the sequencing instrument. Assess the quantity and size distribution of the library.</p>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
-<blockquote>
-<div><p>RNA library preparation is more complex due to the risk of degradation and requires additional steps respect DNA:</p>
+<p>RNA library preparation is more complex than DNA library preparation because of the risk of degradation, and it requires additional steps:</p>
 <ul class="simple">
 <li><p>Because RNA is converted to cDNA, PCR-amplified libraries are necessary for many sequencing instruments.</p></li>
 <li><p>Most RNA-seq applications require the removal of ribosomal RNA (rRNA), which makes up to 90% of the total RNA.</p></li>
-</ul>
-</div></blockquote>
-<ul class="simple">
 <li><p>For specific isolation of mRNA transcripts, rRNA depletion alone is not enough: poly(A) selection must also be performed, capturing the RNAs that carry a polyadenylated tail with oligo(dT) primers.</p></li>
 </ul>
 </div>
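The 90% figure in the note above has a direct cost in sequencing depth. As a rough, hypothetical illustration (not taken from the source), the Python sketch below assumes that, without rRNA removal, rRNA takes the stated upper bound of 90% of the reads, and computes how many reads remain for other transcripts and how many total reads would be needed to reach a target number of informative reads. The function names and read counts are made up for the example.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads left for non-rRNA transcripts when a fraction of all reads is rRNA (hypothetical helper)."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return round(total_reads * (1.0 - rrna_fraction))


def reads_needed(target_informative: int, rrna_fraction: float) -> int:
    """Total reads required so that target_informative reads come from non-rRNA transcripts."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return round(target_informative / (1.0 - rrna_fraction))


if __name__ == "__main__":
    # Hypothetical run: 20 million reads, no rRNA removal, rRNA at the 90% upper bound.
    print(informative_reads(20_000_000, 0.9))  # reads left for mRNA and other RNAs
    print(reads_needed(2_000_000, 0.9))        # total reads needed to keep 2 million informative reads
```

Under this assumption only one read in ten reports on non-rRNA transcripts, so skipping rRNA removal would multiply the required sequencing depth roughly tenfold.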
@@ -151,6 +149,79 @@ <h2>Library preparation<a class="headerlink" href="#id1" title="Link to this hea
 <h2>Library preparation bias<a class="headerlink" href="#library-preparation-bias" title="Link to this heading"></a></h2>
 <p>Several biases can be introduced during the library preparation steps presented earlier.
This section presents the main biases introduced at each library preparation step, for both DNA and RNA, together with possible solutions to avoid them.</p>
+<section id="dna-library-bias">
+<h3>DNA library bias<a class="headerlink" href="#dna-library-bias" title="Link to this heading"></a></h3>
+<p>DNA Library preparation bias</p>
+<p><em>Source: http://dx.doi.org/10.1016/j.yexcr.2014.01.008</em></p>
+<p>The following steps of DNA library preparation have been implicated in introducing bias:</p>
+<ul class="simple">
+<li><p>Fragmentation</p></li>
+</ul>
+<p>Chromatin sonication for ChIP-seq has been shown to be non-random, with euchromatin being sheared more efficiently than heterochromatin.</p>
+<div class="admonition tip"><p class="admonition-title">Tip</p>
+<p>The double-fragmentation ChIP-seq protocol was developed to address this.</p></div>
+<ul class="simple">
+<li><p>Size Selection</p></li>
+</ul>
+<p>Melting agarose gel slices by heating them to 50 °C in chaotropic salt buffer decreases the representation of AT-rich sequences.</p>
+<div class="admonition tip"><p class="admonition-title">Tip</p>
+<p>A simple solution is to melt the gel slices in the supplied buffer at room temperature (18–22 °C), which considerably reduces GC bias.</p></div>
+<ul class="simple">
+<li><p>PCR</p></li>
+</ul>
+<p>PCR introduces bias in sample composition, because not all fragments in the mixture are amplified with the same efficiency.
+GC-neutral fragments are amplified more efficiently than GC-rich or AT-rich fragments; as a result, fragments with high AT or GC content may become underrepresented or be lost completely during library preparation.</p>
+<div class="admonition tip"><p class="admonition-title">Tip</p><ul class="simple">
+<li><p>Ligating adapters that contain all the elements needed for bridge amplification on Illumina flow cells is preferred, as it removes the need for PCR to add these sequences afterwards. However, this requires relatively large quantities (&gt;1 µg) of input material.</p></li>
+<li><p>In the extreme case of very small input amounts, such as a single cell, multiple displacement amplification (MDA) may be the preferred amplification method. MDA is an extremely powerful method that yields microgram quantities of DNA from femtograms of starting material; for this reason, it has become the method of choice for whole-genome amplification (WGA) from single cells.</p></li>
+<li><p>PCR additives such as betaine or tetramethylammonium chloride (TMAC) have also been reported to reduce bias and may further improve the coverage of extremely GC-rich or AT-rich regions.</p></li>
+<li><p>The best-performing polymerase overall appears to be Kapa HiFi.</p></li></ul></div>
+<p>For more information, see the publication <a class="reference external" href="http://dx.doi.org/10.1016/j.yexcr.2014.01.008">Library preparation methods for next-generation sequencing: Tone down the bias</a>.</p>
+</section>
+<section id="rna-library-bias">
+<h3>RNA library bias<a class="headerlink" href="#rna-library-bias" title="Link to this heading"></a></h3>
+<p>RNA Library preparation bias
+<em>Source: https://doi.org/10.1155/2021/6647597</em></p>
+<p>The main sources of bias in RNA-seq, and the solutions that can be implemented to reduce them, are presented below.</p>
+<ul class="simple">
+<li><p><strong>Sample Preservation and Isolation</strong></p></li>
+</ul>
+<ol class="arabic simple">
+<li><p>RNA degradation: minimizing sample processing and freeze-thaw cycles helps preserve the RNA as well as possible.</p></li>
+<li><p>RNA extraction: use high concentrations of RNA or avoid TRIzol extraction altogether.</p></li>
+<li><p>Alien sequence contamination:</p></li>
+<li><p>Low-quality and/or low-quantity RNA samples: RNase H has been reported as the best method for low-quality RNA, and it could even effectively replace the standard oligo(dT)-based RNA-seq method.
+For low-quantity RNA, the SMART and NuGEN approaches had lower duplication rates and significantly decreased the required amount of starting material compared with other methods.</p></li>
+</ol>
+<ul class="simple">
+<li><p><strong>Library Construction</strong></p></li>
+</ul>
+<ol class="arabic simple">
+<li><p>mRNA enrichment bias: enriching polyadenylated RNA transcripts with oligo(dT) primers removes all non-poly(A) RNAs, such as replication-dependent histone mRNAs and lncRNAs lacking a poly(A) tail, as well as incomplete mRNAs.
+Targeting rRNA for depletion instead does not restrict the library to mRNA molecules only, although it is more expensive.
+Subtractive hybridization with rRNA-specific probes was the method that introduced the least bias in relative transcript abundance, whereas exonuclease treatment tends to be less efficient at depleting rRNA.</p></li>
+<li><p>RNA fragmentation bias: fragmentation can introduce length biases or errors that propagate to later cycles.
+Studies have shown that methods based on non-specific restriction endonucleases show less sequence bias and perform similarly to physical methods.</p></li>
+<li><p>Primer bias: bias caused by primers during PCR amplification could be avoided with the Illumina Genome Analyzer, which performs the reverse transcription directly on the flow cell.
+The authors also propose a bioinformatics tool, in the form of a reweighting scheme, that adjusts for the bias and makes the distribution of the reads more uniform.</p></li>
+<li><p>Adapter ligation bias: because of the substrate preferences of T4 RNA ligases, some miRNAs evade capture. As a solution, several groups propose using a set of random-nucleotide adapters at the ligation boundary, randomizing the 3′ end of the 5′ adapter and the 5′ end of the 3′ adapter.
+The strategy is based on the hypothesis that a population of degenerate adapters averages out the sequencing bias, because the slightly different adapter molecules form stable secondary structures with a more diverse population of RNA sequences.</p></li>
+<li><p>Reverse transcription bias: reverse transcriptases tend to produce spurious second-strand cDNA through their DNA-dependent DNA polymerase activity. Actinomycin D, a compound that specifically inhibits DNA-dependent DNA synthesis, has been proposed as an agent to eliminate these antisense artifacts.</p></li>
+<li><p>PCR amplification bias: the main source of artifacts and base composition bias during library construction.</p>
+<ol class="loweralpha simple">
+<li><p>Extremely AT/GC-rich fragments: GC-neutral fragments can be amplified more than GC-rich or AT-rich fragments.
+Through the use of custom adapters, samples can be hybridized directly to the oligonucleotides on the flow cell surface without amplification, thus avoiding the biases and duplicates of PCR.
+However, this amplification-free method requires high sample input, which limits its wide use. The most effective PCR-enhancing additive currently in use is betaine,
+an amino acid mimic that balances the differential melting temperature (T<sub>m</sub>) between AT and GC base pairs and has been used effectively to improve the coverage of GC-rich templates.</p></li>
+<li><p>Tetramethylammonium chloride (TMAC): results showed that TMAC can remarkably increase the amplification of AT-rich regions with Kapa HiFi.
+Additionally, a number of other additives have been reported to play an important role in reducing PCR amplification bias, including small amides such as formamide, small sulfoxides such as dimethyl sulfoxide (DMSO),
+and reducing compounds such as β-mercaptoethanol or dithiothreitol (DTT).</p></li>
+<li><p>PCR cycles: because PCR amplifies DNA/cDNA templates exponentially, amplification bias increases significantly with the number of PCR cycles.
+It is therefore recommended to perform PCR with as few cycles as possible to mitigate bias.</p></li>
+</ol>
+</li>
+</ol>
+</section>
 </section>
 </section>
 
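The GC-content bias described in both subsections can be checked directly on sequencing reads. The Python sketch below is a hypothetical helper, not part of the protocols cited above: it computes the GC fraction of each read (ignoring ambiguous bases such as N) and sorts reads into AT-rich, GC-neutral and GC-rich classes. The cut-offs of 0.35 and 0.65 are illustrative assumptions, not published thresholds.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def gc_fraction(seq: str) -> Optional[float]:
    """Fraction of G/C among unambiguous bases (A, C, G, T), or None if the read has none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def gc_class(seq: str, low: float = 0.35, high: float = 0.65) -> str:
    """Assign a read to a GC class; the default cut-offs are illustrative assumptions."""
    gc = gc_fraction(seq)
    if gc is None:
        return "undetermined"
    if gc < low:
        return "AT-rich"
    if gc > high:
        return "GC-rich"
    return "GC-neutral"


def gc_profile(reads: Iterable[str]) -> Dict[str, int]:
    """Count how many reads fall into each GC class."""
    return dict(Counter(gc_class(read) for read in reads))


if __name__ == "__main__":
    reads = ["ATATATTAAT", "ACGTACGTAC", "GCGGCCGCGG", "GGCCNNATAT"]  # made-up reads
    for read in reads:
        print(read, gc_fraction(read), gc_class(read))
    print(gc_profile(reads))
```

Which GC distribution counts as "expected" depends on the genome being sequenced, so a profile like this is only meaningful when compared against that reference.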

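The recommendation to run as few PCR cycles as possible follows from the exponential nature of amplification. A minimal sketch, assuming the idealized model N = N0 · (1 + E)^n, where E is a per-cycle efficiency between 0 and 1 and n is the number of cycles: two fragments that start at the same copy number but amplify with slightly different efficiencies drift apart by a factor that is itself raised to the power n. The efficiencies used below (0.95 for a GC-neutral fragment, 0.85 for a GC-rich one) and the function names are hypothetical.

```python
def copies_after_pcr(n0: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential PCR: N = N0 * (1 + E) ** cycles, with E in [0, 1]."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return n0 * (1.0 + efficiency) ** cycles


def representation_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Copies of fragment A per copy of fragment B after PCR, both starting from one molecule."""
    return copies_after_pcr(1.0, eff_a, cycles) / copies_after_pcr(1.0, eff_b, cycles)


if __name__ == "__main__":
    # Hypothetical efficiencies: GC-neutral fragment 0.95, GC-rich fragment 0.85.
    for cycles in (5, 10, 15, 20):
        print(cycles, round(representation_ratio(0.95, 0.85, cycles), 2))
```

With these assumed efficiencies, the GC-neutral fragment ends up about 1.3 times more abundant than the GC-rich one after 5 cycles and about 2.9 times after 20, because the per-cycle ratio 1.95/1.85 is compounded once per cycle.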
diff --git a/2- Sequencing_technologies.html b/2- Sequencing_technologies.html
@@ -94,25 +94,25 @@ <h2>Short Reads sequencing (Illumina)<a class="headerlink" href="#short-reads-se
 </ol>
 <p>The adapter attached to the DNA fragment is used for hybridisation to the flow cell; subsequently, PCR amplification (bridge amplification) generates a cluster of copies of the same fragment to amplify the signal
when each nucleotide base is synthesized, thus producing many clusters on the flow cell.</p>
-<a class="reference internal image-reference" href="_images/illumina_Lu_et_al_2016.png"><img alt="_images/illumina_Lu_et_al_2016.png" src="_images/illumina_Lu_et_al_2016.png" style="width: 400px;" /></a>
+<a class="reference internal image-reference" href="_images/illumina_Lu_et_al_2016.png"><img alt="_images/illumina_Lu_et_al_2016.png" class="align-center" src="_images/illumina_Lu_et_al_2016.png" style="width: 400px;" /></a>
 <p><em>Source: https://www.researchgate.net/publication/357946568_New_approaches_and_concepts_to_study_complex_microbial_communities</em></p>
 <ol class="arabic simple">
 <li><p>Sequencing</p></li>
 </ol>
 <p>In each cycle, one nucleotide is incorporated into the template, so the number of cycles corresponds to the read length (100 cycles give a 100 bp read length).
After each cycle, the flow cell is imaged to determine which of the four colours was incorporated in each cluster.</p>
+<a class="reference internal image-reference" href="_images/single_vs_pair_end.png"><img alt="_images/single_vs_pair_end.png" class="align-center" src="_images/single_vs_pair_end.png" style="width: 400px;" /></a>
 <section id="single-end">
 <h3>Single end<a class="headerlink" href="#single-end" title="Link to this heading"></a></h3>
 <p>This is the basic form of SBS (sequencing by synthesis), in which the nucleotides added to the template sequence are read from one end of the fragment only.
-It’s more simple and effcient, due to reduce the the number of stemps in the library preparation.</p>
-<p>nevertheless, the quality of nucleotides decreases as the sequencing process progresses.</p>
+It is simpler and more efficient, because it reduces the number of steps in library preparation. Nevertheless, base-call quality decreases as the sequencing run progresses.</p>
 </section>
 <section id="paired-end">
 <h3>Paired end<a class="headerlink" href="#paired-end" title="Link to this heading"></a></h3>
 <p><em>source: https://systemsbiology.columbia.edu/genome-sequencing-defining-your-experiment#:~:text=Single%2Dend%20vs.&amp;text=In%20single%2Dend%20reading%2C%20the,opposite%20end%20of%20the%20fragment.</em></p>
 <p>During library preparation, sequencing primer binding sites are incorporated at both ends of the DNA fragments.
-This allows to reading at one read, when it finiches this direction at the specified read lenght, then starts another round od reading from the opposite end of the fragment.
-It improves:</p>
+This allows the fragment to be read in one direction up to the specified read length, after which another round of reading starts from the opposite end of the fragment.</p>
+<p>It improves:</p>
 <ul class="simple">
 <li><p>The confidence of the sequence read</p></li>
 <li><p>The ability to identify the relative positions of various reads in the genome (much more efficient at resolving rearrangements such as insertions, deletions or inversions)</p></li>
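To make the difference between single-end and paired-end reading concrete, the Python sketch below models a fragment as a plain string and assumes one base is read per cycle, as described above. It is a simplified, hypothetical model: adapters, sequencing errors and quality decay are ignored, and the function names are made up. Read 2 starts from the opposite end of the fragment on the complementary strand, so it corresponds to the reverse complement of the fragment's far end.

```python
from typing import Tuple

# Complement table for the four bases plus the ambiguous base N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment: str, cycles: int) -> str:
    """One base per cycle, read from one end of the fragment only."""
    return fragment[:cycles]


def paired_end_reads(fragment: str, cycles: int) -> Tuple[str, str]:
    """Read 1 from one end; read 2 from the opposite end, on the complementary strand."""
    read1 = fragment[:cycles]
    read2 = reverse_complement(fragment)[:cycles]
    return read1, read2


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAC"  # made-up 16 bp fragment
    print(single_end_read(fragment, 5))
    print(paired_end_reads(fragment, 5))
```

Because both reads come from the same fragment, whose length follows the library's defined size distribution, their orientation and approximate separation are known; this is what lets paired-end data place reads more confidently and flag rearrangements.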
@@ -130,6 +130,41 @@ <h3>Paired end<a class="headerlink" href="#paired-end" title="Link to this headi
 </section>
 <section id="long-read-sequencing-nanopore">
 <h2>Long read sequencing (Nanopore)<a class="headerlink" href="#long-read-sequencing-nanopore" title="Link to this heading"></a></h2>
+<p>Nanopore sequencing uses flow cells that contain an array of tiny holes, called nanopores (protein pores), embedded in an electrically resistant membrane. Each nanopore corresponds to its own electrode connected
+to a channel and sensor chip, which measures the electric current that flows through the nanopore. When a molecule passes through a nanopore, the current is disrupted
+to produce a characteristic ‘squiggle’. The squiggle is then decoded using basecalling algorithms to determine the DNA or RNA sequence in real time.
+In an electrolytic solution, a constant voltage is applied to produce an ionic current through the nanopore such that negatively charged single-stranded DNA or RNA molecules
+are driven through the nanopore from the negatively charged ‘cis’ side to the positively charged ‘trans’ side. Translocation speed is controlled by a motor protein that ratchets
+the nucleic acid molecule through the nanopore in a step-wise manner. Changes in the ionic current during translocation correspond to the nucleotide sequence present in the sensing
region and are decoded using computational algorithms, allowing real-time sequencing of single molecules. In addition to controlling translocation speed, the motor protein has helicase activity,
+enabling double-stranded DNA or RNA–DNA duplexes to be unwound into single-stranded molecules that pass through the nanopore.</p>
+<a class="reference internal image-reference" href="_images/Nanopore_principle.png"><img alt="_images/Nanopore_principle.png" class="align-center" src="_images/Nanopore_principle.png" style="width: 400px;" /></a>
+<p>A basecaller translates raw signals into DNA sequence data (FASTQ). The basecaller uses a neural network to predict the most likely DNA sequence based on the raw signal data.</p>
+<div class="admonition seealso">
+<p class="admonition-title">See also</p>
+<blockquote>
+<div><blockquote>
+<div><p>See the <a class="reference external" href="https://www.youtube.com/watch?v=RcP85JHLmnI">Nanopore_sequencing_workflow</a> video by Oxford Nanopore Technologies to visualize the concepts of Nanopore sequencing.</p>
+</div></blockquote>
+</div></blockquote>
+</div>
+<hr class="docutils" />
+<p class="rubric">FASTQ format and Phred quality score</p>
+<p>The raw data generated by the sequencer is stored in FASTQ format, which contains the sequence of nucleotides and their corresponding quality scores.
+Each record is divided into four lines:</p>
+<blockquote>
+<div><ol class="arabic simple">
+<li><p>Sequence identifier: starts with ‘&#64;’ and contains information about the read, such as the instrument, run ID, flow cell ID, lane, tile, x and y coordinates, and read number.</p></li>
+<li><p>Sequence: the nucleotide sequence of the read.</p></li>
+<li><p>Quality identifier: starts with ‘+’ and may contain the same information as the sequence identifier, or it may be empty; in some cases it is used for metadata.</p></li>
+<li><p>Quality scores: the Phred quality score for each base in the read. The Phred quality score measures the quality of the base call and is calculated as <em>Q</em> = −10 log<sub>10</sub>(<em>P</em>), where <em>P</em> is the probability that the base call is incorrect. Each score is encoded as a single ASCII character (in the common Phred+33 encoding, the character code minus 33), so a score of 0 is represented by ‘!’ and a score of 41 by ‘J’. The higher the quality score, the more confident we are in the base call.</p></li>
+</ol>
+</div></blockquote>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>The &#64; symbol cannot be used to count the number of reads, because it can also appear as a quality score character.</p>
+</div>
+<a class="reference internal image-reference" href="_images/fastq_format.png"><img alt="_images/fastq_format.png" class="align-center" src="_images/fastq_format.png" style="width: 400px;" /></a>
 </section>
 </section>
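The four-line FASTQ layout and the Phred formula above can be made concrete with a short Python sketch. It assumes Phred+33 encoding, reads records strictly four lines at a time (so an ‘@’ in a quality string is never taken for a new record, in line with the note above), and uses hypothetical function names and a made-up two-read example; it is an illustration, not a replacement for established FASTQ parsers.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # Phred+33: '!' (ASCII 33) -> Q0, 'J' (ASCII 74) -> Q41


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to recover P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) tuples from 4-line FASTQ records.

    Records are consumed strictly four lines at a time, so an '@' that
    appears as a quality character is never mistaken for a new record.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of input (or a trailing blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


if __name__ == "__main__":
    # Made-up example: in read2 every quality character is '@' (Q31),
    # so three lines start with '@' but the input holds only two reads.
    demo = io.StringIO("@read1\nACGT\n+\n!5?J\n@read2\nGGCA\n+\n@@@@\n")
    records = list(read_fastq(demo))
    print("reads:", len(records))
    for name, seq, qual in records:
        scores = phred_scores(qual)
        print(name, seq, scores, [error_probability(q) for q in scores])
```

For read1, the quality string ‘!5?J’ decodes to Phred scores 0, 20, 30 and 41, that is, error probabilities of 1, 1 in 100, 1 in 1,000 and roughly 1 in 12,600.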