Selecting a suitable NGS library according to the type of sample (cell type or tissue) and the downstream analysis (WES, WGS, ChIP-seq, RNA-seq, …) is essential to guarantee the quality of the data and obtain the desired information for our research.
In a nutshell, a library is defined as a collection of nucleic acid (RNA or DNA) fragments of a defined length distribution with adapters attached.
Several biases can be introduced during the library preparation steps presented earlier.
The main biases introduced for either DNA or RNA in each library preparation step are presented here, together with possible solutions to avoid them.
The steps of DNA library preparation that have been implicated in introducing bias are the following:
+
+
Fragmentation
+
+
Chromatin sonication for ChIP-seq has been shown to be non-random, with euchromatin being sheared more efficiently than heterochromatin.
+.. tip::
+The double-fragmentation ChIP-seq protocol was developed to address this.
+
+
Size Selection
+
+
Melting agarose gel slices by heating to 50 ºC in chaotropic salt buffer decreases the representation of AT-rich sequences.
+.. tip::
+A simple solution is to melt the gel slices in the supplied buffer at room temperature (18–22 ºC), which considerably reduces GC bias.
+
+
PCR
+
+
PCR introduces bias in sample composition because not all fragments in the mixture are amplified with the same efficiency.
+GC-neutral fragments are amplified more efficiently than GC-rich or AT-rich fragments; as a result, fragments with high AT or GC content may become underrepresented or be lost completely during library preparation.
+.. tip::
+- Ligating adapters that contain all the elements necessary for bridge amplification on Illumina flowcells is preferred, because it eliminates the need for PCR to add these sequences afterwards. However, this approach requires relatively large quantities (>1 µg) of input material.
+- In the extreme case of small input amounts, such as a single cell, multiple displacement amplification (MDA) may be the preferred amplification method. MDA is extremely powerful, allowing microgram quantities of DNA to be obtained from femtograms of starting material. For this reason, MDA has become the method of choice for whole genome amplification (WGA) from single cells.
+- PCR additives such as betaine or tetramethylammonium chloride (TMAC) have also been reported to reduce bias and may help to further improve coverage of extremely GC-rich or AT-rich regions.
+- The best overall performing polymerase appears to be Kapa HiFi.
This section presents the main sources of bias in RNA-seq and the solutions that can be implemented to reduce them.
+
+
Sample Preservation and Isolation
+
+
+
Degradation of RNA: minimizing sample processing and freeze-thaw cycles ensures that RNA is preserved as well as possible.
+
RNA extraction: Use high concentrations of RNA samples or avoid TRIzol extraction altogether.
+
Alien sequence contamination:
+
+
#. Low-quality and/or low-quantity RNA samples: RNase H has been the best method for detecting low-quality RNA and could even effectively replace the standard RNA-seq method based on oligo(dT).
+For low-quantity RNA, the SMART and NuGEN approaches had lower duplication rates and significantly decreased the necessary amount of starting material compared to other methods.
+
+
Library Construction
+
+
1. mRNA enrichment bias: enriching for polyadenylated RNA transcripts with oligo(dT) primers removes all non-poly(A) RNAs, such as replication-dependent histone mRNAs and lncRNAs lacking a poly(A) tail,
+or incomplete mRNAs. rRNA depletion, by contrast, does not restrict the library to mRNA molecules only (and is more expensive). Subtractive hybridization using rRNA-specific probes has been reported as the method that introduces the least bias in relative transcript abundance; in contrast, exonuclease treatment tends to be less efficient at rRNA depletion.
+#. RNA fragmentation bias: fragmentation can introduce length biases or errors that are propagated to later cycles. Studies have shown that methods involving non-specific restriction endonucleases show less sequence bias and perform similarly to physical methods.
+#. Primer bias: the deviation introduced by primers during PCR amplification could be avoided by using the Illumina Genome Analyzer, which performs reverse transcription directly on the flowcell. Alternatively, some authors propose a bioinformatics tool in the form of a reweighing scheme that adjusts for the bias and makes the distribution of reads more uniform.
+#. Adapter ligation bias: due to the substrate preferences of T4 RNA ligases, protocols that use a set of random-nucleotide adapters at the ligation boundary evade the capture of miRNAs. As a solution, several groups propose randomizing the 3’ end of the 5’ adapter and the 5’ end of the 3’ adapter. The strategy is based on the hypothesis that a population of degenerate adapters would average out the sequencing bias, because the slightly different adapter molecules would form stable secondary structures with a more diverse population of RNA sequences.
+#. Reverse transcription bias: reverse transcriptases tend to produce false second-strand cDNA through DNA-dependent DNA polymerase activity. Actinomycin D, a compound that specifically inhibits DNA-dependent DNA synthesis, has been proposed as an agent to eliminate antisense artifacts.
+#. PCR amplification bias: the main source of artifacts and base composition bias in library construction:
+#. Extremely AT/GC-rich fragments: GC-neutral fragments can be amplified more than GC-rich or AT-rich fragments. Through the use of custom adapters, the samples
+
+
without amplification and ligation can be hybridized directly with the oligonucleotides on the flowcell surface, thus avoiding the biases and duplicates of PCR.
+However, the amplification-free method requires high sample input, which limits its wide use. The most effective PCR-enhancing additive currently used is betaine.
+It is an amino acid mimic that acts to balance the differential Tm between AT and GC base pairs and has been used effectively to improve the coverage of GC-rich templates.
+
+
+
+
Presence of tetramethylammonium chloride (TMAC): reported results showed that TMAC can remarkably increase the amplification of AT-rich regions with Kapa HiFi. Additionally,
a number of additives have been reported to play an important role in reducing PCR amplification bias, including small amides such as formamide, small sulfoxides such as dimethyl sulfoxide (DMSO),
+or reducing compounds such as β-mercaptoethanol or dithiothreitol (DTT).
+
+
+
+
+
PCR cycles: PCR amplifies DNA/cDNA templates exponentially, so amplification bias increases significantly with the number of PCR cycles. Therefore,
it is recommended that PCR be performed using as few cycles as possible to mitigate bias.
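The effect of the cycle number can be illustrated with a small calculation. The sketch below assumes an idealized model in which a fragment with per-cycle efficiency e is multiplied by (1 + e) in every cycle. The function names and the efficiency values (0.95 and 0.80) are hypothetical and are not taken from the cited publication.

```python
def amplified_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` PCR cycles in an idealized model.

    Assumption: every cycle multiplies the pool by (1 + efficiency),
    where efficiency lies between 0 (no amplification) and 1 (perfect doubling).
    """
    return start_copies * (1.0 + efficiency) ** cycles


def representation_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Relative over-representation of fragment A versus fragment B.

    Both fragments are assumed to start at equal abundance.
    """
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


if __name__ == "__main__":
    # Hypothetical efficiencies: a GC-neutral fragment (0.95) and a GC-rich one (0.80).
    for n_cycles in (5, 10, 15, 20):
        ratio = representation_ratio(0.95, 0.80, n_cycles)
        print(f"{n_cycles:2d} cycles: GC-neutral / GC-rich = {ratio:.2f}")
```

Under this model the ratio grows geometrically with the number of cycles, which is why a small difference in efficiency becomes a large difference in representation after many cycles.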
In each cycle, one nucleotide is incorporated into the template, so the number of cycles corresponds to the read length (100 cycles yield a 100 bp read).
After each incorporation, the flow cell is imaged to determine which of the four colours was incorporated in each cluster.
This mode corresponds to the basis of SBS: the nucleotides added to the template sequence are read from one end of the fragment.
-It’s more simple and effcient, due to reduce the the number of stemps in the library preparation.
-
nevertheless, the quality of nucleotides decreases as the sequencing process progresses.
+It is simpler and more efficient because it reduces the number of steps in library preparation. Nevertheless, base quality decreases as the sequencing process progresses.
During library preparation, sequencing primer binding sites are incorporated at both ends of the DNA fragments.
-This allows to reading at one read, when it finiches this direction at the specified read lenght, then starts another round od reading from the opposite end of the fragment.
-It improves:
+This allows the fragment to be read in one direction up to the specified read length and then, in a second round, from the opposite end of the fragment.
+
It improves:
The confidence of the sequence read
The ability to identify the relative positions of various reads in the genome (much more efficient at resolving rearrangements such as insertions, deletions or inversions)
Nanopore sequencing uses flow cells that contain an array of tiny holes, called nanopores (protein pores), embedded in an electro-resistant membrane. Each nanopore corresponds to its own electrode connected
+to a channel and sensor chip, which measures the electric current that flows through the nanopore. When a molecule passes through a nanopore, the current is disrupted
+to produce a characteristic ‘squiggle’. The squiggle is then decoded using basecalling algorithms to determine the DNA or RNA sequence in real time.
+In an electrolytic solution, a constant voltage is applied to produce an ionic current through the nanopore such that negatively charged single-stranded DNA or RNA molecules
+are driven through the nanopore from the negatively charged ‘cis’ side to the positively charged ‘trans’ side. Translocation speed is controlled by a motor protein that ratchets
+the nucleic acid molecule through the nanopore in a step-wise manner. Changes in the ionic current during translocation correspond to the nucleotide sequence present in the sensing
+region and are decoded using computational algorithms, allowing real-time sequencing of single molecules. In addition to controlling translocation speed, the motor protein has helicase activity,
+enabling double-stranded DNA or RNA–DNA duplexes to be unwound into single-stranded molecules that pass through the nanopore.
+
+
A basecaller translates raw signals into DNA sequence data (FASTQ). The basecaller uses a neural network to predict the most likely DNA sequence based on the raw signal data.
+
+
See also
+
+
+
See the Nanopore_sequencing_workflow video by Oxford Nanopore Technologies to visualize the concepts of Nanopore sequencing.
+
+
+
FASTQ format and Phred quality score
+
+
+
The raw data generated by the sequencer is stored in FASTQ format, which contains the sequence of nucleotides and their corresponding quality scores.
+Each record is divided into four lines:
+
+
+
Sequence identifier: starts with ‘@’ and contains information about the read, such as the instrument, run ID, flow cell ID, lane, tile, x and y coordinates, and read number.
+
Sequence: the nucleotide sequence of the read.
+
Quality identifier: starts with ‘+’ and may repeat the sequence identifier. It may also be empty, and in some cases it is used for metadata.
+
Quality scores: the Phred quality score for each base in the read. The Phred quality score is a measure of the quality of the base call, which is calculated as -10 * log10(P), where P is the probability of the base call being incorrect. The quality score is represented as an ASCII character, with a score of 0 represented by ‘!’, and a score of 41 represented by ‘J’. The higher the quality score, the more confident we are in the base call.
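The relationship between the ASCII character, the Phred score and the error probability can be sketched in a few lines of Python. This is a minimal illustration that assumes the offset of 33 implied by the mapping above (‘!’ is 0 and ‘J’ is 41); the function names are hypothetical.

```python
from typing import List


def phred_from_char(ch: str, offset: int = 33) -> int:
    """Convert one quality character to its Phred score (assumes a +33 offset)."""
    return ord(ch) - offset


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a whole FASTQ quality line into a list of Phred scores."""
    return [phred_from_char(ch, offset) for ch in quality_line]


if __name__ == "__main__":
    for ch in "!5?IJ":
        q = phred_from_char(ch)
        print(f"{ch!r}: Q = {q:2d}, P(error) = {error_probability(q):.5f}")
```

For instance, a score of 20 corresponds to an error probability of 0.01 (1 in 100), and a score of 30 to 0.001 (1 in 1000).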
+
+
+
+
Note
+
The @ symbol cannot be used to count the number of reads, because it can also appear as a quality score symbol.
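A safer way to count reads is to count lines and divide by four. The sketch below assumes an uncompressed FASTQ stream with exactly four lines per record (no wrapped sequence lines); the function name and the example reads are hypothetical.

```python
import io
from typing import TextIO


def count_fastq_reads(handle: TextIO) -> int:
    """Count reads by dividing the number of lines by four.

    Assumption: every record occupies exactly four lines.
    """
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


if __name__ == "__main__":
    # The quality line of the first read starts with '@' (Phred score 31).
    example = "@read1\nACGT\n+\n@III\n@read2\nTTGA\n+\nIIII\n"
    at_lines = sum(1 for line in io.StringIO(example) if line.startswith("@"))
    print("lines starting with '@':", at_lines)
    print("reads counted by lines / 4:", count_fastq_reads(io.StringIO(example)))
```

In this example, counting lines that start with ‘@’ overestimates the number of reads, while the line-based count does not.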
+
+
diff --git a/3- Quality Control and Preprocessing.html b/3- Quality Control and Preprocessing.html
new file mode 100644
index 0000000..e0cfeec
--- /dev/null
+++ b/3- Quality Control and Preprocessing.html
@@ -0,0 +1,153 @@
+
+
+
+
+
+
+ 3 Quality Control and Preprocessing — NGS-QC-Course documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
The quality of the reads contained in the FASTQ files needs to be checked in order to determine
+whether the reads can be used for further analysis. The main tools used for QC of Illumina reads are FastQC and FastQ Screen.
FastQC is a tool developed to detect problems in the reads that originate either from the sequencing machine or from library preparation.
+The supported input formats are:
+
+
+
FASTQ
+
Casava FASTQ files
+
Colorspace FASTQ
+
Gzip compressed FASTQ (.fastq.gz)
+
SAM
+
BAM
+
SAM/BAM Mapped only (normally used for colorspace data)
+
+
+
The HTML report generated for each file is divided into the following sections:
+
+
+
Basic Statistics: displays information about the file, the number and length of the sequences, and the overall %GC.
+
+
Per base sequence quality: shows how the quality score (y axis) varies along the sequence reads (x axis). For each position a box-and-whisker plot is displayed; the red line represents the median and the blue line the mean. The quality score commonly tends to decrease towards the end of the reads, because the polymerase tends to make more mistakes as the read progresses.
If the median for any base position is below 25, a warning is raised.
+
+
+
+
Per sequence quality score:
+
Per base sequence content
+
Per base GC content
+
Per sequence GC content
+
Per Base N content
+
Sequence Length Distribution
+
Duplicate Sequences
+
Overrepresented sequences
+
Overrepresented kmers
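The per base sequence quality check described in the list above can be approximated with a short script. This is a simplified sketch and not FastQC's own implementation: it assumes Phred+33 encoded quality strings, uses the median threshold of 25 mentioned above, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def per_base_median_quality(quality_strings: Sequence[str], offset: int = 33) -> List[float]:
    """Median Phred score at each read position (assumes a +33 offset).

    Reads may differ in length; each position uses only the reads that reach it.
    """
    max_len = max((len(q) for q in quality_strings), default=0)
    medians = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quality_strings if pos < len(q)]
        medians.append(median(scores))
    return medians


def positions_below(medians: Sequence[float], threshold: float = 25) -> List[int]:
    """1-based positions whose median quality falls below the threshold."""
    return [i + 1 for i, m in enumerate(medians) if m < threshold]


if __name__ == "__main__":
    # Hypothetical quality lines: 'I' encodes Q40 and '5' encodes Q20.
    quals = ["IIII", "III5", "II55"]
    medians = per_base_median_quality(quals)
    print("median per position:", medians)
    print("positions that would trigger a warning:", positions_below(medians))
```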
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/_images/Nanopore_principle.png b/_images/Nanopore_principle.png
new file mode 100644
index 0000000..7e722be
Binary files /dev/null and b/_images/Nanopore_principle.png differ
diff --git a/_images/fastq_format.png b/_images/fastq_format.png
new file mode 100644
index 0000000..30d3c67
Binary files /dev/null and b/_images/fastq_format.png differ
diff --git a/_images/single_vs_pair_end.png b/_images/single_vs_pair_end.png
new file mode 100644
index 0000000..f362150
Binary files /dev/null and b/_images/single_vs_pair_end.png differ
diff --git a/_sources/1- Library_preparation.rst.txt b/_sources/1- Library_preparation.rst.txt
index 5cd66d8..3fbc78a 100644
--- a/_sources/1- Library_preparation.rst.txt
+++ b/_sources/1- Library_preparation.rst.txt
@@ -25,15 +25,15 @@ In a nutshell, library is defined as a collection of nucleic acid (RNA or DNA) f
.. image:: images/library_prep_explanation_Van_Djik_2014.jpg
:width: 400
:align: center
+ :alt: *source: https://doi.org/10.1016/j.yexcr.2014.01.008*
-*source: https://doi.org/10.1016/j.yexcr.2014.01.008*
.. image:: images/protocol_RNA-seq_library_bias_vanDjik_etal_2014.png
:width: 400
:align: center
+ :alt: *source: https://doi.org/10.1016/j.yexcr.2014.01.008*
-*source: https://doi.org/10.1016/j.yexcr.2014.01.008*
Main steps of a Library Preparation Kit:
@@ -73,8 +73,8 @@ Check if DNA mets the quantity and quality requirements of the sequencing instru
RNA library preparation is more complex because of the risk of degradation, and it requires additional steps compared with DNA:
- Because RNA is converted to cDNA, PCR-amplified libraries are necessary for many sequencing instruments.
- - Most of the RNA-seq applications requires the removal of the ribosomal RNA (rRNA), comprising up to 90% of the total RNA.
- - For especific isolation of mRNA transcripts, in addition to rRNA depletion, poly(A) must be done for selecting the RNAs containing a polyadenilated tail using oligo primers.
+ - Most RNA-seq applications require the removal of ribosomal RNA (rRNA), which comprises up to 90% of the total RNA.
+ - For specific isolation of mRNA transcripts, in addition to rRNA depletion, poly(A) selection must be performed to select the RNAs containing a polyadenylated tail using oligo(dT) primers.
Library preparation bias
@@ -83,75 +83,73 @@ Library preparation bias
Several biases can be introduced during the library preparation steps presented earlier.
The main biases introduced for either DNA or RNA in each library preparation step are presented here, together with possible solutions to avoid them.
-.. tabs::
+DNA library bias
+----------------
- .. tab:: DNA library bias
+DNA Library preparation bias
- DNA Library preparation bias
+*Source: http://dx.doi.org/10.1016/j.yexcr.2014.01.008*
- *Source: http://dx.doi.org/10.1016/j.yexcr.2014.01.008*
+The steps of DNA library preparation that have been implicated in introducing bias are the following:
- Here are presented the the different steps of the DNA library preparation that have been implicated in bias introduction:
+- Fragmentation
+Chromatin sonication for ChIP-seq has been shown to be non-random, with euchromatin being sheared more efficiently than heterochromatin.
+.. tip::
+The double-fragmentation ChIP-seq protocol was developed to address this.
- #. Fragmentation
- Chromatin sonication for ChIP-seq has been shown to be non-random, with euchromatin being sheared more efficiently than heterochromatin.
- .. tip::
- To solve this it has been developed the double-fragmentation ChIP-seq protocol.
+- Size Selection
+Melting agarose gel slices by heating to 50 ºC in chaotropic salt buffer decreases the representation of AT-rich sequences.
+.. tip::
+A simple solution is to melt the gel slices in the supplied buffer at room temperature (18–22 ºC), which considerably reduces GC bias.
- #. Size Selection
- Agarose gel slices by heating to 50 ºC in chaotropic salt buffer decreased the representation of AT-rich sequences.
- .. tip::
- Simple solution to this problem is to melt the gel slices in the supplied buffer at room temperature (18–22 ºC), considerably reducing GC bias.
+- PCR
+PCR introduces bias in sample composition because not all fragments in the mixture are amplified with the same efficiency.
+GC-neutral fragments are amplified more efficiently than GC-rich or AT-rich fragments; as a result, fragments with high AT or GC content may become underrepresented or be lost completely during library preparation.
+.. tip::
+- Ligating adapters that contain all the elements necessary for bridge amplification on Illumina flowcells is preferred, because it eliminates the need for PCR to add these sequences afterwards. However, this approach requires relatively large quantities (>1 µg) of input material.
+- In the extreme case of small input amounts, such as a single cell, multiple displacement amplification (MDA) may be the preferred amplification method. MDA is extremely powerful, allowing microgram quantities of DNA to be obtained from femtograms of starting material. For this reason, MDA has become the method of choice for whole genome amplification (WGA) from single cells.
+- PCR additives such as betaine or tetramethylammonium chloride (TMAC) have also been reported to reduce bias and may help to further improve coverage of extremely GC-rich or AT-rich regions.
+- The best overall performing polymerase appears to be Kapa HiFi.
- #. PCR
- Introduce bias in sample composition, due to the fact that not all fragments in the mixture are amplified with the same efficiency.
- GC-neutral fragments are amplified more efficiently than GC-rich or AT-rich fragments, and as a result fragments with high AT- or GC content may become underrepresented or are completely lost during library preparation
- .. tip::
- - Ligate adapters that contain all necessary elements for bridge amplification on Illumina flowcells are preferred, eliminating the need for PCR to add these sequences afterwards. Nevertheless, requires relatively large quantities (41 mg) of input material.
- - In the extreme case of small input amount, the single cell,multiple displacement amplification (MDA) may be the preferred amplification method. MDA is an extremely powerful amplification method, allowing microgram quantities of DNA to be obtained from femtograms of starting material. For this reason, MDA has become the method of choice for whole genome amplification (WGA) from single cells
- - PCR additives have also been reported to reduce bias, such as betaine or tetramethylammonium chloride (TMAC) may help to further improve coverage of extremely GC-rich or AT-rich regions.
- - The best overall performing polymerase appears to be Kapa HiFi.
-
- .. seealso::
- For more information see the publication `Library preparation methods for next generation sequencing Tone down the bias `_.
+.. seealso::
+For more information see the publication `Library preparation methods for next generation sequencing Tone down the bias `_.
+RNA library bias
+----------------
+
+RNA Library preparation bias
+*Source: https://doi.org/10.1155/2021/6647597*
+
+This section presents the main sources of bias in RNA-seq and the solutions that can be implemented to reduce them.
+
+- **Sample Preservation and Isolation**
+
+1. Degradation of RNA: minimizing sample processing and freeze-thaw cycles ensures that RNA is preserved as well as possible.
+#. RNA extraction: Use high concentrations of RNA samples or avoid TRIzol extraction altogether.
+#. Alien sequence contamination:
+#. Low-quality and/or low-quantity RNA samples: RNase H has been the best method for detecting low-quality RNA and could even effectively replace the standard RNA-seq method based on oligo(dT).
+For low-quantity RNA, the SMART and NuGEN approaches had lower duplication rates and significantly decreased the necessary amount of starting material compared to other methods.
+
+- **Library Construction**
+
+1. mRNA enrichment bias: enriching for polyadenylated RNA transcripts with oligo(dT) primers removes all non-poly(A) RNAs, such as replication-dependent histone mRNAs and lncRNAs lacking a poly(A) tail,
+or incomplete mRNAs. rRNA depletion, by contrast, does not restrict the library to mRNA molecules only (and is more expensive). Subtractive hybridization using rRNA-specific probes has been reported as the method that introduces the least bias in relative transcript abundance; in contrast, exonuclease treatment tends to be less efficient at rRNA depletion.
+#. RNA fragmentation bias: fragmentation can introduce length biases or errors that are propagated to later cycles. Studies have shown that methods involving non-specific restriction endonucleases show less sequence bias and perform similarly to physical methods.
+#. Primer bias: the deviation introduced by primers during PCR amplification could be avoided by using the Illumina Genome Analyzer, which performs reverse transcription directly on the flowcell. Alternatively, some authors propose a bioinformatics tool in the form of a reweighing scheme that adjusts for the bias and makes the distribution of reads more uniform.
+#. Adapter ligation bias: due to the substrate preferences of T4 RNA ligases, protocols that use a set of random-nucleotide adapters at the ligation boundary evade the capture of miRNAs. As a solution, several groups propose randomizing the 3' end of the 5' adapter and the 5' end of the 3' adapter. The strategy is based on the hypothesis that a population of degenerate adapters would average out the sequencing bias, because the slightly different adapter molecules would form stable secondary structures with a more diverse population of RNA sequences.
+#. Reverse transcription bias: reverse transcriptases tend to produce false second-strand cDNA through DNA-dependent DNA polymerase activity. Actinomycin D, a compound that specifically inhibits DNA-dependent DNA synthesis, has been proposed as an agent to eliminate antisense artifacts.
+#. PCR amplification bias: the main source of artifacts and base composition bias in library construction:
+#. Extremely AT/GC-rich fragments: GC-neutral fragments can be amplified more than GC-rich or AT-rich fragments. Through the use of custom adapters, the samples
+ without amplification and ligation can be hybridized directly with the oligonucleotides on the flowcell surface, thus avoiding the biases and duplicates of PCR.
+ However, the amplification-free method requires high sample input, which limits its wide use. The most effective PCR-enhancing additive currently used is betaine.
+ It is an amino acid mimic that acts to balance the differential Tm between AT and GC base pairs and has been used effectively to improve the coverage of GC-rich templates.
+#. Presence of tetramethylammonium chloride (TMAC): reported results showed that TMAC can remarkably increase the amplification of AT-rich regions with Kapa HiFi. Additionally,
+ a number of additives have been reported to play an important role in reducing PCR amplification bias, including small amides such as formamide, small sulfoxides such as dimethyl sulfoxide (DMSO),
+ or reducing compounds such as β-mercaptoethanol or dithiothreitol (DTT).
+#. PCR cycles: PCR amplifies DNA/cDNA templates exponentially, so amplification bias increases significantly with the number of PCR cycles. Therefore,
+ it is recommended that PCR be performed using as few cycles as possible to mitigate bias.
- .. tab:: RNA library bias
-
- RNA Library preparation bias
- *Source: https://doi.org/10.1155/2021/6647597*
-
- On this field are presented the main source of bias in RNA-seq, and the solutions that would be implemented to reduce it.
-
- #. **Sample Preservation and Isolation**
-
- - Degradation of RNA: Minimizing the sample processing and freezing and thawing cycles, ensures that RNA is preserved as best as possible.
- - RNA extraction: Use high concentrations of RNA samples or avoid TRIzol extraction altogether.
- - Alien sequence contamination:
- - Low-quality and/or low-quantity RNA samples: RNase H has been the best method for detecting low-qualityRNA and even could effectively replace the standard RNA-seq method based on oligo (dT).
- For low-quantity RNA,the SMART and NuGEN approaches had lower duplicationrates and significantly decreased the necessary amount ofstarting material compared to other methods.
-
- #. **Library Construction**
-
- - mRNA enrichment bias: enrich for polyadenylated RNA transcripts with oligo (dT) primers have shown that this method remove all non-poly (A) RNAs, such a reolication-dependant histones and lncRNAs (lacking of polyA),
- or incomplete mRNAs. Targeting rRNA as depletion method will not limit to only mRNA molecules (also is more expensive). subtractive hybridization using rRNA-specific probes as the method that introduced the least bias in relative transcript abundance, In contrast, exonuclease treatment tends to be less efficient in rRNA depletion
- - RNA fragmentation bias: can introduce lenght biases or errors (propagated to later cycles), Studies have shown that methods that involve non specific restriction endonucleases indicate less sequence bias and have been shown to perform similarly to the physical methods.
- - Primer bias: deviation due to primer during PCR amplification could be avoid using the Illumina Genome Analyzer, which perform the reverse transcription directly on the flowcells. authors propose a bioinformatics tool in the formo fare weighing scheme that adjusts for the bias and makes the distribution of the reads more uniform.
- - Adapter ligation bias: due to substrate preferences of T4 RNA ligases, protocols that uses a set of randomnucleotide adapters at the ligation boundary evade the capture of miRNAs. As a solution, several groups propose to randomize the 3'end of the 5'adapter and the 5'end of the 3'adapter. The strategy is based on the hypothesis that a population of degenerate adapters would average out the sequencing bias because the slightly different adapter molecules would form stable secondary structures with a more diverse population of RNAsequences - Reverse transcription bias: reverse transcriptases tend to produce false second strand cDNA throughDNA-dependent DNA polymerase. ActinomycinD, a compound that specifically inhibits DNA-dependent DNAsynthesis, has been proposed as an agent to eliminate antisense artifacts
- - PCR amplification bias: main source of artifactsand base composition bias in the process of library construc-tion,
- Extremely AT/GC-Rich, fragments of GC-neutral can be amplified more thanGC-rich or AT-rich fragments. Throughthe use of custom adapters,
- the samples without amplifica-tion and ligation can be hybridized directly with the oligonu-cleotides on the flowcell surface, thus avoiding the biases andduplicates of PCR. However,
- the amplification-free methodrequires high sample input, which limits its widely used. The mosteffective PCR enhancing additives currently used are betaine[50].
- It is an amino acid mimic that acts to balance the differ-ential T m between AT and GC base pairs and has been effec-tively used to improve the coverage of GC-rich templates
- presence of tetramethylammonium chloride (TMAC). Theirresult showed that the TMAC can remarkably increase theamplification of AT-rich regions in Kapa HiFi in the pres-ence. Additionally,
- a number of additives have been reportedto play an important role in reducing the bias of PCR ampli-fication, including small amides such as formamide, smallsulfoxides such as dimethyl sulfoxide (DMSO),
- or reducingcompounds such as β-mercaptoethanol or dithiothreitol(DTT) [50].
- PCR cyle: CR can exponentiallyamplify DNA/cDNA templates, thus leading to a significantincrease of amplification bias with the number of PCR cycles[51]. Therefore,
- it is recommended that PCR be performedusing as few cycle numbers as possible to mitigation bias
- - Machine failure
-
diff --git a/_sources/2- Sequencing_technologies.rst.txt b/_sources/2- Sequencing_technologies.rst.txt
index 9f145d6..fe75a96 100644
--- a/_sources/2- Sequencing_technologies.rst.txt
+++ b/_sources/2- Sequencing_technologies.rst.txt
@@ -17,6 +17,7 @@ when the nucleotide base is synthesized, thus obtaining a multiple cluster on a
.. image:: images/illumina_Lu_et_al_2016.png
:width: 400
+ :align: center
*Source: https://www.researchgate.net/publication/357946568_New_approaches_and_concepts_to_study_complex_microbial_communities*
@@ -26,13 +27,15 @@ when the nucleotide base is synthesized, thus obtaining a multiple cluster on a
In each cycle, one nucleotide is incorporated into the template, so the number of cycles corresponds to the read length (100 cycles yield a 100 bp read).
After each incorporation, the flow cell is imaged to determine which of the four colours was incorporated in each cluster.
+.. image:: images/single_vs_pair_end.png
+ :width: 400
+ :align: center
+
Single end
----------
This mode corresponds to the basis of SBS: the nucleotides added to the template sequence are read from one end of the fragment.
-It's more simple and effcient, due to reduce the the number of stemps in the library preparation.
-
-nevertheless, the quality of nucleotides decreases as the sequencing process progresses.
+It is simpler and more efficient because it reduces the number of steps in library preparation. Nevertheless, base quality decreases as the sequencing process progresses.
Paired end
@@ -41,7 +44,8 @@ Paired end
*source: https://systemsbiology.columbia.edu/genome-sequencing-defining-your-experiment#:~:text=Single%2Dend%20vs.&text=In%20single%2Dend%20reading%2C%20the,opposite%20end%20of%20the%20fragment.*
During library preparation, sequencing primer binding sites are incorporated at both ends of the DNA fragments.
-This allows to reading at one read, when it finiches this direction at the specified read lenght, then starts another round od reading from the opposite end of the fragment.
+This allows the fragment to be read in one direction up to the specified read length and then, in a second round, from the opposite end of the fragment.
+
It improves:
- The confidence of the sequence read
@@ -60,3 +64,43 @@ For more information
Long read sequencing (Nanopore)
===============================
+
+Nanopore sequencing uses flow cells that contain an array of tiny holes, called nanopores (protein pores), embedded in an electro-resistant membrane. Each nanopore corresponds to its own electrode connected
+to a channel and sensor chip, which measures the electric current that flows through the nanopore. When a molecule passes through a nanopore, the current is disrupted
+to produce a characteristic ‘squiggle’. The squiggle is then decoded using basecalling algorithms to determine the DNA or RNA sequence in real time.
+In an electrolytic solution, a constant voltage is applied to produce an ionic current through the nanopore such that negatively charged single-stranded DNA or RNA molecules
+are driven through the nanopore from the negatively charged ‘cis’ side to the positively charged ‘trans’ side. Translocation speed is controlled by a motor protein that ratchets
+the nucleic acid molecule through the nanopore in a step-wise manner. Changes in the ionic current during translocation correspond to the nucleotide sequence present in the sensing
+region and are decoded using computational algorithms, allowing real-time sequencing of single molecules. In addition to controlling translocation speed, the motor protein has helicase activity,
+enabling double-stranded DNA or RNA–DNA duplexes to be unwound into single-stranded molecules that pass through the nanopore.
+
+.. image:: images/Nanopore_principle.png
+ :width: 400
+ :align: center
+
+A basecaller translates raw signals into DNA sequence data (FASTQ). The basecaller uses a neural network to predict the most likely DNA sequence based on the raw signal data.
+
+.. seealso::
+ .. _Nanopore_sequencing_workflow: https://www.youtube.com/watch?v=RcP85JHLmnI
+
+ See the Nanopore_sequencing_workflow_ video by Oxford Nanopore Technologies to visualize the concepts of Nanopore sequencing.
+
+
+FASTQ format and Phred quality score
+=====================================
+
+The raw data generated by the sequencer is stored in FASTQ format, which contains the sequence of nucleotides and their corresponding quality scores.
+Each record is divided into four lines:
+
+ 1. Sequence identifier: starts with '@' and contains information about the read, such as the instrument, run ID, flow cell ID, lane, tile, x and y coordinates, and read number.
+ 2. Sequence: the nucleotide sequence of the read.
+ 3. Quality identifier: starts with '+' and may repeat the information in the sequence identifier. It is often left empty, and in some cases it is used for metadata.
+ 4. Quality scores: the Phred quality score for each base in the read. The Phred quality score measures the confidence of the base call and is calculated as Q = -10 * log10(P), where P is the probability that the base call is incorrect. Each score is stored as a single ASCII character; in the Phred+33 encoding, a score of 0 is represented by '!' and a score of 41 by 'J'. The higher the quality score, the more confident we are in the base call.
+
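The relationship between the Phred score, the error probability and the ASCII character can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the Phred+33 encoding, and the function names (``phred_to_char``, ``char_to_phred``, ``error_probability``, ``phred_from_probability``) are hypothetical helpers, not part of any sequencing tool.

```python
import math

# Assumption: Phred+33 encoding, where score 0 maps to ASCII 33 ('!').
PHRED_OFFSET = 33


def phred_to_char(q: int) -> str:
    """Encode a Phred score as its ASCII quality character."""
    return chr(q + PHRED_OFFSET)


def char_to_phred(c: str) -> int:
    """Decode an ASCII quality character back into a Phred score."""
    return ord(c) - PHRED_OFFSET


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to obtain P, the error probability."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> int:
    """Apply Q = -10 * log10(P), rounded to the nearest integer."""
    return round(-10 * math.log10(p))


if __name__ == "__main__":
    for q in (0, 20, 30, 41):
        print(q, phred_to_char(q), error_probability(q))
```

Under this encoding a score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to a 1 in 1000 chance.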
+.. Note::
+ The '@' symbol alone cannot be used to count the number of reads, because it can also appear as a quality score character.
+
+.. image:: images/fastq_format.png
+ :width: 400
+ :align: center
+
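The four-line record structure, and the reason why counting '@' characters overestimates the number of reads, can be illustrated with a small Python sketch. It assumes well-formed, uncompressed FASTQ with exactly four lines per record (no wrapped sequences); ``read_fastq``, ``count_reads`` and the example reads are hypothetical and only serve the illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

# Two made-up records; the first quality line starts with '@' (Phred 31).
EXAMPLE = "@read1\nACGT\n+\n@II@\n@read2\nGGCA\n+\nIIII\n"


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) for each four-line record."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' line is read and ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def count_reads(text: str) -> int:
    """Count records by walking the file four lines at a time."""
    return sum(1 for _ in read_fastq(io.StringIO(text)))


# Counting lines that start with '@' also counts quality lines.
naive_count = sum(1 for line in EXAMPLE.splitlines() if line.startswith("@"))

if __name__ == "__main__":
    print("records:", count_reads(EXAMPLE))
    print("lines starting with '@':", naive_count)
```

For files that follow the four-lines-per-record layout, dividing the total number of lines by four gives the same read count without parsing.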
+
\ No newline at end of file
diff --git a/_sources/3- Quality Control and Preprocessing.rst.txt b/_sources/3- Quality Control and Preprocessing.rst.txt
new file mode 100644
index 0000000..76db07f
--- /dev/null
+++ b/_sources/3- Quality Control and Preprocessing.rst.txt
@@ -0,0 +1,43 @@
+.. _Sequencing_technologies-page:
+
+***********************************
+3 Quality Control and Preprocessing
+***********************************
+
+Illumina
+===========================
+
+Quality Control
+---------------
+
+The quality of the reads contained in the FASTQ files needs to be checked in order to determine
+whether the reads can be used for further analysis. The main tools used for QC of Illumina reads are FastQC and FastQ Screen.
+
+
+FASTQC
+~~~~~~
+
+FastQC is a tool developed to detect problems in the reads that were introduced either by the sequencing machine or during library preparation.
+The supported input formats are:
+
+ - FASTQ
+ - Casava FASTQ files
+ - Colorspace FASTQ
+ - Gzip compressed FASTQ (.fastq.gz)
+ - SAM
+ - BAM
+ - SAM/BAM Mapped only (normally used for colorspace data)
+
+The HTML report generated for each file is divided into the following sections:
+
+ #. **Basic Statistics**: displays information about the file, the number and length of the sequences, and the overall %GC.
+ #. **Per base sequence quality**: shows how the quality score (y axis) varies along the reads (x axis). For each position a box-and-whisker plot is displayed; the red line represents the median and the blue line the mean. The quality score commonly tends to decrease towards the end of the reads, because the polymerase tends to make more mistakes as the read progresses. If the median for any base position is less than 25, a warning is raised.
+ #. **Per sequence quality score**:
+ #. Per base sequence content
+ #. Per base GC content
+ #. Per sequence GC content
+ #. Per Base N content
+ #. Sequence Length Distribution
+ #. Duplicate Sequences
+ #. Overrepresented sequences
+ #. Overrepresented kmers
\ No newline at end of file
diff --git a/objects.inv b/objects.inv
index de88f07..9d6c33a 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/searchindex.js b/searchindex.js
index db8f35d..604f55e 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1 Library preparation": [[0, null]], "2 Main Sequencing Technologies": [[1, null]], "About the course": [[2, null]], "Dates, time, location": [[2, "dates-time-location"]], "Learning objectives": [[2, "learning-objectives"]], "Library preparation": [[0, "id1"]], "Library preparation bias": [[0, "library-preparation-bias"]], "Long read sequencing (Nanopore)": [[1, "long-read-sequencing-nanopore"]], "Main Causes of poor quality data": [[0, "main-causes-of-poor-quality-data"]], "Main instructors:": [[2, "id1"]], "NGS Quality Control": [[3, null]], "Outline": [[2, "outline"]], "Paired end": [[1, "paired-end"]], "Prerequisite / technical requirements": [[2, "prerequisite-technical-requirements"]], "Program": [[2, "program"]], "Short Reads sequencing (Illumina)": [[1, "short-reads-sequencing-illumina"]], "Single end": [[1, "single-end"]]}, "docnames": ["1- Library_preparation", "2- Sequencing_technologies", "about", "index"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["1- Library_preparation.rst", "2- Sequencing_technologies.rst", "about.rst", "index.rst"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [0, 1], "008": 0, "01": 0, "1": [1, 3], "10": 0, "100": 1, "1016": 0, "2": 3, "2014": 0, "20end": 1, "20fragment": 1, "20of": 1, "20read": 1, "20singl": 1, "20the": 1, "20v": 1, "2c": 1, "2dend": 1, "3": 0, "357946568_new_approaches_and_concepts_to_study_complex_microbial_commun": 1, "5": 0, "90": 0, "A": 0, "Being": 2, "For": [0, 1], "In": [0, 1], "It": [0, 1], "On": 1, "The": 1, "These": 0, "With": 0, "abil": 1, "about": 3, "accord": 0, "acid": 0, "ad": 1, "adapt": [0, 1, 2], "addit": 0, "addition": 0, "adenin": 0, "after": 1, "agaros": 
0, "aggreg": 2, "agil": 0, "allow": [0, 1], "alongsid": 1, "also": 0, "among": 0, "amplif": [0, 1], "amplifi": [0, 1], "an": 0, "analysi": 0, "anoth": 1, "appli": 0, "applic": 0, "appropi": 0, "ar": [0, 1], "artifact": 0, "ass": 0, "assembli": 1, "assesss": 0, "attach": [0, 1], "avoid": [0, 2], "barcod": 0, "base": [0, 1, 2], "basi": 1, "bead": 0, "befor": [0, 1], "being": 1, "bia": 2, "bias": 0, "bind": 1, "blunt": 0, "both": [0, 1], "bp": 1, "bridg": 1, "broken": 0, "can": [0, 1], "capabl": 0, "case": 0, "cdna": 0, "cell": [0, 1], "chain": 0, "check": 0, "chemic": [0, 1], "chipseq": 0, "chosen": 0, "cli": 2, "cluster": 1, "collect": 0, "colour": 1, "columbia": 1, "comfort": 2, "command": 2, "compat": 0, "complex": 0, "compris": 0, "concentr": 0, "concept": 1, "confid": 1, "consist": 1, "consum": 1, "contain": 0, "contamin": 0, "content": 3, "control": [0, 2], "convert": 0, "correspond": 1, "could": 0, "coupl": 1, "cours": 3, "cozzuto": 2, "creat": 0, "critic": 0, "cutadapt": 2, "cycl": 1, "data": 2, "de": 0, "decreas": 1, "defin": [0, 1], "degrad": 0, "delet": 1, "depend": 0, "deplet": 0, "desir": 0, "detect": 0, "determin": 1, "differ": [0, 2], "direct": 1, "distribut": 0, "dna": [0, 1, 2], "doi": 0, "done": 0, "downstream": 0, "due": [0, 1], "duplic": 0, "dure": [0, 1], "each": [0, 1, 2], "earlier": 0, "edu": 1, "effcient": 1, "effici": 1, "either": 0, "electrophoresi": 0, "enabl": 0, "enbal": 0, "end": 0, "ensur": 0, "environ": 2, "enzimat": 0, "enzym": 0, "equal": 1, "especif": 0, "essenti": 0, "execut": 2, "expens": 1, "experi": 1, "extract": 0, "fasq": 2, "fastp": 2, "fastq": 2, "fastqc": 2, "finich": 1, "flow": [0, 1], "flowcel": 1, "fluoresc": 1, "fluoromet": 0, "format": 2, "four": 1, "fragment": [0, 1], "free": 0, "frequent": 0, "from": 1, "gc": 0, "gel": 0, "gener": [0, 1], "genom": [0, 1], "get": 0, "git": 2, "github": 3, "good": 0, "group": 1, "gurante": 0, "hand": 1, "here": 0, "hermoso": 2, "high": 0, "hinder": 0, "how": 2, "http": [0, 1], 
"hybridis": 1, "i": [0, 1], "identifi": [0, 1], "illumina": [0, 2], "illumina_sequencing_by_synthesis_workflow": 1, "imag": 1, "improv": 1, "incorpor": 1, "index": 0, "inform": [0, 1], "input": 0, "insert": 1, "instrument": 0, "integr": 0, "interact": 0, "interfac": 2, "interpret": 2, "introduc": [0, 2], "introductori": 2, "invers": 1, "isol": 0, "j": 0, "julia": 2, "kit": 0, "know": 2, "leav": 0, "lenght": [0, 1], "length": 1, "librari": [1, 2, 3], "ligat": 0, "like": 0, "line": 2, "linux": 2, "long": 2, "low": 2, "lower": 0, "luca": 2, "magnet": 0, "mai": [0, 2], "main": 3, "mani": 0, "marker": 1, "mean": 0, "measur": 0, "mediat": 1, "met": 0, "method": 0, "more": [0, 1], "most": 0, "mrna": 0, "much": 1, "multipl": 1, "multiplex": 0, "multiqc": 2, "must": 0, "nanodrop": 0, "nanoplot": 2, "nanopor": 2, "natur": 0, "necessari": 0, "net": 1, "nevertheless": 1, "next": 1, "ng": [0, 2], "nucleic": 0, "nucleotid": 1, "number": 1, "nutshel": 0, "obtain": [1, 2], "od": 1, "offer": 2, "often": 0, "ohter": 1, "oligo": 0, "oligonucleotid": 0, "one": 1, "onli": 0, "opposit": 1, "optic": 1, "option": 0, "org": 0, "our": 0, "overhang": 0, "paramet": 2, "particip": 2, "paus": 1, "pcr": [0, 1], "perfom": 0, "perform": 0, "phosphoryl": 0, "physic": 0, "platform": 0, "poli": 0, "polyadenil": 0, "polymeras": [0, 1], "ponomarenko": 2, "pool": 0, "popular": 0, "posit": 1, "possibl": 0, "prefer": 0, "prepar": [1, 2, 3], "preprocess": 2, "present": 0, "primer": [0, 1], "proceess": 0, "process": [0, 1], "produc": 0, "product": 0, "progress": 1, "proper": 0, "protocol": 0, "public": 1, "purif": 0, "puriti": 0, "qc": 2, "qualiti": [1, 2], "quantif": 0, "quantiti": 0, "qubit": 0, "raw": 2, "reaction": 0, "read": 2, "readili": 0, "rearrang": 1, "recognis": 0, "recommend": 2, "record": 1, "reduc": 1, "region": 1, "rel": 1, "remov": [0, 1, 2], "repair": 0, "repetit": 1, "report": 2, "repositori": 3, "requir": 0, "research": 0, "researchg": 1, "resolv": 1, "respect": [0, 1], "ribosom": 0, 
"rigor": 0, "risk": 0, "rna": [0, 2], "round": 1, "rrna": 0, "run": 2, "same": [0, 1], "sampl": [0, 2], "sb": 1, "screen": 2, "see": 1, "select": 0, "seq": [0, 2], "sequenc": [0, 2, 3], "sever": 0, "short": [0, 2], "sickl": 2, "signal": 1, "simpl": 1, "singl": 0, "site": 1, "situat": 0, "size": 0, "smaller": 0, "snp": 0, "so": 0, "solut": [0, 2], "some": 0, "sourc": [0, 1], "specif": 0, "specifi": 1, "spectrocopi": 0, "start": [0, 1], "stemp": 1, "step": [0, 1, 2], "subsequentlti": 1, "suitabl": 0, "synthes": 1, "synthesi": 1, "synthet": 0, "systemsbiologi": 1, "tag": 1, "tail": 0, "tapest": 0, "technologi": [2, 3], "templat": 1, "termin": 1, "text": 1, "thees": 2, "thei": 0, "them": [0, 2], "thermofisherscientif": 0, "thi": [0, 1, 2], "thu": [0, 1], "time": 1, "tissu": 0, "toni": 2, "tool": 2, "total": 0, "train": 2, "transcript": 0, "trimmomat": 2, "type": 0, "typic": 0, "understand": 2, "unwant": 0, "up": 0, "us": [0, 1], "uv": 0, "vari": 0, "variou": 1, "verifi": 1, "via": 0, "video": 1, "visual": 1, "wa": 1, "we": 0, "wg": 0, "when": [0, 1], "where": 1, "whether": 0, "which": [0, 1], "while": 1, "whilst": 0, "whole": 0, "work": [1, 2], "www": 1, "yexcr": 0, "your": 1}, "titles": ["1 Library preparation", "2 Main Sequencing Technologies", "About the course", "NGS Quality Control"], "titleterms": {"1": 0, "2": 1, "about": 2, "bia": 0, "caus": 0, "control": 3, "cours": 2, "data": 0, "date": 2, "end": 1, "illumina": 1, "instructor": 2, "learn": 2, "librari": 0, "locat": 2, "long": 1, "main": [0, 1, 2], "nanopor": 1, "ng": 3, "object": 2, "outlin": 2, "pair": 1, "poor": 0, "prepar": 0, "prerequisit": 2, "program": 2, "qualiti": [0, 3], "read": 1, "requir": 2, "sequenc": 1, "short": 1, "singl": 1, "technic": 2, "technologi": 1, "time": 2}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1 Library preparation": [[0, null]], "2 Main Sequencing Technologies": [[1, null]], "3 Quality Control and Preprocessing": [[2, null]], "About the course": [[3, null]], "DNA library bias": [[0, "dna-library-bias"]], "Dates, time, location": [[3, "dates-time-location"]], "FASTQC": [[2, "fastqc"]], "Illumina": [[2, "illumina"]], "Learning objectives": [[3, "learning-objectives"]], "Library preparation": [[0, "id1"]], "Library preparation bias": [[0, "library-preparation-bias"]], "Long read sequencing (Nanopore)": [[1, "long-read-sequencing-nanopore"]], "Main Causes of poor quality data": [[0, "main-causes-of-poor-quality-data"]], "Main instructors:": [[3, "id1"]], "NGS Quality Control": [[4, null]], "Outline": [[3, "outline"]], "Paired end": [[1, "paired-end"]], "Prerequisite / technical requirements": [[3, "prerequisite-technical-requirements"]], "Program": [[3, "program"]], "Quality Control": [[2, "quality-control"]], "RNA library bias": [[0, "rna-library-bias"]], "Short Reads sequencing (Illumina)": [[1, "short-reads-sequencing-illumina"]], "Single end": [[1, "single-end"]]}, "docnames": ["1- Library_preparation", "2- Sequencing_technologies", "3- Quality Control and Preprocessing", "about", "index"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2}, "filenames": ["1- Library_preparation.rst", "2- Sequencing_technologies.rst", "3- Quality Control and Preprocessing.rst", "about.rst", "index.rst"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [0, 1], "0": 1, "008": 0, "01": 0, "1": [1, 4], "10": [0, 1], "100": 1, "1016": 0, "1155": 0, "18": 0, "2": 4, "2014": 0, "2021": 0, "20end": 1, "20fragment": 1, "20of": 1, "20read": 1, "20singl": 1, "20the": 1, 
"20v": 1, "22": 0, "25": 2, "2c": 1, "2dend": 1, "3": 0, "357946568_new_approaches_and_concepts_to_study_complex_microbial_commun": 1, "41": [0, 1], "5": 0, "50": 0, "6647597": 0, "90": 0, "A": [0, 1], "AT": 0, "As": 0, "Being": 3, "For": [0, 1, 2], "In": [0, 1], "It": [0, 1], "On": [0, 1], "Or": 1, "Such": 1, "The": [0, 1, 2], "Their": 0, "These": 0, "To": 0, "With": 0, "abil": 1, "about": [1, 4], "abund": 0, "accord": 0, "acid": [0, 1], "act": 0, "actinomycind": 0, "activ": 1, "ad": 1, "adapt": [0, 1, 3], "add": 0, "addit": [0, 1], "addition": 0, "adenin": 0, "adjust": 0, "after": 1, "afterward": 0, "agaros": 0, "agent": 0, "aggreg": 3, "agil": 0, "algorithm": 1, "alien": 0, "all": 0, "allow": [0, 1], "alongsid": 1, "also": [0, 1], "altogeth": 0, "amid": 0, "amino": 0, "among": 0, "amount": 0, "ampli": 0, "amplif": [0, 1], "amplifi": [0, 1], "ampli\ufb01": 0, "ampli\ufb01c": 0, "ampli\ufb01ca": 0, "an": [0, 1], "analysi": [0, 2], "analyz": 0, "ani": 2, "anoth": 1, "antisens": 0, "appear": [0, 1], "appli": [0, 1], "applic": 0, "approach": 0, "appropi": 0, "ar": [0, 1, 2], "aris": 2, "arrai": 1, "artifact": 0, "ascii": 1, "ass": 0, "assembli": 1, "assesss": 0, "attach": [0, 1], "author": 0, "averag": 0, "avoid": [0, 3], "axi": 2, "balanc": 0, "bam": 2, "barcod": 0, "base": [0, 1, 2, 3], "basecal": 1, "basi": 1, "basic": 2, "bead": 0, "becaus": [0, 1, 2], "becom": 0, "been": 0, "befor": [0, 1], "being": [0, 1], "best": 0, "betain": 0, "between": 0, "bia": 3, "bias": 0, "bind": 1, "bioinformat": 0, "blue": 2, "blunt": 0, "both": [0, 1], "boundari": 0, "boxwhisk": 2, "bp": 1, "bridg": [0, 1], "broken": 0, "buffer": 0, "calcul": 1, "call": 1, "can": [0, 1], "capabl": 0, "captur": 0, "casava": 2, "case": [0, 1], "cdna": 0, "cell": [0, 1], "chain": 0, "chang": 1, "channel": 1, "chaotrop": 0, "charact": 1, "characterist": 1, "charg": 1, "check": [0, 2], "chemic": [0, 1], "chip": [0, 1], "chipseq": 0, "chlorid": 0, "choic": 0, "chosen": 0, "chromatin": 0, "ci": 1, 
"cleotid": 0, "cli": 3, "cluster": 1, "collect": 0, "colorspac": 2, "colour": 1, "columbia": 1, "comfort": 3, "command": 3, "commonli": 2, "compar": 0, "compat": 0, "complet": 0, "complex": 0, "composit": 0, "compound": 0, "compress": 2, "compris": 0, "comput": 1, "concentr": 0, "concept": 1, "confid": 1, "connect": 1, "consider": 0, "consist": 1, "constant": 1, "construct": 0, "consum": 1, "contain": [0, 1, 2], "contamin": 0, "content": [0, 2, 4], "contrast": 0, "control": [0, 1, 3], "convert": 0, "coordin": 1, "correspond": 1, "could": [0, 1, 2], "count": 1, "coupl": 1, "cours": 4, "coverag": 0, "cozzuto": 3, "cr": 0, "creat": 0, "critic": 0, "current": [0, 1], "custom": 0, "cutadapt": 3, "cycl": [0, 1], "cyle": 0, "data": [1, 2, 3], "de": 0, "decod": 1, "decreas": [0, 1, 2], "defin": [0, 1], "degener": 0, "degrad": 0, "delet": 1, "depend": 0, "deplet": 0, "desir": 0, "detect": 0, "determin": [1, 2], "develop": [0, 2], "deviat": 0, "differ": [0, 3], "dimethyl": 0, "direct": 1, "directli": 0, "displac": 0, "displai": 2, "disrupt": 1, "distribut": [0, 2], "dithiothreitol": 0, "divers": 0, "divid": [1, 2], "di\ufb00erenti": 0, "dmso": 0, "dna": [1, 3], "dnasynthesi": 0, "doi": 0, "done": 0, "doubl": [0, 1], "down": 0, "downstream": 0, "driven": 1, "dt": 0, "dtt": 0, "due": [0, 1], "duplex": 1, "duplic": [0, 2], "duplicationr": 0, "dure": [0, 1, 2], "dx": 0, "each": [0, 1, 2, 3], "earlier": 0, "edu": 1, "effcient": 1, "effici": [0, 1], "either": [0, 2], "electr": 1, "electro": 1, "electrod": 1, "electrolyt": 1, "electrophoresi": 0, "element": 0, "elimin": 0, "embed": 1, "empti": 1, "enabl": [0, 1], "enbal": 0, "end": [0, 2], "endonucleas": 0, "enhanc": 0, "enrich": 0, "ensur": 0, "environ": 3, "enzimat": 0, "enzym": 0, "equal": 1, "error": 0, "especif": 0, "essenti": 0, "euchromatin": 0, "evad": 0, "even": 0, "execut": 3, "exonucleas": 0, "expens": [0, 1], "experi": 1, "exponenti": 0, "extens": 2, "extract": 0, "extrem": 0, "e\ufb00ec": 0, "e\ufb00ect": 0, "fact": 0, 
"failur": 2, "fals": 0, "fare": 0, "fasq": 3, "fastp": 3, "fastq": [1, 2, 3], "fastqc": 3, "femtogram": 0, "few": 0, "field": 0, "file": 2, "finish": 1, "flow": [0, 1], "flowcel": [0, 1], "fluoresc": 1, "fluoromet": 0, "follow": 2, "form": 0, "formamid": 0, "format": [1, 3], "formo": 0, "four": 1, "fragment": [0, 1], "free": 0, "freez": 0, "frequent": 0, "from": [0, 1], "further": [0, 2], "gc": [0, 2], "gel": 0, "gener": [0, 1, 2], "genom": [0, 1], "get": 0, "git": 3, "github": 4, "good": 0, "group": [0, 1], "gurante": 0, "gz": 2, "gzip": 2, "h": 0, "ha": [0, 1], "had": 0, "hand": 1, "have": 0, "heat": 0, "helicas": 1, "help": 0, "here": 0, "hermoso": 3, "heterochromatin": 0, "hifi": 0, "high": 0, "higher": 1, "hinder": 0, "histon": 0, "hole": 1, "how": [2, 3], "howev": 0, "html": 2, "http": [0, 1], "hybrid": 0, "hybridis": 1, "hypothesi": 0, "i": [0, 1, 2], "id": 1, "identifi": [0, 1], "illumina": [0, 3], "illumina_sequencing_by_synthesis_workflow": 1, "imag": 1, "implement": 0, "implic": 0, "import": 0, "improv": [0, 1], "includ": 0, "incomplet": 0, "incorpor": 1, "incorrect": 1, "increas": 0, "index": 0, "indic": 0, "inform": [0, 1, 2], "inhibit": 0, "input": 0, "insert": 1, "instrument": [0, 1], "integr": 0, "interact": 0, "interfac": 3, "interpret": 3, "introduc": [0, 3], "introduct": 0, "introductori": 3, "invers": 1, "involv": 0, "ionic": 1, "isol": 0, "its": [0, 1, 2], "j": [0, 1], "julia": 3, "kapa": 0, "kit": 0, "kmer": 2, "know": 3, "lack": 0, "lane": 1, "larg": 0, "later": 0, "lead": 0, "least": 0, "leav": 0, "leght": 2, "lenght": [0, 1, 2], "length": 1, "less": [0, 2], "librari": [1, 2, 3, 4], "ligas": 0, "ligat": 0, "like": [0, 1], "limit": 0, "line": [1, 2, 3], "linux": 3, "lncrna": 0, "log10": 1, "long": 3, "lost": 0, "low": [0, 3], "lower": 0, "luca": 3, "m": 0, "machin": 2, "magnet": 0, "mai": [0, 1, 3], "main": [2, 4], "make": [0, 2], "mani": 0, "manner": 1, "map": 2, "marker": 1, "materi": 0, "mda": 0, "mean": [0, 2], "measur": [0, 1], "median": 
2, "mediat": 1, "melt": 0, "membran": 1, "mercaptoethanol": 0, "met": 0, "metadata": 1, "method": 0, "mg": 0, "microgram": 0, "mimic": 0, "minim": 0, "mirna": 0, "mistak": 2, "mitig": 0, "mixtur": 0, "molecul": [0, 1], "more": [0, 1, 2], "most": [0, 1], "motor": 1, "mrna": 0, "much": 1, "multipl": [0, 1], "multiplex": 0, "multiqc": 3, "must": 0, "n": 2, "nanodrop": 0, "nanoplot": 3, "nanopor": 3, "nanopore_sequencing_workflow": 1, "natur": 0, "necessari": 0, "need": [0, 2], "neg": 1, "net": 1, "network": 1, "neural": 1, "neutral": 0, "nevertheless": [0, 1], "next": [0, 1], "ng": [0, 3], "non": 0, "normal": 2, "nucleic": [0, 1], "nucleotid": 1, "nugen": 0, "number": [0, 1, 2], "nutshel": 0, "o": 2, "obtain": [0, 1, 3], "offer": 3, "ofstart": 0, "often": 0, "ohter": 1, "oligo": 0, "oligonu": 0, "oligonucleotid": 0, "one": 1, "onli": [0, 2], "opposit": 1, "optic": 1, "option": 0, "order": 2, "org": 0, "other": 0, "our": 0, "out": 0, "overal": [0, 2], "overhang": 0, "overrepres": 2, "own": 1, "oxford": 1, "p": 1, "pair": 0, "paramet": 3, "particip": 3, "pass": 1, "paus": 1, "pcr": [0, 1], "per": 2, "perfom": 0, "perform": 0, "performedus": 0, "phosphoryl": 0, "phred": 1, "physic": 0, "plai": 0, "platform": 0, "poli": 0, "polya": 0, "polyadenil": 0, "polyadenyl": 0, "polymeras": [0, 1, 2], "ponomarenko": 3, "pool": 0, "popul": 0, "popular": 0, "pore": 1, "posit": [1, 2], "possibl": 0, "power": 0, "predict": 1, "prefer": 0, "prepar": [1, 2, 3, 4], "preprocess": 3, "presenc": 0, "present": [0, 1], "preserv": 0, "primer": [0, 1], "probabl": 1, "probe": 0, "problem": 0, "proceess": 0, "process": [0, 1], "produc": [0, 1, 2], "product": 0, "progress": [1, 2], "propag": 0, "proper": 0, "propos": 0, "protein": 1, "protocol": 0, "public": [0, 1], "purif": 0, "puriti": 0, "qc": [2, 3], "qualiti": [1, 3], "qualityrna": 0, "quantif": 0, "quantiti": 0, "qubit": 0, "random": 0, "randomnucleotid": 0, "ratchet": 1, "raw": [1, 3], "reaction": 0, "read": [0, 2, 3], "readili": 0, "real": 
1, "rearrang": 1, "reason": 0, "recognis": 0, "recommend": [0, 3], "record": 1, "red": 2, "reduc": [0, 1], "reducingcompound": 0, "region": [0, 1], "rel": [0, 1], "relat": 2, "remark": 0, "remov": [0, 1, 3], "reolic": 0, "repair": 0, "repetit": 1, "replac": 0, "report": [0, 2, 3], "repositori": 4, "repres": [1, 2], "represent": 0, "requir": 0, "research": 0, "researchg": 1, "resist": 1, "resolv": 1, "respect": [0, 1], "restrict": 0, "result": 0, "revers": 0, "ribosom": 0, "rich": 0, "rigor": 0, "risk": 0, "rna": [1, 3], "rnase": 0, "rnasequ": 0, "role": 0, "room": 0, "round": 1, "rrna": 0, "run": [1, 3], "salt": 0, "sam": 2, "same": [0, 1], "sampl": [0, 3], "sb": 1, "scheme": 0, "score": [1, 2], "screen": [2, 3], "second": 0, "secondari": 0, "section": 2, "see": [0, 1], "select": 0, "sens": 1, "sensor": 1, "seq": [0, 3], "sequenc": [0, 2, 3, 4], "set": 0, "sever": 0, "shear": 0, "short": [0, 3], "show": [0, 2], "shown": 0, "sickl": 3, "side": 1, "signal": 1, "signi\ufb01c": 0, "signi\ufb01cantli": 0, "similarli": 0, "simpl": [0, 1], "singl": 0, "site": 1, "situat": 0, "size": 0, "slice": 0, "slightli": 0, "small": 0, "smaller": 0, "smart": 0, "snp": 0, "so": 0, "solut": [0, 1, 3], "solv": 0, "some": [0, 1], "sonic": 0, "sourc": [0, 1], "specif": 0, "specifi": 1, "speci\ufb01c": 0, "spectrocopi": 0, "speed": 1, "squiggl": 1, "stabl": 0, "standard": 0, "start": [0, 1], "statist": 2, "stemp": 1, "step": [0, 1, 3], "store": 1, "strand": [0, 1], "strategi": 0, "structur": 0, "studi": 0, "subsequentlti": 1, "substrat": 0, "subtract": 0, "suitabl": 0, "sulfoxid": 0, "suppli": 0, "support": 2, "surfac": 0, "symbol": 1, "synthes": 1, "synthesi": 1, "synthet": 0, "systemsbiologi": 1, "t": 0, "t4": 0, "tag": 1, "tail": 0, "tapest": 0, "target": 0, "technologi": [3, 4], "temperatur": 0, "templat": [0, 1], "tend": [0, 2], "termin": 1, "tetramethylammonium": 0, "text": 1, "than": [0, 2], "thaw": 0, "thees": 3, "thei": 0, "them": [0, 3], "therefor": 0, "thermofisherscientif": 0, 
"thi": [0, 1, 3], "through": [0, 1], "throughdna": 0, "throughout": 2, "thu": [0, 1], "tile": 1, "time": 1, "tini": 1, "tion": 0, "tip": 0, "tissu": 0, "tive": 0, "tmac": 0, "tone": 0, "toni": 3, "tool": [0, 2, 3], "total": 0, "train": 3, "tran": 1, "transcript": 0, "transcriptas": 0, "translat": 1, "transloc": 1, "treatment": 0, "trimmomat": 3, "trizol": 0, "type": 0, "typic": 0, "underrepres": 0, "understand": 3, "uniform": 0, "unwant": 0, "unwound": 1, "up": 0, "us": [0, 1, 2], "uv": 0, "vari": [0, 2], "variou": 1, "verifi": 1, "via": 0, "video": 1, "visual": 1, "voltag": 1, "wa": 1, "warn": 2, "we": [0, 1], "weigh": 0, "wg": 0, "wga": 0, "when": [0, 1], "where": 1, "whether": 0, "which": [0, 1], "while": 1, "whilst": 0, "whole": 0, "wide": 0, "wise": 1, "without": 0, "work": [1, 3], "would": 0, "www": 1, "x": [1, 2], "y": [1, 2], "yexcr": 0, "your": 1, "\u00bac": 0, "\u03b2": 0, "\ufb01cation": 0, "\ufb02owcel": 0}, "titles": ["1 Library preparation", "2 Main Sequencing Technologies", "3 Quality Control and Preprocessing", "About the course", "NGS Quality Control"], "titleterms": {"1": 0, "2": 1, "3": 2, "about": 3, "bia": 0, "caus": 0, "control": [2, 4], "cours": 3, "data": 0, "date": 3, "dna": 0, "end": 1, "fastqc": 2, "illumina": [1, 2], "instructor": 3, "learn": 3, "librari": 0, "locat": 3, "long": 1, "main": [0, 1, 3], "nanopor": 1, "ng": 4, "object": 3, "outlin": 3, "pair": 1, "poor": 0, "prepar": 0, "preprocess": 2, "prerequisit": 3, "program": 3, "qualiti": [0, 2, 4], "read": 1, "requir": 3, "rna": 0, "sequenc": 1, "short": 1, "singl": 1, "technic": 3, "technologi": 1, "time": 3}})
\ No newline at end of file