From 8f1835ddbc1657891841daad3be31d5f4f99475b Mon Sep 17 00:00:00 2001
From: Fabian Andrade
Date: Tue, 3 Sep 2024 17:55:33 +0200
Subject: [PATCH] 1 Template preparation added

---
 docs/1- Library preparation.rst               | 22 +++++++++++++++++++
 docs/2- Sequencing technologies.rst           |  2 +-
 docs/3- Quality Control and Preprocessing.rst | 15 +++++--------
 docs/4- Quality of the mapping.rst            |  4 ++++
 4 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/docs/1- Library preparation.rst b/docs/1- Library preparation.rst
index f051723..2e2faf1 100644
--- a/docs/1- Library preparation.rst
+++ b/docs/1- Library preparation.rst
@@ -15,6 +15,28 @@ DNA Typically could be measured either UV spectrocopy (Nanodrop) or electrophore

 .. danger:: **RNA is more critical**: sample degradation and contamination are more frequent.

+Template Preparation
+--------------------
+
+*Source: https://doi.org/10.1016/j.humimm.2021.02.012*
+
+.. tabs::
+
+   .. tab:: Amplicons
+
+      Long-range PCR (LR-PCR) resolves many of the sequence ambiguities seen with short-amplicon sequencing. It should be noted, however, that the LR-PCR-based approach, especially for HLA genotyping, is occasionally affected by allele dropouts.
+
+   .. tab:: WES
+
+      Hybridization capture-based template preparation is the most common approach. Biotinylated probes are hybridized with the regions of interest, which are then isolated using streptavidin-coated magnetic beads.
+
+   .. tab:: Epigenome Sequencing
+
+      Preparation of genomic samples for WGBS is commonly performed through the post-bisulfite treatment of DNA and de-tagging before index adaptor ligation for NGS. ChIP-Seq allows for genome-wide mapping of DNA-binding proteins and histone modifications at base-pair resolution. To prepare samples for ChIP-Seq, formaldehyde-fixed or native chromatin is fragmented by micrococcal nuclease (MNase) or sonication and then immunoprecipitated with a target-specific antibody conjugated to magnetic beads. DNA isolated from the precipitated protein-DNA complexes is used to generate libraries.
+
+
+
+
 Library preparation
 ========================

diff --git a/docs/2- Sequencing technologies.rst b/docs/2- Sequencing technologies.rst
index 50b2839..953580d 100644
--- a/docs/2- Sequencing technologies.rst
+++ b/docs/2- Sequencing technologies.rst
@@ -121,7 +121,7 @@ For each read, the information it's divided in four lines:

    Quality scores range from 0 to 41 and are stored as single ASCII characters with an offset of 33 (Phred+33) to reduce file size. Older versions of the FASTQ format used an offset of 64 (Phred+64). The higher the quality score, the lower the probability of an incorrect base call.

-   .. image:: images/phred_quality_score.png
+   .. image:: images/phred_scores.png
       :width: 400
       :align: center

diff --git a/docs/3- Quality Control and Preprocessing.rst b/docs/3- Quality Control and Preprocessing.rst
index 3cffce3..9178dbc 100644
--- a/docs/3- Quality Control and Preprocessing.rst
+++ b/docs/3- Quality Control and Preprocessing.rst
@@ -136,14 +136,14 @@ Example of a FASTQ-Screen report:
   .. image:: images/FASTQ-Screen/Mapping_results_tables.png
      :width: 400
      :align: center
-     :alt: *Adapter Content FASTQC module*
+     :alt: *FASTQ-Screen table report*

 - Mapping results table values shown as a plot.

-  .. image:: images/FASTQ-Screen/Mapping_results_plots.png
+  .. image:: images/FASTQ-Screen/Mapping_results_graphics.png
      :width: 400
      :align: center
-     :alt: *Adapter Content FASTQC module*
+     :alt: *FASTQ-Screen plot report*

 When working with several samples and reports, FASTQC and FASTQ-Screen reports can be aggregated into a single report using MultiQC (https://multiqc.info/).

@@ -189,20 +189,15 @@ Example of fastp report.

 - Adapters: Sequences of the adapters found in the reads and the number of reads that contain them.

-  .. image:: images/fastp_report/adapters.png
-     :width: 400
-     :align: center
-     :alt: *Adapters fastp report*
-
 - Insert size estimation: Distribution of the insert size of the reads. Insert size corresponds to the size of the fragment spanned by the paired-end reads, that is, the fragment of DNA that is sequenced and has

-  .. image:: images/fastp_report/insert_size_estimation.png
+  .. image:: images/fastp_report/insert_size_explanation.png
      :width: 400
      :align: center
      :alt: *insert size estimation*

-     *source: https://doi.org/10.3389%2Ffgene.2014.00005*
+  *source: https://doi.org/10.3389%2Ffgene.2014.00005*

 - Quality per base, base contents and k-mer counting before and after filtering

diff --git a/docs/4- Quality of the mapping.rst b/docs/4- Quality of the mapping.rst
index f85edf9..a8e1655 100644
--- a/docs/4- Quality of the mapping.rst
+++ b/docs/4- Quality of the mapping.rst
@@ -14,17 +14,21 @@ Depending on the origin of our sequencing data (WGS, WES, RNA-seq, Chip-seq, ...
 Both can be used for WGS or WES data:

 - BWA-MEM: performs local alignment by default, with high accuracy and efficiency when aligning reads to the entire genome. It is very efficient at finding gapped alignments, which is very important for variant detection.
+
 - bowtie2: performs global (end-to-end) alignment by default; it is faster than BWA but less sensitive. Recommended for large-scale sequencing and frequently used for ChIP-seq because of its speed in aligning shorter reads to identify enriched regions (peak detection).

 **RNA-seq splice-aware aligner**: Specialized in the mapping of RNA-seq reads, which can be spliced and map to different exons of the same gene:

 - STAR: the most popular aligner for RNA-seq data, very efficient and accurate at identifying splice junctions.
+
 - TopHat2: one of the first aligners designed for RNA-seq data, but now deprecated and replaced by STAR.
+
 - HISAT2: built on the Bowtie2 alignment algorithm, but optimized for RNA-seq data.

 **Pseudo-Aligner - Quasi-mapping**: very fast; maps reads to the transcriptome and performs quantification. Cannot find novel transcripts. When the goal is to quantify gene expression levels, this is the best option:

 - Kallisto: uses a pseudo-alignment approach that efficiently determines the compatibility of reads with transcripts without full sequence alignment; very fast and memory-efficient, a better option for large-scale projects.
+
 - Salmon: uses a quasi-mapping approach, similar to pseudo-alignment but including information about the location of the read within the transcript, and performs bias-correction steps; slower than kallisto but gives more accurate quantifications. A better option for complex transcriptomes.

 Before aligning the reads, a reference genome in FASTA format is needed. Typical sources are UCSC, Ensembl or GENCODE. The reference genome is indexed to create a dictionary-like database of the redundant sequences of the genome, which facilitates and accelerates the lookup of reads against these regions while minimizing the memory footprint.
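
The ASCII offset-33 quality encoding mentioned in the Sequencing technologies hunk can be illustrated with a short Python sketch. It is not part of the patch: the function names `decode_phred` and `error_probability` and the sample quality string are hypothetical, and the only assumption about the data is the standard Phred relation P = 10^(-Q/10).

```python
def decode_phred(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into integer Phred scores.

    offset=33 is the current encoding; offset=64 applies to older FASTQ files.
    """
    return [ord(char) - offset for char in quality]


def error_probability(score: int) -> float:
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# Hypothetical quality line (fourth line of a FASTQ record).
quality_line = "II5+!J"
scores = decode_phred(quality_line)
print(scores)  # [40, 40, 20, 10, 0, 41]

for char, score in zip(quality_line, scores):
    print(f"{char!r}: Q={score:2d}  P(error)={error_probability(score):.4f}")
```

A score of 20 corresponds to a 1 in 100 chance of a wrong base call and a score of 40 to 1 in 10,000, which is why higher scores mean more reliable bases.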