Commit

1 Template preparation added
FabianAndradeLozano committed Sep 3, 2024
1 parent 707757d commit 8f1835d
Showing 4 changed files with 32 additions and 11 deletions.
22 changes: 22 additions & 0 deletions docs/1- Library preparation.rst
@@ -15,6 +15,28 @@ DNA Typically could be measured either UV spectrocopy (Nanodrop) or electrophore
.. danger::
   **RNA is more critical**: sample degradation and contamination are more frequent.

Template Preparation
--------------------

*Source: https://doi.org/10.1016/j.humimm.2021.02.012*

.. tabs::

   .. tab:: Amplicons

      Long-range PCR (LR-PCR) reduces the sequence ambiguities seen with short-amplicon sequencing. It should be noted, however, that the LR-PCR-based approach, especially for HLA genotyping, is occasionally characterized by allele dropouts.

   .. tab:: WES

      Hybridization capture-based template preparation is the most common approach. Biotinylated probes are hybridized to the regions of interest, which are then isolated using streptavidin-coated magnetic beads.

   .. tab:: Epigenome Sequencing

      Preparation of genomic samples for whole-genome bisulfite sequencing (WGBS) is commonly performed through the post-bisulfite treatment of DNA and de-tagging before index adaptor ligation for NGS.

      ChIP-Seq allows for genome-wide mapping of DNA-binding proteins and histone modifications at base-pair resolution. To prepare samples for ChIP-Seq, formaldehyde-fixed or native chromatin is fragmented by micrococcal nuclease (MNase) or sonication and then immunoprecipitated with a target-specific antibody conjugated to magnetic beads. DNA isolated from the precipitated protein-DNA complexes is used to generate libraries.





Library preparation
========================
2 changes: 1 addition & 1 deletion docs/2- Sequencing technologies.rst
@@ -121,7 +121,7 @@ For each read, the information it's divided in four lines:
Quality scores typically range from 0 to 41 and are stored as single ASCII characters with an offset of 33 (Phred+33), which reduces file size. Older versions of the FASTQ format used an offset of 64 (Phred+64).
The higher the quality score, the lower the probability of an incorrect base call.

.. image:: images/phred_quality_score.png
.. image:: images/phred_scores.png
:width: 400
:align: center
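
As a minimal sketch of the encoding described above, the following Python snippet decodes a quality string into Phred scores and converts each score to an error probability with the standard relation P = 10^(-Q/10). The function names and the example quality string are hypothetical and chosen only for illustration.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 is the current convention (Phred+33); older files may need 64.
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Hypothetical quality line (fourth line of a FASTQ record)
quality_line = "II5+!"
for ch, q in zip(quality_line, phred_scores(quality_line)):
    print(f"{ch!r}: Q={q}, P(error)={error_probability(q):.4f}")
```

Passing ``offset=64`` decodes files written with the older encoding.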

15 changes: 5 additions & 10 deletions docs/3- Quality Control and Preprocessing.rst
@@ -136,14 +136,14 @@ Example of a FASTQ-Screen report:
.. image:: images/FASTQ-Screen/Mapping_results_tables.png
:width: 400
:align: center
:alt: *Adapter Content FASTQC module*
:alt: *FASTQ-Screen table report*

- Mapping results table values shown as a plot.

.. image:: images/FASTQ-Screen/Mapping_results_plots.png
.. image:: images/FASTQ-Screen/Mapping_results_graphics.png
:width: 400
:align: center
:alt: *Adapter Content FASTQC module*
:alt: *FASTQ-Screen plot report*

When working with several samples and reports, the FASTQC and FASTQ-Screen reports can be aggregated into a single report using "MultiQC" (https://multiqc.info/).

@@ -189,20 +189,15 @@ Example of fastp report.

- Adapters: Sequence of the adapters found in the reads and the number of reads that contain them.

.. image:: images/fastp_report/adapters.png
:width: 400
:align: center
:alt: *Adapters fastp report*

- Insert size estimation: distribution of the insert size of the reads. The insert size corresponds to the size of the fragment covered by the paired-end reads; it is the fragment of DNA that is sequenced and has

.. image:: images/fastp_report/insert_size_estimation.png
.. image:: images/fastp_report/insert_size_explanation.png
:width: 400
:align: center
:alt: *Insert size estimation*


*source: https://doi.org/10.3389%2Ffgene.2014.00005*
*source: https://doi.org/10.3389%2Ffgene.2014.00005*

- Quality per base, base contents and kmer counting before and after filtering
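
The insert size item in the list above can be illustrated with a small Python sketch. This is a simplified illustration and not fastp's actual algorithm: it assumes error-free reads, a fragment at least as long as each read, and an exact overlap between read 1 and the reverse complement of read 2. The function names, the ``min_overlap`` parameter, and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def estimate_insert_size(read1: str, read2: str, min_overlap: int = 5):
    """Estimate the insert size from the overlap of a read pair.

    Returns None when no exact overlap of at least min_overlap bases is found.
    """
    # Bring read 2 onto the same strand as read 1
    mate = reverse_complement(read2)
    # Try the longest possible overlap first: suffix of read1 == prefix of mate
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return len(read1) + len(mate) - overlap
    return None


# Hypothetical 32 bp fragment sequenced with 2 x 20 bp reads
fragment = "ACGTTGCAAGGCTTACCGATGCATGCCTAGGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
print(estimate_insert_size(read1, read2), len(fragment))
```

When the reads do not overlap, the fragment is longer than the two reads combined and this approach cannot estimate its size, so the function returns ``None``.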

4 changes: 4 additions & 0 deletions docs/4- Quality of the mapping.rst
@@ -14,17 +14,21 @@ Depending on the origin of our sequencing data (WGS, WES, RNA-seq, Chip-seq, ...
Both can be used for WGS or WES data:

- BWA-MEM: performs local alignment by default, with high accuracy and efficiency when aligning reads to the entire genome. It is very efficient at finding gapped alignments, which is important for variant detection <https://bio-bwa.sourceforge.net/bwa.shtml>.

- bowtie2: performs global (end-to-end) alignment by default; it is faster than BWA but less sensitive. It is recommended for large-scale sequencing and is frequently used for ChIP-seq because of its speed when aligning shorter reads to identify enriched regions (peak detection) <https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>.

**RNA-seq splice-aware aligners**: specialized in mapping RNA-seq reads, which can span splice junctions and map to different exons of the same gene:

- STAR: the most popular aligner for RNA-seq data, very efficient and accurate at identifying splice junctions <https://github.com/alexdobin/STAR>.

- TopHat2: one of the first aligners designed for RNA-seq data; it is now deprecated and has been superseded by newer aligners such as HISAT2 and STAR <https://ccb.jhu.edu/software/tophat/index.shtml>.

- HISAT2: built on the Bowtie2 alignment algorithm, but optimized for RNA-seq data <https://daehwankimlab.github.io/hisat2/>.

**Pseudo-Aligner - Quasi-mapping**: very fast; these tools map reads to the transcriptome and quantify expression, but cannot find novel transcripts. When the goal is to quantify gene expression levels, this is usually the best option:

- Kallisto: uses a pseudo-alignment approach that efficiently determines which transcripts a read is compatible with, without full sequence alignment. It is very fast and memory-efficient, making it a good option for large-scale projects <https://github.com/pachterlab/kallisto>.

- Salmon: uses a quasi-mapping approach, similar to pseudo-alignment but including information about the position of the read within the transcript, and performs bias-correction steps. It is slower than Kallisto but gives more accurate quantification, making it a better option for complex transcriptomes <https://combine-lab.github.io/salmon/getting_started/>.
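
The compatibility idea behind pseudo-alignment can be sketched with a toy k-mer index. This is only an illustration under simplifying assumptions (exact k-mer matches, a plain dictionary instead of a de Bruijn graph, no quantification step) and it is not how Kallisto or Salmon are implemented. The function names and the transcript sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(transcripts: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_transcripts(read: str, index: dict[str, set[str]], k: int) -> set[str]:
    """Intersect the transcript sets of all k-mers in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = hits if compatible is None else compatible & hits
        if not compatible:
            return set()  # some k-mer is absent from every candidate
    return set(compatible) if compatible else set()


# Two hypothetical isoforms sharing their first exon
transcripts = {
    "t1": "ACGTACGGTCA" + "TTGACCGTAAG",
    "t2": "ACGTACGGTCA" + "GGCATCGATTC",
}
index = build_kmer_index(transcripts, k=5)
print(compatible_transcripts("GTACGGT", index, k=5))   # read inside the shared exon
print(compatible_transcripts("TCATTGAC", index, k=5))  # read spanning the t1 junction
```

A read from the shared exon stays compatible with both isoforms, while a read that spans an isoform-specific junction narrows the set to one transcript; no base-by-base alignment is computed at any point.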

Before aligning the reads, a reference genome in FASTA format is needed; typical sources are UCSC, Ensembl, or GENCODE. The reference genome is then indexed to create a dictionary-like database of its sequences, which speeds up querying the reads against these regions and minimizes the memory footprint.
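
Why an index speeds up the lookup can be shown with a toy k-mer dictionary. Note the assumption: real aligners such as BWA and bowtie2 use compressed FM-index structures based on the Burrows-Wheeler transform, not a plain dictionary, so this sketch only conveys the general idea. The function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every position at which each k-mer occurs in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_exact(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Use the first k-mer of the read as a seed, then verify the full read."""
    seed = read[:k]
    return [
        pos
        for pos in index.get(seed, [])
        if reference[pos:pos + len(read)] == read
    ]


reference = "GATTACAGATTACCA"  # hypothetical reference sequence
index = build_index(reference, k=4)
print(find_exact("GATTACC", reference, index, k=4))
print(find_exact("GATTAC", reference, index, k=4))
```

The index is built once per reference and reused for every read, so each query only inspects the few positions that share its seed instead of scanning the whole genome.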