
Commit

update index 2 and 3.rst
FabianAndradeLozano committed Aug 22, 2024
1 parent 425843e commit 0603d28
Showing 3 changed files with 16 additions and 10 deletions.
9 changes: 4 additions & 5 deletions docs/2- Sequencing_technologies.rst
@@ -83,10 +83,10 @@ A basecaller translates raw signals into DNA sequence data (FASTQ). The basecall
.. seealso::
   .. _Nanopore_sequencing_workflow: https://www.youtube.com/watch?v=RcP85JHLmnI

   See the Nanopore_sequencing_workflow_ video by Oxford Nanopore Technologies to visualize the concepts of Nanopore sequencing.


FASTQ format and Phred quality score
=====================================

The raw data generated by the sequencer is stored in FASTQ format, which contains the sequence of nucleotides and their corresponding quality scores.
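To make the four-line FASTQ layout and the Phred scale concrete, here is a minimal Python (3.9+) sketch. It assumes unwrapped four-line records and the Phred+33 (Sanger) quality offset, and converts scores with the standard relation Q = -10 * log10(P). `read_fastq`, `phred_scores` and `error_probability` are hypothetical helper names for illustration, not a library API.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line 1 without the leading "@"
    sequence: str  # line 2: the nucleotides
    quality: str   # line 4: one quality character per nucleotide


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield FASTQ records, assuming exactly four lines per record (no wrapping)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters, assuming the Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): probability that the base call is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5#\n")
    for record in read_fastq(demo):
        scores = phred_scores(record.quality)
        print(record.header, scores, [error_probability(q) for q in scores])
```

With this encoding, `I` decodes to Q40 (1 error in 10,000) and `5` to Q20 (1 error in 100).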
@@ -103,4 +103,3 @@ It is divided into four lines:
   :width: 400
   :align: center


16 changes: 11 additions & 5 deletions docs/3- Quality Control and Preprocessing.rst
@@ -32,12 +32,18 @@ The HTML report generated for each file is divided into the following sections:
#. **Basic Statistics**: displays information about the file, such as the number and length of the sequences and the overall %GC.
#. **Per base sequence quality**: shows how the quality score (y axis) varies along the reads (x axis). For each position a box-and-whisker plot is drawn; the red line represents the median and the blue line the mean. The quality score commonly decreases towards the end of the reads, because the polymerase tends to make more mistakes as the read progresses.
   If the median for any base is less than 25, a warning is raised.
#. **Per tile sequence quality**: for each tile of the flow cell, shows how the quality scores deviate from the average, so that a loss of quality confined to one part of the flow cell can be spotted.
#. **Per sequence quality score**: shows the distribution of the mean quality scores of all the reads in the file. If a large subset of the reads has a poor average quality, this could indicate a systematic problem.
#. **Per base sequence content**: shows, for each base position, the proportion of each of the four nucleotides. A strong bias in the nucleotide composition could indicate a problem in the library preparation.
#. **Per sequence GC content**: shows the GC content distribution of all the reads in the file, compared with a modelled normal distribution of GC content built from the observed data.

   .. danger::
      If the GC content does not closely follow the modelled normal distribution, this could indicate contamination or a problem in the library preparation.
      Also, GC content varies between organisms, so it is important to know the expected GC content of the organism of interest instead of relying only on the reference curve.
#. Per base N content
#. Sequence Length Distribution
#. Duplicate Sequences
#. Overrepresented sequences
#. Overrepresented kmers
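The per base sequence quality summary described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions (Phred+33 qualities, no position binning, which FastQC applies to long reads), not FastQC's implementation; `per_base_quality` and `low_median_positions` are hypothetical names. The warning threshold of 25 comes from the text.

```python
import statistics
from collections import defaultdict


def per_base_quality(qualities: list[str], offset: int = 33) -> list[tuple[int, float, float]]:
    """(position, median, mean) of the Phred scores at every read position.

    Positions are 1-based. Reads may differ in length: each position only
    summarises the reads that reach it. Unlike FastQC, positions are not binned.
    """
    by_position: dict[int, list[int]] = defaultdict(list)
    for qual in qualities:
        for i, ch in enumerate(qual):
            by_position[i + 1].append(ord(ch) - offset)  # assumes Phred+33
    return [
        (pos, statistics.median(scores), statistics.fmean(scores))
        for pos, scores in sorted(by_position.items())
    ]


def low_median_positions(summary: list[tuple[int, float, float]], threshold: float = 25) -> list[int]:
    """Positions whose median score is below the warning threshold (25 in the text)."""
    return [pos for pos, median, _mean in summary if median < threshold]


quals = ["IIII", "II55", "I5##"]  # toy quality strings (Phred+33)
summary = per_base_quality(quals)
print(summary)
print("warning at positions:", low_median_positions(summary))
```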
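For the per tile view, each read must first be assigned to a tile. The sketch below assumes Illumina CASAVA 1.8+ read names (`@instrument:run:flowcell:lane:tile:x:y comment`) and Phred+33 qualities. As a simplification, it pools all positions instead of plotting each position separately. `tile_key` and `tile_deviation` are hypothetical helpers.

```python
import statistics
from collections import defaultdict


def tile_key(header: str) -> str:
    """Return 'lane:tile' from an Illumina 1.8+ read name.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    fields = header.lstrip("@").split()[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name: {header!r}")
    return f"{fields[3]}:{fields[4]}"


def tile_deviation(records: list[tuple[str, str]], offset: int = 33) -> dict[str, float]:
    """Mean Phred score of each tile minus the mean over all bases.

    Simplification: FastQC plots this per position; here positions are pooled.
    """
    per_tile: dict[str, list[int]] = defaultdict(list)
    for header, qual in records:
        per_tile[tile_key(header)].extend(ord(ch) - offset for ch in qual)
    overall = statistics.fmean(q for scores in per_tile.values() for q in scores)
    return {tile: statistics.fmean(s) - overall for tile, s in sorted(per_tile.items())}


records = [
    ("@M1:1:FC1:1:1101:10:20 1:N:0:1", "IIII"),  # toy read on tile 1101
    ("@M1:1:FC1:1:1102:15:30 1:N:0:1", "####"),  # toy read on tile 1102
]
print(tile_deviation(records))
```

A tile with a strongly negative value would point to a localized problem on that part of the flow cell.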
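The per sequence quality score distribution from the list above can be approximated by averaging the scores of each read and counting reads per average. This sketch assumes Phred+33 and truncates each mean to an integer for binning, which is a choice made here. The cutoff passed to the hypothetical `fraction_below` helper is purely illustrative.

```python
from collections import Counter


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score of one read (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def per_sequence_quality(qualities: list[str], offset: int = 33) -> Counter:
    """Number of reads per mean quality, truncated to an integer (sketch choice)."""
    return Counter(int(mean_quality(q, offset)) for q in qualities)


def fraction_below(dist: Counter, cutoff: int) -> float:
    """Fraction of reads whose mean quality is below `cutoff`."""
    total = sum(dist.values())
    return sum(n for q, n in dist.items() if q < cutoff) / total


dist = per_sequence_quality(["IIII", "IIII", "####"])  # toy reads
print(sorted(dist.items()), fraction_below(dist, 20))
```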
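Per base sequence content is a per-position tally of the four nucleotides. In the following sketch, `N` and other symbols count toward the denominator but are not reported, which is an assumption made here. The hypothetical `strand_imbalance` helper reports the largest A/T or G/C difference as one simple way to quantify the composition bias mentioned in the text.

```python
from collections import Counter


def per_base_content(sequences: list[str]) -> list[dict[str, float]]:
    """For each position, the proportion of reads carrying A, C, G and T there.

    Assumption: N and other symbols are counted in the denominator only.
    """
    max_len = max(len(s) for s in sequences)
    table = []
    for i in range(max_len):
        counts = Counter(s[i] for s in sequences if len(s) > i)
        total = sum(counts.values())
        table.append({base: counts[base] / total for base in "ACGT"})
    return table


def strand_imbalance(table: list[dict[str, float]]) -> float:
    """Largest |A - T| or |G - C| difference at any position, as a fraction."""
    return max(max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in table)


table = per_base_content(["ACGT", "ACGA", "ACGT", "TCGA"])  # toy reads
print(table[0], strand_imbalance(table))
```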
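The per sequence GC content check compares the observed distribution with a normal curve. For simplicity, this sketch fits the curve from the sample mean and standard deviation. FastQC derives its reference distribution from the observed data in its own way, so the numbers will differ. `gc_percent`, `gc_vs_normal` and `deviation_share` are hypothetical names. As the danger note says, the expected GC content depends on the organism.

```python
import math
import statistics


def gc_percent(seq: str) -> float:
    """GC content of one read, in percent."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_vs_normal(sequences: list[str]) -> tuple[list[int], list[float]]:
    """Observed per-read GC histogram (0-100 %) and a fitted normal curve.

    Assumption: the curve uses the sample mean and standard deviation; FastQC
    builds its own model from the observed data, which differs in detail.
    """
    gcs = [gc_percent(s) for s in sequences]
    observed = [0] * 101
    for gc in gcs:
        observed[round(gc)] += 1  # one bin per whole percent
    mu = statistics.fmean(gcs)
    sigma = statistics.pstdev(gcs) or 1.0  # avoid a zero-width curve
    n = len(gcs)
    expected = [
        n * math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        for x in range(101)
    ]
    return observed, expected


def deviation_share(observed: list[int], expected: list[float]) -> float:
    """Sum of |observed - expected| over all bins, relative to the number of reads."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / sum(observed)


observed, expected = gc_vs_normal(["GGCC", "ATAT", "GCAT", "GCAT"])  # toy reads
print(deviation_share(observed, expected))
```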
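Per base N content is the share of reads with an uncalled base (`N`) at each position. Here is a minimal sketch using a hypothetical `per_base_n_content` helper. Reads of different lengths only contribute to the positions they reach.

```python
def per_base_n_content(sequences: list[str]) -> list[float]:
    """Percentage of reads with an N call at each position (position 1 first)."""
    max_len = max(len(s) for s in sequences)
    result = []
    for i in range(max_len):
        calls = [s[i] for s in sequences if len(s) > i]
        n_calls = sum(1 for c in calls if c.upper() == "N")
        result.append(100 * n_calls / len(calls))
    return result


print(per_base_n_content(["ACNT", "ANNT"]))  # toy reads
```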
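The sequence length distribution is just a count of reads per length. In the sketch below, `length_distribution` is a hypothetical name. A single key in the result means every read has the same length, which is typical of untrimmed Illumina data but not of Nanopore reads.

```python
from collections import Counter


def length_distribution(sequences: list[str]) -> dict[int, int]:
    """Map each read length to the number of reads having that length."""
    return dict(sorted(Counter(len(s) for s in sequences).items()))


lengths = length_distribution(["ACGT", "ACG", "ACGT"])  # toy reads
print(lengths)
print("all reads have the same length:", len(lengths) == 1)
```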
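Duplicate sequences can be summarized by exact-match counting over all reads, as in the sketch below. FastQC's own duplication module samples the input and reports levels differently, so treat this as an approximation. `duplication_summary` is a hypothetical helper.

```python
from collections import Counter


def duplication_summary(sequences: list[str]) -> tuple[float, float, dict[int, int]]:
    """Return (distinct fraction, duplicate-read fraction, duplication levels).

    Duplication levels map "copies of a sequence" to the number of distinct
    sequences seen that many times. Matching is exact over the full read.
    """
    counts = Counter(sequences)
    total = len(sequences)
    distinct = len(counts)
    duplicate_reads = total - distinct  # extra copies beyond the first of each sequence
    levels = Counter(counts.values())
    return distinct / total, duplicate_reads / total, dict(sorted(levels.items()))


print(duplication_summary(["AAA", "AAA", "CCC", "GGG", "GGG", "GGG"]))  # toy reads
```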
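Overrepresented sequences are exact reads that make up an unusually large share of the library, often adapters or contaminants. In this sketch, the default `min_fraction` of 0.001 (0.1%) mirrors FastQC's warning threshold. The hypothetical `overrepresented` helper does not attempt to identify the source of each hit.

```python
from collections import Counter


def overrepresented(sequences: list[str], min_fraction: float = 0.001) -> list[tuple[str, int, float]]:
    """(sequence, count, fraction) for reads above `min_fraction`, most common first."""
    total = len(sequences)
    return [
        (seq, n, n / total)
        for seq, n in Counter(sequences).most_common()
        if n / total > min_fraction
    ]


reads = ["ACGT"] * 3 + ["TTTT", "GGGG"]  # toy reads
print(overrepresented(reads, min_fraction=0.25))
```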
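Overrepresented k-mers start from a count of every substring of length k across the reads. The sketch below produces raw counts only and skips k-mers containing `N`. The default k=7 is an arbitrary choice. FastQC additionally looks at where in the read a k-mer is enriched, which is not reproduced here. `kmer_counts` is a hypothetical name.

```python
from collections import Counter


def kmer_counts(sequences: list[str], k: int = 7) -> Counter:
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


print(kmer_counts(["ACGTACGT"], k=4).most_common(2))  # toy read
```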

Hands-on
********
1 change: 1 addition & 0 deletions docs/index.rst
@@ -15,3 +15,4 @@ Contents:
   about
   1- Library_preparation
   2- Sequencing_technologies
   3- Quality Control and Preprocessing
