Note: Since this part spans only one day, the tutorial cannot be covered thoroughly during the course.
Do not worry: the main points will be stressed during the course, and this tutorial is detailed enough to be completed on your own afterwards. If you have any questions (on any part of the tutorial or on RNA-seq more generally) you can contact us: lgreger [at] ebi.ac.uk and mitra [at] ebi.ac.uk
This tutorial will illustrate how to use standalone tools, together with R and Bioconductor, for the analysis of RNA-seq data. Keep in mind that this is a rapidly evolving field and that this document is not intended as a review of the many tools available to perform each step; instead, we will cover one of the many existing workflows for analysing this type of data.
We will be working with a subset of a publicly available dataset from Drosophila melanogaster, which is available both in the Sequence Read Archive (SRP001537 - raw data) and in Bioconductor (pasilla package - processed data). For more information about this dataset, please refer to the original publication (Brooks et al. 2010).
The tools and R packages that we will be using during the practical are listed below (see Software requirements) and the necessary data files can be found here. After downloading and uncompressing the tar.gz file, you should have the following directory structure on your computer:
RNAseq
|-- reference            # reference info (e.g. genome sequence and annotation)
`-- data
    |-- raw              # raw data: FASTQ files
    |-- demultiplexing   # multiplexed data (not used in this course)
    |-- mapped           # mapped data: BAM files
    `-- RData            # R environment for each part, with the objects generated in it
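As a quick sanity check after uncompressing, the layout above can be verified from the command line. The following is only a minimal sketch: the function name `check_layout` is our own, and the archive name `RNAseq.tar.gz` in the usage comment is an assumption (use the name of the file you actually downloaded). The directory names are the ones listed in the tree above.

```shell
#!/bin/sh
# check_layout ROOT
# Report, for each directory expected by the course layout, whether it exists
# under ROOT. Returns a non-zero status if at least one directory is missing.
# (check_layout is a hypothetical helper, not part of any course tool.)
check_layout() {
    root="$1"
    missing=0
    for sub in reference data/raw data/demultiplexing data/mapped data/RData; do
        if [ -d "$root/$sub" ]; then
            echo "ok       $root/$sub"
        else
            echo "MISSING  $root/$sub"
            missing=$((missing + 1))
        fi
    done
    # Exit status of the function: success only when nothing is missing
    [ "$missing" -eq 0 ]
}

# Typical use (the archive name is an assumption, adapt it to your download):
#   tar -xzf RNAseq.tar.gz
#   check_layout RNAseq
```

If you downloaded only some of the material, the corresponding directories will simply be reported as missing, which is harmless as long as the parts you intend to run are present.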
You can also browse the files online and download only the material you need from here.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.
- Dealing with raw data
  - The FASTQ format
  - Quality assessment (QA)
  - Filtering FASTQ files
- Aligning reads to the genome (already processed - will not be run)
- Dealing with aligned data
  - The SAM/BAM format
  - Visualising aligned reads (optional)
  - Filtering BAM files
- Gene-centric analyses:
  - Counting reads overlapping annotated genes
    - With htseq-count
    - With R
    - Alternative approaches
  - Normalising counts
    - With RPKMs
    - With DESeq2
  - Differential gene expression
- Other topics - Not covered in the course
  - Exon-centric analyses:
  - Transcript-centric analyses:
Note: depending on the topics covered in the course some of these tools might not be used.
Standalone tools:
Bioconductor packages:
- GenomicRanges
- GenomicAlignments
- Rsamtools
- biomaRt
- pasilla
- DESeq - only for some dependencies
- DESeq2
- DEXSeq
- Course materials available at the Bioconductor website
- Online training resources at the EBI website
- R and Bioconductor tutorial by Thomas Girke
- Do not forget to check the documentation for the packages used in the practical!
This tutorial was inspired by material developed by Mar Gonzalez-Porta, Ângela Gonçalves, Nicolas Delhomme, Simon Anders and Martin Morgan, whom we would like to thank and acknowledge. Special thanks go to Mar Gonzalez-Porta, with whom we have been teaching, and to Gabriella Rustici for her priceless help in organising courses.