
Annotation files

vrmelo edited this page Sep 30, 2020 · 1 revision


SPLICE-q requires a genome annotation file from GENCODE or Ensembl in Gene Transfer Format (GTF). The file lists exons together with the genes and transcripts they belong to. SPLICE-q uses these exon coordinates to locate and annotate introns and splice junctions.
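To illustrate how introns follow from exon coordinates, here is a rough sketch in shell and awk. It is not SPLICE-q's own implementation. It assumes standard GTF columns (1 = chromosome, 3 = feature type, 4/5 = 1-based start/end, 7 = strand, 9 = attributes containing `transcript_id`). The function name `gtf_introns` is hypothetical. Exons of each transcript are sorted by start, and every gap between consecutive exons is reported as an intron.

```shell
# Hypothetical helper (illustration only, not SPLICE-q's code):
# derive 1-based, inclusive intron coordinates from the exon records of a GTF.
# Output columns: chromosome, intron start, intron end, strand, transcript_id
gtf_introns() {
  awk -F'\t' '$3 == "exon" {
      # extract the transcript_id value from the attribute column (9)
      if (match($9, /transcript_id "[^"]+"/)) {
        tid = substr($9, RSTART + 15, RLENGTH - 16)
        print tid "\t" $1 "\t" $4 "\t" $5 "\t" $7
      }
    }' "$1" |
  # group exons by transcript, then order them by genomic start
  sort -t "$(printf '\t')" -k1,1 -k3,3n |
  awk -F'\t' -v OFS='\t' '
    # the gap between the previous exon end and this exon start is an intron
    $1 == prev_tid { print $2, prev_end + 1, $3 - 1, $5, $1 }
    { prev_tid = $1; prev_end = $4 }'
}

# Usage: gtf_introns gencode.v34.annotation.gtf > introns.tsv
```

Single-exon transcripts produce no introns. Because exons are ordered by genomic position, the same rule applies on both strands.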

What is the difference between GENCODE GTF and Ensembl GTF?

Examples of acceptable annotation files:

GENCODE:

  • Human v34
    ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.annotation.gtf.gz
  • Mouse M25
    ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz

Ensembl:

  • Human GRCh38 Release 100
    ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz
  • Mouse GRCm38 Release 100
    ftp://ftp.ensembl.org/pub/release-100/gtf/mus_musculus/Mus_musculus.GRCm38.100.gtf.gz
  • Yeast R64-1-1 Release 100
    ftp://ftp.ensembl.org/pub/release-100/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.100.gtf.gz
  • Other species
    ftp://ftp.ensembl.org/pub/release-100/gtf/
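Large downloads are occasionally truncated. A minimal sanity check after fetching one of the files above (for example with `wget`) is sketched below. `check_gtf_gz` is a hypothetical helper name. It uses `gzip -t` to confirm the archive is intact, then counts the `exon` records (column 3) while skipping `#` header lines.

```shell
# Hypothetical helper: verify a downloaded .gtf.gz and count its exon records.
check_gtf_gz() {
  local gz="$1"
  # gzip -t fails on corrupt or truncated archives
  gzip -t "$gz" || return 1
  # column 3 of a GTF holds the feature type; skip comment/header lines
  gzip -dc "$gz" | awk -F'\t' '!/^#/ && $3 == "exon" { n++ } END { print n + 0 }'
}

# Example usage (requires network access):
# wget ftp://ftp.ensembl.org/pub/release-100/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.100.gtf.gz
# check_gtf_gz Saccharomyces_cerevisiae.R64-1-1.100.gtf.gz
```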

Attention! The most common aligners for RNA-seq, such as STAR and HISAT2, include a genome index generation step in which a GTF can be supplied to annotate known splice junctions. We strongly recommend running SPLICE-q with this same GTF.
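As one way to keep the annotation consistent, the sketch below defines the GTF path once and passes it to STAR's index generation through `--sjdbGTFfile`. The same path would later be given to SPLICE-q. The function name `build_star_index`, the `DRY_RUN` switch, and the file names are assumptions made for illustration. HISAT2 works similarly: `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` read the GTF, and their outputs are passed to `hisat2-build --ss ... --exon ...`.

```shell
# Hypothetical wrapper: build a STAR index from the same GTF later given to SPLICE-q.
# Set DRY_RUN=1 to print the commands instead of running them.
build_star_index() {
  local genome_fa="$1" gtf="$2" outdir="${3:-star_index}"
  ${DRY_RUN:+echo} mkdir -p "$outdir"
  # --sjdbGTFfile: annotation used to insert known splice junctions into the index
  # --sjdbOverhang: ideally read length - 1; the STAR manual notes 100 works in most cases
  ${DRY_RUN:+echo} STAR --runMode genomeGenerate \
    --genomeDir "$outdir" \
    --genomeFastaFiles "$genome_fa" \
    --sjdbGTFfile "$gtf" \
    --sjdbOverhang 100
}

# Usage (placeholder paths):
# GTF=gencode.v34.annotation.gtf
# build_star_index GRCh38.primary_assembly.genome.fa "$GTF"
```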


Run the following to decompress gzip files from the command line:

 $ gunzip file.gz
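Note that `gunzip file.gz` replaces `file.gz` with `file`. If you want to keep the compressed copy, a portable alternative is to decompress to standard output and redirect the result, as sketched below. The helper name `decompress_keep` is hypothetical. Newer GNU gzip versions also provide `gunzip -k` for the same purpose.

```shell
# Hypothetical helper: decompress file.gz to file, leaving file.gz in place.
decompress_keep() {
  local gz="$1"
  # ${gz%.gz} strips the trailing .gz suffix to form the output name
  gzip -dc "$gz" > "${gz%.gz}"
}

# Usage: decompress_keep gencode.v34.annotation.gtf.gz
```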