Annotation files
vrmelo edited this page Sep 30, 2020 · 1 revision
SPLICE-q requires a genome annotation file provided by GENCODE or Ensembl in Gene Transfer Format (GTF) containing information on exons and the genes and transcripts they are associated with. SPLICE-q will use this file to locate and annotate introns and splice junctions from the exon coordinates.
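To illustrate how introns can be located from exon coordinates, here is a minimal Python sketch. It is not SPLICE-q's actual implementation: the function names `parse_exons` and `introns_from_exons` and the toy records in `EXAMPLE_GTF` are hypothetical and serve only as illustration. The sketch assumes standard GTF conventions (tab-separated fields, 1-based inclusive coordinates, a `transcript_id` attribute in the ninth column).

```python
import re
from collections import defaultdict

# Hypothetical toy records, not taken from any real annotation.
EXAMPLE_GTF = [
    'chr1\tEXAMPLE\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tEXAMPLE\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tEXAMPLE\texon\t501\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]


def parse_exons(gtf_lines):
    """Group exon (start, end) intervals by transcript_id."""
    exons = defaultdict(list)
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return exons


def introns_from_exons(exon_intervals):
    """Return the gaps between consecutive exons as 1-based inclusive intervals."""
    ordered = sorted(exon_intervals)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1  # ignore adjacent or overlapping exons
    ]


exons_by_transcript = parse_exons(EXAMPLE_GTF)
print(introns_from_exons(exons_by_transcript["T1"]))
# prints [(201, 300), (401, 500)]
```

Each intron spans from the base after one exon's end to the base before the next exon's start, so the intron boundaries double as the splice junction positions.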
What is the difference between GENCODE GTF and Ensembl GTF?
Examples of acceptable genome annotation files:
GENCODE:
- Human v34
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.annotation.gtf.gz
- Mouse M25
ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz
Ensembl:
- Human GRCh38 Release 100
ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz
- Mouse GRCm38 Release 100
ftp://ftp.ensembl.org/pub/release-100/gtf/mus_musculus/Mus_musculus.GRCm38.100.gtf.gz
- Yeast R64-1 Release 100
ftp://ftp.ensembl.org/pub/release-100/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.100.gtf.gz
- Other species
ftp://ftp.ensembl.org/pub/release-100/gtf/
Attention! The most common alignment programs for mapping RNA-seq reads, such as STAR and HISAT2, include a step to generate genome indices, in which a GTF file is typically supplied. We strongly recommend running SPLICE-q with the same GTF used in that step.
Run the following to decompress gzip files from the command line:
$ gunzip file.gz
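If `gunzip` is not available, the same decompression can be sketched with Python's standard library. The helper name `gunzip_file` and the example file names are hypothetical. Unlike `gunzip`, which by default replaces the compressed file, this sketch leaves the original `.gz` file in place.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path, streaming so large GTFs fit in memory."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Example call (file names are placeholders):
# gunzip_file("gencode.v34.annotation.gtf.gz", "gencode.v34.annotation.gtf")
```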