What is IsoWorm

IsoWorm is a Snakemake pipeline developed to quantify isoform expression levels in large RNA-seq datasets (paired-end short reads). The pipeline consists of a series of interconnected modules, each performing one stage of the analysis. It starts from a txt file containing SRA IDs; the RNA-seq library type (or a BAM file) and the reference files (FASTA and GTF) are specified in the Snakemake config file. The custom module of IsoWorm can be used to analyse specific isoforms (in our case study, those of BRAF), using custom GTF files to quantify isoform-specific genomic regions. Quantification is performed with StringTie and all plots are generated in R. Conversely, the Salmon module of IsoWorm quantifies all the isoforms annotated in the Ensembl database (our reference); an R script generates pie charts of gene isoform expression. IsoWorm also implements a module for single-end reads that identifies polyA sites using custom R scripts, starting from QuantSeq 3' REV sequencing data.

Getting Started

Input

The input files and parameters are specified in config_final.yml; the inputs for the R plots and scripts are specified in the config file for R:

top level directories

workflow_type: "" - options: "polyA_module", "salmon_module", "custom_module", "custom_and_salmon_modules"
sourcedir: - your output directory
refdir: - directory containing your GTF, FASTA and all other reference files
sampledir: - directory containing your txt sample files
envsdir: - directory containing your envs files
workflow: - directory containing your workflow (.smk) files
samples: - your txt file containing the SRA samples
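
The layout of the samples txt file is not spelled out here. As a minimal sketch, assuming one run accession per line (SRR, ERR or DRR followed by digits) and allowing blank lines and `#` comments, the file could be checked before launching the pipeline like this. The function name `parse_sra_ids` and the accession pattern are illustrative assumptions, not part of IsoWorm.

```python
import re

# Assumption: each line holds one run accession such as SRR, ERR or DRR plus digits.
RUN_ID = re.compile(r"^[SED]RR\d+$")


def parse_sra_ids(text):
    """Return the accessions found in the text of a samples file, in order."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if not RUN_ID.match(line):
            raise ValueError(f"not a run accession: {line!r}")
        ids.append(line)
    return ids
```

For a real file, the text can be read first, for example with `pathlib.Path("samples.txt").read_text()`, where `samples.txt` stands for whatever file the `samples` key points to.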

reference files, genome indices and data

stargenomedir, GRCh38.primary_assembly.genome: - directory for STAR genome
fasta: GRCh38.primary_assembly.genome: - genome fasta reference file for STAR
fasta_salmon: GRCh38.primary_assembly.genome: - transcript fasta reference for salmon
gtf: GRCh38.primary_assembly.genome: - gtf file for all transcripts
gtf_personal: GRCh38.primary_assembly.genome: - GTF file customized for your transcript of interest
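
A quick sanity check of the parsed config can catch a mistyped `workflow_type` before any job starts. The sketch below works on a plain dictionary, as a YAML parser would return for config_final.yml, and covers only the top-level keys listed above. Treating every one of those keys as required is an assumption, and `check_config` is a hypothetical helper, not something IsoWorm provides. The reference-file keys are left out because their exact nesting is not clear from this description.

```python
# The four workflow_type options listed in the Input section.
WORKFLOW_TYPES = {
    "polyA_module",
    "salmon_module",
    "custom_module",
    "custom_and_salmon_modules",
}

# Assumption: all top-level keys must be present and non-empty.
REQUIRED_KEYS = [
    "workflow_type", "sourcedir", "refdir", "sampledir",
    "envsdir", "workflow", "samples",
]


def check_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = [
        f"missing or empty key: {key}"
        for key in REQUIRED_KEYS
        if not config.get(key)
    ]
    workflow_type = config.get("workflow_type")
    if workflow_type and workflow_type not in WORKFLOW_TYPES:
        problems.append(f"unknown workflow_type: {workflow_type}")
    return problems
```

An empty list means nothing was flagged; any other result names the keys to fix.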

Output

polyA modules

SAindex - star index
{sample_name}_SE_small_Aligned.sortedByCoord.out.bam - sliced BAM of your gene of interest (BRAF in our case study), single end
polyA_filtered_3UTR204.csv - polyA peaks in the BRAF-204 3' UTR
polyA_filtered_3UTR220.csv - polyA peaks in the BRAF-220 3' UTR

salmon modules

salmon_index - salmon index
quant.sf - all transcripts quantified by Salmon
ratio_salmon.pdf - box plots of the ratio between our two isoforms of interest
pie_charts.pdf - pie charts of the expression values of all our isoforms of interest
total_salmon.pdf - total expression levels of our gene of interest
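
To inspect quant.sf outside the R scripts, the following sketch reads Salmon's tab-separated output (columns Name, Length, EffectiveLength, TPM, NumReads) and computes each selected transcript's share of their summed TPM, which is the kind of proportion a pie chart displays. It is not the code IsoWorm uses for its plots; the helper `isoform_fractions`, the transcript names `tx_A`, `tx_B`, `tx_C` and the numbers in `example` are made up for illustration.

```python
import csv
import io


def isoform_fractions(quant_sf_text, transcript_ids):
    """Map each requested transcript to its share of their summed TPM."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    tpm = {
        row["Name"]: float(row["TPM"])
        for row in reader
        if row["Name"] in transcript_ids
    }
    total = sum(tpm.values())
    if total == 0:
        # Avoid dividing by zero when none of the transcripts is expressed.
        return {name: 0.0 for name in tpm}
    return {name: value / total for name, value in tpm.items()}


# Toy quant.sf content with invented values.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx_A\t2000\t1800.0\t30.0\t300\n"
    "tx_B\t1500\t1300.0\t10.0\t80\n"
    "tx_C\t900\t700.0\t60.0\t250\n"
)

fractions = isoform_fractions(example, {"tx_A", "tx_B"})
```

In practice the set passed as `transcript_ids` would hold the identifiers of the isoforms of interest exactly as they appear in the Name column of quant.sf.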

custom modules

SAindex - star index
{file}_small_Aligned.sortedByCoord.out.bam - sliced BAM of your gene of interest (BRAF in our case study)
ratio_BRAF.pdf - box plots of the ratio between our two isoforms of interest

Dependencies

miniconda - install it according to its instructions.
snakemake - install it using conda.
The rest of the dependencies are installed automatically through Snakemake's conda integration.

Installation

Clone the repository:

git clone https://github.com/ctglab/isoworm

Usage

Edit config.yml to set the input datasets and parameters, edit config.R to set the input datasets and parameters for the R scripts, and edit script.sh to set the directory where the FASTQ files will be downloaded. Then run:

snakemake -s snakefile_final.smk --use-conda --rerun-incomplete --cores 2 -k
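
Before a full run it can help to preview the planned jobs with Snakemake's dry-run flag `-n`. The sketch below only assembles the argument list, so it can be inspected or passed to `subprocess.run`; the helper name `snakemake_command` and its defaults are assumptions made for illustration and mirror the command above.

```python
import shlex


def snakemake_command(snakefile="snakefile_final.smk", cores=2, dry_run=False):
    """Build the snakemake argument list, optionally as a dry run."""
    cmd = [
        "snakemake", "-s", snakefile,
        "--use-conda", "--rerun-incomplete",
        "--cores", str(cores),
        "-k",  # keep going with independent jobs if one fails
    ]
    if dry_run:
        cmd.append("-n")  # list the jobs without executing them
    return cmd


# Print the command line for a dry run.
print(shlex.join(snakemake_command(dry_run=True)))
```

Running `subprocess.run(snakemake_command(dry_run=True), check=True)` from the repository directory would then list the jobs without executing them.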
