Alternative splicing can occur in at least three quarters of human genes, allowing them to encode two or more splice isoforms. These isoforms occur in different proportions over time and between sexes. We therefore present a method to characterize these isoforms in order to better understand gene regulation in normal and diseased states. Time2splice identifies temporal and sex-specific alternative splicing by combining multi-omic data (i.e., expression via RNA-seq, and protein-DNA interaction via CUT&RUN and ChIP-seq). The analysis is done in three parts: 1) temporal expression analysis, 2) temporal protein-DNA analysis, and 3) temporal multi-omics integration.
NOTE: Snakemake pipeline coming soon for time2splice!
Ray*, M., Conard*, A. M., Urban, J., & Larschan, E. (2021). Sex-specific transcript diversity is regulated by a maternal pioneer factor in early Drosophila embryos. bioRxiv.
*contributed equally
- Clone repo
git clone
- In the time2splice directory, build conda environment
conda env create --name time2splice --file=time2splice.yml
- Install reference and other data files here. We suggest placing them in
- Preprocess
- Temporal expression analysis
- Temporal protein-DNA analysis
- Temporal multi-omics integration
Creates the time2splice/ folder structure, as well as metadatafile.csv and SraAccList.txt (the latter is needed by the next command to retrieve the .fastq files).
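SraAccList.txt is a plain list with one SRA run accession per line. The sketch below shows one way to derive it from a metadata CSV; the `Run` column name is an assumption (it matches SRA Run Selector exports) and may not match the layout of the metadatafile.csv that time2splice actually writes.

```python
import csv
from pathlib import Path


def write_sra_acc_list(metadata_csv, out_path="SraAccList.txt", column="Run"):
    """Write one SRA accession per line, read from `column` of a metadata CSV.

    ASSUMPTION: the accession column is called "Run"; adjust `column`
    to match your metadatafile.csv.
    """
    with open(metadata_csv, newline="") as handle:
        accessions = [row[column].strip() for row in csv.DictReader(handle)
                      if (row.get(column) or "").strip()]
    Path(out_path).write_text("".join(f"{acc}\n" for acc in accessions))
    return accessions
```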
Retrieves the .fastq files listed in SraAccList.txt, which was generated in the previous step.
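If you want to fetch the same files by hand, the SRA Toolkit's `prefetch` and `fasterq-dump` can be driven from the accession list. This is a minimal sketch assuming the SRA Toolkit is installed and that paired-end mates should be written to separate files; the time2splice script may retrieve the files differently. It only builds the argument lists.

```python
from pathlib import Path


def fastq_download_commands(acc_list="SraAccList.txt", outdir="fastq"):
    """Build SRA Toolkit commands that download and convert every accession.

    ASSUMPTION: prefetch + fasterq-dump are used; this is not necessarily
    what the time2splice retrieval script runs internally.
    """
    accessions = [line.strip() for line in Path(acc_list).read_text().splitlines()
                  if line.strip()]
    # prefetch reads the whole accession list in one call
    commands = [["prefetch", "--option-file", str(acc_list)]]
    for acc in accessions:
        # --split-files writes mate 1 and mate 2 of paired-end runs separately
        commands.append(["fasterq-dump", "--split-files", "--outdir", str(outdir), acc])
    return commands
```

Each returned list can be executed with `subprocess.run(cmd, check=True)`.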
Runs FastQC on all .fastq files in a given directory.
Runs Trim Galore! followed by FastQC to trim low-quality bases (below the quality threshold) from the reads.
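The QC and trimming steps amount to one FastQC call over the directory plus one Trim Galore! call per file. A minimal sketch, assuming single-end handling and the default Phred cutoff of 20 (paired-end data would instead pass both mates together with `--paired`); directory names are hypothetical.

```python
from pathlib import Path

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def qc_commands(fastq_dir, qc_dir="fastqc", trim_dir="trimmed", quality=20):
    """Build one FastQC command and one Trim Galore! command per FASTQ file.

    ASSUMPTION: reads are treated as single-end; output directory names
    are illustrative only.
    """
    fastqs = sorted(p for p in Path(fastq_dir).iterdir()
                    if p.name.endswith(FASTQ_SUFFIXES))
    # FastQC accepts many files at once; its output directory must already exist.
    fastqc = ["fastqc", "-o", str(qc_dir)] + [str(p) for p in fastqs]
    # --quality sets the Phred cutoff for trimming low-quality 3' ends;
    # --fastqc reruns FastQC on the trimmed output.
    trims = [["trim_galore", "--quality", str(quality), "--fastqc",
              "--output_dir", str(trim_dir), str(p)] for p in fastqs]
    return fastqc, trims
```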
Merges the .fastq files from the different lanes of the same flow cell.
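Lane merging boils down to grouping files whose names differ only in the lane token and concatenating them. The sketch below assumes Illumina-style names such as `Sample_S1_L001_R1_001.fastq.gz`; that naming scheme and the `_L00N` pattern are assumptions about your data. Byte-level concatenation works because several gzip members in a row still form a valid gzip stream.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: lanes are encoded as _L001, _L002, ... followed by another "_"
LANE = re.compile(r"_L\d{3}(?=_)")


def group_lanes(fastq_names):
    """Map each merged file name to the sorted list of per-lane files."""
    groups = defaultdict(list)
    for name in fastq_names:
        groups[LANE.sub("", name)].append(name)
    return {merged: sorted(files) for merged, files in groups.items()}


def merge_lanes(fastq_dir, out_dir):
    """Concatenate per-lane .fastq.gz files into one file per sample and read."""
    fastq_dir, out_dir = Path(fastq_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in fastq_dir.glob("*.fastq.gz")]
    for merged, files in group_lanes(names).items():
        with open(out_dir / merged, "wb") as out:
            for name in files:
                with open(fastq_dir / name, "rb") as src:
                    shutil.copyfileobj(src, out)
```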
Runs one or more of three aligners (Bowtie2, BWA, or HISAT2) on the .fastq data in a given directory.
Plots the alignments from one or two different aligners (Bowtie2 and/or HISAT2).
Runs Salmon to quantify transcript expression in case and control samples.
e.g. ./ /nbu/compbio/aconard/larschan_data/sexed_embryo/ /data/compbio/aconard/splicing/results/salmon_results_ncbi_trans/ /data/compbio/aconard/BDGP6/transcriptome_dir/pub/infphilo/hisat2/data/bdgp6_tran/genome.fa 3 10 1 _001.fastq.gz
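For reference, the core Salmon steps are building an index from a transcript FASTA and then running `salmon quant` per sample. The sketch below only builds argument lists; the index directory, the sample-to-reads mapping, and the thread count are assumptions, and it does not reproduce how the time2splice script interprets its positional arguments. `-l A` lets Salmon infer the library type.

```python
from pathlib import Path


def salmon_commands(transcripts_fa, index_dir, samples, out_dir, threads=4):
    """Build a salmon index command plus one salmon quant command per sample.

    ASSUMPTION: `samples` is a hypothetical dict of
    sample name -> (read1 path, read2 path) for paired-end data.
    """
    cmds = [["salmon", "index", "-t", str(transcripts_fa), "-i", str(index_dir)]]
    for name, (read1, read2) in sorted(samples.items()):
        cmds.append(["salmon", "quant", "-i", str(index_dir), "-l", "A",
                     "-1", str(read1), "-2", str(read2),
                     "-p", str(threads), "-o", str(Path(out_dir) / name)])
    return cmds
```

Each output directory then contains a quant.sf table with Name, Length, EffectiveLength, TPM, and NumReads columns.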
Runs SUPPA for differential splicing analysis of case and control samples.
e.g. ./ /data/compbio/aconard/splicing/results/salmon_results/ /data/compbio/aconard/splicing/results/suppa_results_ncbi_trans/ /data/compbio/aconard/BDGP6/transcriptome_dir/pub/infphilo/hisat2/data/bdgp6_tran/genome.fa 20
Converts NM_ gene names to FlyBase names, then merges the outputs from run_suppa (a table of NM_ gene names by TPM values, with one column per replicate).
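The merge step can be pictured as a join of per-replicate TPM tables followed by an ID lookup. This is a minimal sketch assuming Salmon quant.sf inputs and a hypothetical NM_-to-FlyBase dictionary (for example, built from a FlyBase ID conversion file); the exact table layout run_suppa produces may differ.

```python
import csv


def read_tpm(quant_sf):
    """Return {transcript ID: TPM} from a Salmon quant.sf file."""
    with open(quant_sf, newline="") as handle:
        return {row["Name"]: float(row["TPM"])
                for row in csv.DictReader(handle, delimiter="\t")}


def merge_tpms(tpm_by_replicate, id_map):
    """Merge per-replicate TPMs into {ID: [TPM per replicate]}.

    ASSUMPTION: id_map (NM_ ID -> FlyBase ID) is a hypothetical one-to-one
    lookup. IDs without a mapping are kept unchanged, and transcripts missing
    from a replicate get 0.0.
    """
    replicates = sorted(tpm_by_replicate)
    nm_ids = sorted(set().union(*tpm_by_replicate.values()))
    table = {}
    for nm_id in nm_ids:
        fb_id = id_map.get(nm_id, nm_id)
        if fb_id in table:
            raise ValueError(f"{fb_id} is mapped from more than one NM_ ID")
        table[fb_id] = [tpm_by_replicate[rep].get(nm_id, 0.0) for rep in replicates]
    return replicates, table
```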
Identifies various forms of differential splicing (e.g., using percent spliced in (PSI) and differential transcript usage (DTU)). NOTE: ensure that your column names distinguish treatment from control using the format TREATMENT.REP and CONTROL.REP. For example, with 2 replicates at the 0-2 timepoint for clamp RNAi, the columns could be: 0-2clampNull.1, 0-2clampNull.2, 0-2rescue.1, 0-2rescue.2
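Because the analysis relies on the NAME.REP convention, it can help to check column names before running it. The helper below is hypothetical (not part of time2splice): it splits each name at its last dot and groups the replicate numbers by condition, raising an error for any column that does not follow the format.

```python
import re
from collections import defaultdict

# NAME.REP: everything before the last dot is the condition, the rest a replicate number
COLUMN = re.compile(r"^(?P<group>.+)\.(?P<rep>\d+)$")


def group_columns(columns):
    """Group TREATMENT.REP / CONTROL.REP column names by condition (hypothetical helper)."""
    groups = defaultdict(list)
    for col in columns:
        match = COLUMN.match(col)
        if match is None:
            raise ValueError(f"column {col!r} does not follow the NAME.REP format")
        groups[match["group"]].append(int(match["rep"]))
    return {group: sorted(reps) for group, reps in groups.items()}
```

For the example columns above, this yields two conditions, `0-2clampNull` and `0-2rescue`, each with replicates 1 and 2.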
Calculates and plots (as a pie chart) the proportions of alternative splicing event types in the samples. <!> NOTE: Run this separately for every timepoint and condition. <!>
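The proportions behind the pie chart are the fraction of events of each type. This sketch assumes SUPPA-style event IDs of the form `GENE;TYPE:coordinates` (for example `SE`, `MX`, `RI`); if your event table stores the type in a separate column, read it from there instead.

```python
from collections import Counter


def event_type_proportions(event_ids):
    """Fraction of events per type, parsing SUPPA-style IDs 'GENE;TYPE:coords'.

    ASSUMPTION: the event type sits between the first ';' and the next ':'.
    """
    types = Counter(eid.split(";", 1)[1].split(":", 1)[0] for eid in event_ids)
    total = sum(types.values())
    if total == 0:
        return {}
    return {etype: count / total for etype, count in sorted(types.items())}
```

The resulting values and labels can be passed directly to matplotlib's `pyplot.pie`.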
Finds biased genes using control samples. <!> NOTE: Run this separately for every pairwise group comparison (two groups only; e.g., females vs. males, females at time 1 vs. females at time 2, etc.). <!>
Plots transcript abundance using PSI and DTU measures.
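At the isoform level, PSI is commonly defined (as in SUPPA) as an isoform's TPM divided by the summed TPM of all isoforms of its gene, and ΔPSI as the difference in PSI between two conditions. The sketch below computes both from plain dictionaries; the transcript-to-gene mapping is an assumed input, and DTU testing itself is not shown.

```python
from collections import defaultdict


def isoform_psi(tpm, gene_of):
    """PSI of each isoform = isoform TPM / total TPM of its gene's isoforms.

    ASSUMPTION: gene_of maps every transcript ID in `tpm` to its gene ID.
    Genes with zero total expression get NaN, since PSI is undefined there.
    """
    totals = defaultdict(float)
    for tx, value in tpm.items():
        totals[gene_of[tx]] += value
    return {tx: value / totals[gene_of[tx]] if totals[gene_of[tx]] > 0 else float("nan")
            for tx, value in tpm.items()}


def delta_psi(psi_a, psi_b):
    """ΔPSI = PSI in condition B minus PSI in condition A, for shared isoforms."""
    return {tx: psi_b[tx] - psi_a[tx] for tx in psi_a.keys() & psi_b.keys()}
```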
Plots alternatively spliced genes within two categories (e.g., all females, all males, female sex-specific, male sex-specific, female all rest, male all rest, female non-sex-specific, male non-sex-specific, female new sex-specific, male new sex-specific). Run the analysis for each timepoint separately.
Runs Picard's MarkDuplicates on all .sorted.bam files in a given directory.
Runs MACS2 to call peaks on all .sorted.bam files in a given directory.
Generates a signal track using MACS2 to profile transcription factor modification enrichment levels genome-wide.
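The three protein-DNA steps can be chained per BAM file as duplicate marking, peak calling, and a fold-enrichment track. A minimal sketch that only builds argument lists, assuming paired-end CUT&RUN/ChIP-seq data (`-f BAMPE`), the Drosophila genome-size shortcut `dm`, and the newer Picard `-I/-O/-M` argument style (older Picard releases use `I=`/`O=`/`M=`); file naming is hypothetical.

```python
from pathlib import Path


def peak_commands(bam_dir, out_dir, genome_size="dm", control_bam=None):
    """Build MarkDuplicates, MACS2 callpeak, and MACS2 bdgcmp commands per sorted BAM.

    ASSUMPTIONS: paired-end reads, Picard's -I/-O/-M syntax, and
    hypothetical output names derived from each BAM's prefix.
    """
    out_dir = Path(out_dir)
    cmds = []
    for bam in sorted(Path(bam_dir).glob("*.sorted.bam")):
        name = bam.name[: -len(".sorted.bam")]
        marked = out_dir / f"{name}.markdup.bam"
        # MarkDuplicates flags duplicates (it does not remove them by default)
        cmds.append(["picard", "MarkDuplicates", "-I", str(bam), "-O", str(marked),
                     "-M", str(out_dir / f"{name}.dup_metrics.txt")])
        # -B writes treat_pileup/control_lambda bedGraphs; --SPMR scales per million reads
        callpeak = ["macs2", "callpeak", "-t", str(marked), "-f", "BAMPE",
                    "-g", genome_size, "-n", name, "--outdir", str(out_dir),
                    "-B", "--SPMR"]
        if control_bam is not None:
            callpeak += ["-c", str(control_bam)]
        cmds.append(callpeak)
        # Fold enrichment of treatment over the local lambda as the signal track
        cmds.append(["macs2", "bdgcmp",
                     "-t", str(out_dir / f"{name}_treat_pileup.bdg"),
                     "-c", str(out_dir / f"{name}_control_lambda.bdg"),
                     "-m", "FE", "-o", str(out_dir / f"{name}_FE.bdg")])
    return cmds
```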
Note: these scripts can be run in any order, since each analysis or results exploration is independent. More analysis scripts are to come.
Runs Intervene to view the intersections between narrowPeak files.
Performs gene ontology (GO) and gene set enrichment analysis on a given list of genes.
Gets the coordinates from a BED file and runs them through MEME.
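Motif discovery usually works best on short windows centred on peak summits rather than on whole peaks. A minimal sketch, assuming narrowPeak input (column 10 holds the summit offset from the peak start, or -1 when none was called) and a hypothetical ±50 bp window; it is not necessarily how the time2splice script selects coordinates.

```python
def summit_windows(narrowpeak_lines, flank=50):
    """Return (chrom, start, end, name) windows of ±flank bp around peak summits.

    ASSUMPTION: input is narrowPeak; when the summit offset is -1,
    the peak midpoint is used instead.
    """
    windows = []
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        offset = int(fields[9])
        summit = start + offset if offset >= 0 else (start + end) // 2
        windows.append((chrom, max(0, summit - flank), summit + flank, name))
    return windows


def to_bed(windows):
    """Format windows as BED text (0-based, half-open coordinates)."""
    return "".join(f"{c}\t{s}\t{e}\t{n}\n" for c, s, e, n in windows)
```

After writing the BED text to a file, sequences can be extracted with `bedtools getfasta -fi genome.fa -bed summits.bed -fo summits.fa` and passed to `meme summits.fa -dna -oc meme_out`.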
Plots peak intensity for a given narrowPeak file.
Performs a chi-squared test on alternative splicing categories; mutually exclusive exons (MXE) are used in this example.
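For a single event type such as MXE, the test reduces to a 2×2 contingency table, for instance MXE vs. other events (columns) in two gene categories (rows). The grouping into rows and columns here is a hypothetical illustration. The sketch computes Pearson's statistic without continuity correction; with one degree of freedom the p-value equals erfc(√(χ²/2)). Note that `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so its results will differ slightly.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    Returns (statistic, p_value); with 1 degree of freedom,
    p = erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs a non-zero total")
    observed = ((a, b), (c, d))
    # Sum of (observed - expected)^2 / expected, with expected = row * col / n
    stat = sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    return stat, math.erfc(math.sqrt(stat / 2))
```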