Accompanying code for "Sites of transcription initiation drive mRNA isoform selection" by Alfonso-Gonzalez et al.
Given a set of dominant and non.dominant GFF files specifying genes in input/gff/, this pipeline extracts the MAFs and corresponding FASTA files for each gene and uses prody to compute the covariance at each nucleotide position across the multiple alignment.
A set of figures is output, with optional boxes around the promoter regions specified in config/all_box_genes.csv, together with statistics that summarize the covariance and the likelihood that it differs from the background.
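To illustrate what a per-position covariation score over a multiple alignment can look like, here is a minimal standard-library sketch. It is not the pipeline's code and does not use prody. This README does not say which covariation measure is computed, so mutual information between alignment columns is an assumption made for illustration. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def per_position_covariation(sequences):
    """Mean mutual information of each column with every other column.

    Assumption: all sequences are aligned to the same length; gap characters
    are treated as ordinary symbols.
    """
    columns = list(zip(*sequences))
    n_cols = len(columns)
    scores = []
    for i in range(n_cols):
        others = [
            column_mutual_information(columns[i], columns[j])
            for j in range(n_cols)
            if j != i
        ]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


# Toy alignment (hypothetical): columns 0 and 1 vary together,
# column 2 is fully conserved, column 3 varies independently.
alignment = ["ACGT", "ACGA", "TGGT", "TGGA"]
scores = per_position_covariation(alignment)
```

In the toy alignment, the first two columns receive a non-zero score because they change together, while the conserved and independently varying columns score zero.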
Download MAFs from UCSC to input/mafs.
Install snakemake, e.g. from bioconda using conda install -c bioconda snakemake.
Edit run.sh to suit your computational resources, then execute run.sh.