Bulk RNA-Seq alignment and quantification pipeline using STAR for alignment and featureCounts (Rsubread) for quantification
Sample tables containing SRA accessions and read layout (paired- or single-end), with replicates separated by ";", were curated. The pipeline takes such a table as input and runs the following steps:
- Downloads the accessions and writes the FASTQ files.
- Aligns the FASTQ files with STAR against an index of the GRCh38 Homo sapiens genome (Release 107, primary assembly).
- Reports the read counts computed with featureCounts from the Rsubread package.
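The alignment step needs a STAR genome index. The exact index parameters used by the pipeline are not stated in this README, so the following is only a sketch: the helper name `build_star_index` and the `DRY_RUN` switch are hypothetical, the default file names assume the Ensembl release 107 GRCh38 primary-assembly FASTA and GTF have already been downloaded and decompressed, and `--sjdbOverhang 100` is STAR's default rather than a value taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): build a STAR genome index.
# Assumed inputs: Ensembl release 107 GRCh38 primary-assembly FASTA and GTF,
# already decompressed. File names and defaults are assumptions.
build_star_index() {
    local genome_dir="${1:-GRCh38_107_index}"
    local fasta="${2:-Homo_sapiens.GRCh38.dna.primary_assembly.fa}"
    local gtf="${3:-Homo_sapiens.GRCh38.107.gtf}"
    local threads="${4:-8}"
    local cmd

    # --sjdbOverhang 100 is STAR's default; read length - 1 is the usual choice.
    cmd=(STAR --runMode genomeGenerate
         --runThreadN "$threads"
         --genomeDir "$genome_dir"
         --genomeFastaFiles "$fasta"
         --sjdbGTFfile "$gtf"
         --sjdbOverhang 100)

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"                      # print the command only
    else
        mkdir -p "$genome_dir" && "${cmd[@]}" # STAR expects the directory to exist
    fi
}
```

For example, `DRY_RUN=1 build_star_index` prints the command without running it.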
The sample table has two format requirements:
- Column 26 of the sample table (counting from 1) should contain the SRA run accessions (SRRXXXXXX), with replicates separated by ";".
- One column of the sample table should contain the read layout, as a factor with levels SINGLE or PAIRED.
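As an illustration of the column-26 convention, the sketch below prints one run accession per line. It assumes a plain comma-separated file with a single header row and no quoted fields containing commas; the function name is hypothetical, and whether the repository's own scripts expect a runlist in exactly this one-per-line form is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print one SRA run
# accession per line from column 26 of a sample table.
# Assumptions: comma-separated file, one header row, and no quoted
# fields that contain commas (a plain awk field split cannot handle those).
runs_from_table() {
    awk -F',' 'NR > 1 {
        n = split($26, runs, ";")          # replicates are ";"-separated
        for (i = 1; i <= n; i++) {
            gsub(/[ \t\r]/, "", runs[i])   # strip spaces and Windows CRs
            if (runs[i] != "") print runs[i]
        }
    }' "$1"
}
```

Usage: `runs_from_table sampleTable.csv > runlist.txt`.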
- Install mamba.
- Clone the repository: `git clone https://github.com/zgr2788/AlignPip-crossBBR.git`
- Place the `sampleTable.csv` you want to use in the main directory.
- Run `make` and follow the steps.
- Adjust settings through `config.yaml`.
- (Optional) Run `dag.sh` to get a directed acyclic graph (DAG) of the jobs.
- Set up all cluster variables in `pip.sh`, and delete all `module load` statements from `Modules/SRActions/Snakefile` and `Modules/Align/Snakefile`. This step is necessary because the pipeline was originally written to run on the TOSUN cluster at Sabancı University.
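The two Snakefiles named above can be checked for leftover `module load` statements with a plain grep. This is a minimal sketch (the helper name is hypothetical); it only lists matches with their line numbers, because the statements may share a `shell:` string with other commands and are safer to remove by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every "module load" occurrence with its line
# number. With no arguments it checks the two Snakefiles, relative to the
# repository root. grep exits with status 1 when nothing is found.
find_module_loads() {
    if [ "$#" -eq 0 ]; then
        set -- Modules/SRActions/Snakefile Modules/Align/Snakefile
    fi
    grep -n 'module load' "$@"
}
```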
FASTQ files should be downloaded in .gz format, using one of the two options below.
Option 1
The `downloadTable{Layout}.sh` scripts are highly recommended if Aspera Connect is installed. With Aspera installed, run:
bash Modules/SRActions/downloadTable{Layout}.sh {path/to/runlist} {path/to/sshkey}
If Aspera downloads fail, `failed{Layout}.txt` files are generated for convenience.
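Because there is one download script per layout, each one needs a runlist containing only runs of that layout. Below is a minimal sketch for producing those lists, assuming a comma-separated table with one header row and no quoted commas. The function name, the `runlist_<layout>.txt` file names, and the one-accession-per-line runlist format are hypothetical, and the layout column index must be supplied because its position is not fixed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the run accessions in column 26 into one
# runlist per layout value (runlist_SINGLE.txt, runlist_PAIRED.txt, ...).
# Arguments: table, layout column index (1-based), output directory.
split_runs_by_layout() {
    local table="$1" layout_col="$2" outdir="${3:-.}"
    mkdir -p "$outdir"
    awk -F',' -v col="$layout_col" -v out="$outdir" 'NR > 1 {
        layout = $col
        gsub(/[ \t\r]/, "", layout)
        n = split($26, runs, ";")          # replicates are ";"-separated
        for (i = 1; i <= n; i++) {
            gsub(/[ \t\r]/, "", runs[i])
            # awk truncates each file on first write, then keeps appending
            if (runs[i] != "") print runs[i] > (out "/runlist_" layout ".txt")
        }
    }' "$table"
}
```

For example, `split_runs_by_layout sampleTable.csv N runlists`, with `N` replaced by the index of your layout column.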
Option 2
The `Modules/SRActions/fastqWrite.sh` script needs to be configured to use a conda environment with parallel-fastq-dump and a local install of sra-toolkit, because the current conda package of parallel-fastq-dump does not install an up-to-date sra-toolkit.
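The contents of `fastqWrite.sh` are not reproduced here, so the following is a hedged sketch of the kind of call it wraps, not the repository's code. `--sra-id`, `--threads` and `--outdir` are parallel-fastq-dump options, while `--split-files` and `--gzip` are forwarded to fastq-dump. The helper name and `DRY_RUN` switch are hypothetical. The sketch also assumes the local sra-toolkit `bin` directory comes before the conda environment on `PATH`, so that the newer `fastq-dump` is the one that runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the repository's fastqWrite.sh): dump one SRA
# run to gzipped FASTQ. Paired-end runs get --split-files so that mates
# land in separate _1/_2 files; single-end runs produce one file.
dump_run() {
    local acc="$1" layout="$2" outdir="${3:-fastq}" threads="${4:-4}"
    local cmd
    cmd=(parallel-fastq-dump --sra-id "$acc" --threads "$threads"
         --outdir "$outdir" --gzip)
    if [ "$layout" = PAIRED ]; then
        cmd+=(--split-files)
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"                   # print the command only
    else
        mkdir -p "$outdir" && "${cmd[@]}"
    fi
}
```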
A total of 323 experiments were downloaded with this pipeline:
- 156 PAIRED
- 15 SC (single FASTQ file)
- 152 SINGLE
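To compare a sample table against these totals, a short tally per layout value is enough. This sketch assumes one data row per experiment, a comma-separated file with one header row and no quoted commas, and a layout column index supplied by the user; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data rows per layout value.
# Arguments: table, layout column index (1-based).
count_layouts() {
    awk -F',' -v col="$2" 'NR > 1 {
        layout = $col
        gsub(/[ \t\r]/, "", layout)
        count[layout]++
    }
    END { for (l in count) print l, count[l] }' "$1" | LC_ALL=C sort
}
```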
To reproduce the alignments, use STAR version 2.7.0 and Rsubread version 2.8.2 with the sampleTable provided.
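A quick check of the installed versions before a long run can avoid a mismatched reproduction. The sketch below is hypothetical: it assumes `STAR` and `Rscript` are on `PATH`, strips an optional `STAR_` prefix in case the build prints one, and matches STAR on the `2.7.0` prefix because STAR releases carry a letter suffix (for example 2.7.0a).

```shell
#!/usr/bin/env bash
# Hypothetical check: warn if installed STAR / Rsubread versions differ
# from the ones used for the published alignments.
version_has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

check_versions() {
    local star_v rsub_v
    star_v=$(STAR --version 2>/dev/null || echo none)
    star_v=${star_v#STAR_}                # drop an optional "STAR_" prefix
    rsub_v=$(Rscript -e 'cat(as.character(packageVersion("Rsubread")))' 2>/dev/null || echo none)
    version_has_prefix "$star_v" 2.7.0 || echo "STAR is $star_v, expected 2.7.0" >&2
    [ "$rsub_v" = 2.8.2 ] || echo "Rsubread is $rsub_v, expected 2.8.2" >&2
}
```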