Integrated High-throughput Sequencing Data Analysis for Plant
Author: Dr. Chenjiang You
Current Version: 0.8
Latest update: 02/09/2022
pRNASeqTools is a Perl- and R-based pipeline designed for automated general analysis of Illumina sequencing data from supported plants.
It currently processes small RNA-seq, mRNA-seq, degradome-seq, CLIP-seq, ChIP-seq, and WGBS-seq data and performs general analyses on them. See below for more information on specific tasks.
The phasiRNA identification module was contributed by Dr. Xuan Ma.
If you have any questions or comments, please submit an issue on GitHub or email Chenjiang You directly.
To run this pipeline successfully on your own computer or server, several software dependencies are required. See INSTALL.md for detailed information.
For genome reference files, please contact Chenjiang You for pre-built genomes or the instructions for new genomes.
The only input files needed are Illumina output files, either in the FASTQ format or in the corresponding compressed formats .gz and .bz2. SRR accessions are also accepted.
Note: the FASTA format is not supported. You may convert FASTA to FASTQ by adding artificial sequence names and qualities.
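The conversion mentioned in the note can be sketched in a few lines of Python. This is a minimal illustration, not part of pRNASeqTools: the function name `fasta_to_fastq`, the `read_N` naming scheme, and the constant quality character `I` (Phred 40 in the Phred+33 encoding) are all assumptions chosen for the example.

```python
def fasta_to_fastq(fasta_lines, prefix="read", quality_char="I"):
    """Convert FASTA lines into a list of FASTQ records.

    Each record gets an artificial name (prefix_1, prefix_2, ...) and a
    constant placeholder quality string of the same length as the sequence.
    The naming scheme and quality value are illustrative assumptions.
    """
    records = []
    seq_parts = []

    def emit():
        # Join a possibly multi-line sequence and write one 4-line FASTQ record.
        seq = "".join(seq_parts)
        if seq:
            n = len(records) + 1
            records.append(f"@{prefix}_{n}\n{seq}\n+\n{quality_char * len(seq)}\n")

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record; the original name is dropped.
            emit()
            seq_parts.clear()
        else:
            seq_parts.append(line)
    emit()  # flush the last record
    return records
```

To convert a file, pass an open FASTA file handle as `fasta_lines` and write the returned records to a `.fastq` file with `writelines`. Because the qualities are artificial, any quality-based filtering downstream will treat every base as high quality.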
See the help information for pRNASeqTools by simply executing pRNASeqTools.
General analysis for small RNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools srna --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2
Only mapping the small RNA reads to the genome and creating read count files
pRNASeqTools srna --adaptor AGATCGGAAGAGC --mapping-only --control control=control_1.fastq+control_2.fastq+SRRXXXXXXX
Perform statistical analyses in the folder containing pre-processed data
pRNASeqTools srna --nomapping --control control=3 --treatment treatment=3
General analysis for mRNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools mrna --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2
General analysis for paired-end mRNA-seq from samples control and treatment with 3 biological replicates
pRNASeqTools mrna --control control=control_1_R1.fastq.gz,control_1_R2.fastq.gz+control_2_R1.fastq.gz,control_2_R2.fastq.gz+control_3_R1.fastq.gz,control_3_R2.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2
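The sample arguments in the examples above follow a visible pattern: a sample name, `=`, replicates joined by `+`, and the two mates of a paired-end replicate joined by `,`. Assuming that reading of the examples is right, a small Python helper can assemble these long strings and reduce typing errors. The helper names `sample_spec` and `build_command` are hypothetical; only the `--adaptor`, `--control`, and `--treatment` options shown in this README are used.

```python
def sample_spec(name, replicates):
    """Build a 'name=rep1+rep2+...' string.

    Each replicate is either a single file name or SRR accession (str),
    or a (R1, R2) pair for paired-end data, which is joined with a comma.
    The separators are inferred from the README examples.
    """
    parts = []
    for rep in replicates:
        if isinstance(rep, (tuple, list)):
            parts.append(",".join(rep))
        else:
            parts.append(rep)
    return f"{name}=" + "+".join(parts)


def build_command(mode, control, treatment=None, adaptor=None):
    """Return the command as an argument list, in the order used by the examples."""
    cmd = ["pRNASeqTools", mode]
    if adaptor:
        cmd += ["--adaptor", adaptor]
    cmd += ["--control", control]
    if treatment:
        cmd += ["--treatment", treatment]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than one shell string avoids quoting problems with the `+` and `,` characters.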
Truncation and tailing analysis of plant miRNAs
pRNASeqTools tt --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2
Degradome data analysis
pRNASeqTools deg --adaptor AGATCGGAAGAGC --control control=control_1.fastq.gz+control_2.fastq.gz+control_3.fastq.gz --treatment treatment=treatment_1.fastq.bz2+treatment_2.fastq.bz2+treatment_3.fastq.bz2
Two-factor DE analysis
pRNASeqTools tf --control control=time1,3,time2,3 --treatment treatment=time1,3,time2,3
All output files are stored in the output directory.
Mapping statistics are stored in the log file log_xxxxxxxxx.txt.
Several groups of files are generated in the output directory:
- count files and nf files can be used for later --nomapping runs, which will not invoke the mapping procedures. The second to tenth columns of count files are the numbers of assigned small RNAs with lengths of 18-26 nt.
- pdf files showing the reproducibility of biological replicates and the relationship of samples.
- csv files containing the results of statistical analyses; the hyper and hypo files list the significant results selected according to the input parameters.
- bedgraph files for visualization in IGV. Note: Keywords are embedded in the file names, indicating the targets and methods.
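As an illustration of the count file layout described above, the following Python sketch totals the reads per small RNA length. It relies on assumptions not stated in this README: that the files are tab-delimited, that the first column is a feature identifier, and that columns two to ten correspond to lengths 18 through 26 nt in ascending order. The function name `length_totals` is hypothetical.

```python
import csv

# Assumed mapping: column 2 -> 18 nt, column 3 -> 19 nt, ..., column 10 -> 26 nt.
LENGTHS = range(18, 27)


def length_totals(lines, delimiter="\t"):
    """Sum columns 2-10 of a count file over all rows, keyed by read length.

    `lines` is any iterable of text lines, such as an open file handle.
    Rows with fewer than ten columns or non-numeric values (for example a
    header line) are skipped.
    """
    totals = {n: 0.0 for n in LENGTHS}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 10:
            continue
        try:
            counts = [float(x) for x in row[1:10]]
        except ValueError:
            continue
        for n, c in zip(LENGTHS, counts):
            totals[n] += c
    return totals
```

Check the delimiter and column order against an actual count file before relying on the totals.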
- miRNA reads are categorized in the out files. The second column shows the number of tailed nucleotides and the third column shows the number of truncated nucleotides. pdf files are bubble plots for each miRNA.
bam files contain the mapped reads. txt files report the identified peaks on each transcript.
- Mapped reads in bam files and read counts for each gene in txt files are reported.
- Up-regulated and down-regulated DEG results are reported in total.hyper.csv and total.hypo.csv files.
- This mode can only run in srna and mrna output folders.
- Up-regulated and down-regulated DEG results are reported in total.hyper.csv and total.hypo.csv files.