############################################################################################################
### BAQCOM - Bioinformatics Analysis for Quality Control and Mapping                                  #####
### Quality Control (Trimmomatic), Mapping (STAR | HISAT2) and Counting Reads (HTSeq | featureCounts) #####
############################################################################################################
BAQCOM is a user-friendly tool that implements two automated pipelines for RNA-Seq analysis, using Trimmomatic for quality control and STAR for mapping the transcriptomes.
################
### INSTALLATION
################
#STEP.1 - Install R and required libraries
# Access https://cran.r-project.org
# Install 'optparse' and 'parallel' packages
##################
### CONFIGURATION
##################
#STEP.2 - Clone this repository to a location of your choice:
$ git clone https://github.com/hanielcedraz/BAQCOM.git
#STEP.3 - Run install.sh. This script sets the Trimmomatic path in baqcomTrimmomatic and adds the BAQCOM directory path to ~/.bashrc or ~/.bash_profile, so you can call the scripts from any directory.
$ chmod +x ./install.sh
$ ./install.sh
If you would like to use the MultiQC analysis, please install MultiQC.
Installation:
#If pip is not installed, install it as follows:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
#You can then install MultiQC from PyPI using pip:
pip install multiqc
For more information, see https://github.com/ewels/MultiQC
To take full advantage of parallel processing, install pigz.
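#Optional check (a minimal sketch, not part of BAQCOM): the loop below reports which of the tools mentioned in the installation steps are reachable on your PATH. The helper name check_tool is hypothetical.

```shell
#!/usr/bin/env bash
# Print "found: <tool>" or "missing: <tool>" depending on whether
# the tool can be located on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Tools named in the installation steps.
for tool in Rscript multiqc pigz; do
  check_tool "$tool"
done
```

#A tool reported as missing only matters if you plan to use the step that needs it (MultiQC and pigz are optional).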
#############
### RUNNING
#############
#STEP.4 - Create a directory named 00-Fastq and move the .fastq.gz files into this directory:
$ mkdir 00-Fastq
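#The move itself can be sketched as follows. This is an illustration only: the function name stage_fastq is hypothetical, and raw_fastq is a placeholder for wherever your files currently live.

```shell
#!/usr/bin/env bash
# Move every .fastq.gz file from a source directory into a destination
# directory (00-Fastq by default), creating the destination if needed.
stage_fastq() {
  local src=$1 dest=${2:-00-Fastq}
  mkdir -p "$dest"
  for f in "$src"/*.fastq.gz; do
    [ -e "$f" ] || continue   # no matching files: nothing to move
    mv "$f" "$dest"/
  done
}

# Hypothetical call (raw_fastq is a placeholder directory):
# stage_fastq raw_fastq
```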
#STEP.5 - Create samples.txt:
$ ./createSamples.sh
#This script works as expected when the file names in the 00-Fastq directory follow this structure:
# File R1: SAMPLENAME_any_necessary_information_R1_001.fastq.gz
# File R2: SAMPLENAME_any_necessary_information_R2_001.fastq.gz
#or, for single-end files:
# File SE: SAMPLENAME_any_necessary_information_SE_001.fastq.gz
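#Before running createSamples.sh, you can check your file names against the structure above. The sketch below is not part of BAQCOM; the function name classify_fastq is hypothetical, and it assumes the sample name is the text before the first underscore.

```shell
#!/usr/bin/env bash
# Classify a FASTQ file name by the suffixes listed above and print
# "<sample> <read type>". Assumption: the sample name is everything
# before the first "_" in the file name.
classify_fastq() {
  local name
  name=$(basename "$1")
  case "$name" in
    *_R1_001.fastq.gz) echo "${name%%_*} R1" ;;
    *_R2_001.fastq.gz) echo "${name%%_*} R2" ;;
    *_SE_001.fastq.gz) echo "${name%%_*} SE" ;;
    *) echo "UNEXPECTED $name" ;;
  esac
}

for f in 00-Fastq/*.fastq.gz; do
  [ -e "$f" ] || continue   # directory empty or missing
  classify_fastq "$f"
done
```

#Any line starting with UNEXPECTED points to a file that should be renamed before samples.txt is created.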
#If a sample is split across more than one R1 or R2 file, combine the matching R1 files into one file and the matching R2 files into another.
#You may use commands like these (gzip replaces each .fastq file with the corresponding .fastq.gz):
$ gunzip -c raw_fastq/*R1_001.fastq.gz > 00-Fastq/SAMPLEID_any_information_R1_001.fastq; gzip 00-Fastq/SAMPLEID_any_information_R1_001.fastq
$ gunzip -c raw_fastq/*R2_001.fastq.gz > 00-Fastq/SAMPLEID_any_information_R2_001.fastq; gzip 00-Fastq/SAMPLEID_any_information_R2_001.fastq
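#An alternative sketch that skips the decompress/recompress step: gzip files can be joined directly with cat, and the result is still a valid .gz file. This is not the method used in the commands above; the function name combine_parts is hypothetical.

```shell
#!/usr/bin/env bash
# Concatenate several gzip files into one output file.
# First argument: output file. Remaining arguments: the parts, in order.
combine_parts() {
  local out=$1
  shift
  cat "$@" > "$out"
}

# Hypothetical calls for a sample whose parts are in raw_fastq/:
# combine_parts 00-Fastq/SAMPLEID_any_information_R1_001.fastq.gz raw_fastq/*R1_001.fastq.gz
# combine_parts 00-Fastq/SAMPLEID_any_information_R2_001.fastq.gz raw_fastq/*R2_001.fastq.gz
```

#Whichever method you use, the R1 parts and the R2 parts must be combined in the same order so that read pairs stay matched.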
#STEP.6 - Run the quality control with Trimmomatic (baqcomTrimmomatic pipeline):
# -p is the number of processors
# -a is the adapter file name. Default=TruSeq3-PE-2.fa
#Other options for adapters (-a):
#NexteraPE-PE.fa
#TruSeq2-PE.fa
#TruSeq2-SE.fa
#TruSeq3-PE.fa
#TruSeq3-SE.fa
$ baqcomTrimmomatic.R -p 36 -s 2
#STEP.7.1 - Mapping with STAR (baqcomSTAR pipeline):
#Download the latest release of the genome (.fa) and annotation (.gtf) for the species you are working with:
#https://www.ensembl.org/info/data/ftp/index.html
#Generate the genome index files. This step needs to be performed only once for each genome/annotation version. After the index is generated, mapping and read counting start automatically.
#To index and map:
$ baqcomSTAR.R -t /path/to/genome.fa -g /path/to/annotation_version/annotation_version.gtf -p 20 -q 2
#To map only, using an existing index:
$ baqcomSTAR.R -t /path/to/index_STAR_folder -p 20 -q 3
#Note: to run the script with more than 20 threads, you must raise the ulimit on your system (see the "increasing_Limit_CentOS_7" file ==> https://naveensnayak.com/2015/09/17/increasing-file-descriptors-and-open-files-limit-centos-7/).
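#To inspect the limit before a large run, something like the sketch below may help. It assumes the relevant limit is the number of open files (ulimit -n), as the title of the linked article suggests. The value 4096 is purely illustrative and does not come from BAQCOM.

```shell
#!/usr/bin/env bash
# Show the current soft and hard limits on open files.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Try to raise the soft limit for this shell session only.
# 4096 is an illustrative value, not one taken from BAQCOM.
wanted=4096
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$wanted" ]; then
  if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$wanted" ]; then
    ulimit -Sn "$wanted"
  else
    ulimit -Sn "$hard"   # the soft limit cannot exceed the hard limit
  fi
fi
echo "soft limit now: $(ulimit -Sn)"
```

#This change lasts only for the current shell session. A permanent change, or one above the hard limit, needs the system-level configuration described in the linked article.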
If you prefer, you can use HISAT2 for the mapping step and HTSeq-count or featureCounts to count reads.
#STEP.7.2 - Mapping with HISAT2 (baqcomHisat2Mapping pipeline):
$ baqcomHisat2.R -t /path/to/genome.fa -g /path/to/annotation_version/annotation_version.gtf -p 20 -q 2
#STEP.7.2.1 - Counting reads with HTSeq-count or featureCounts (baqcomHtseqCounting or baqcomFeaturesCount pipeline):
$ baqcomHtseq.R -g /path/to/annotation_version/annotation_version.gtf
$ baqcomFeatureCounts.R -a /path/to/annotation_version/annotation_version.gtf