SWIPE
Smith-Waterman database searches with Inter-sequence Parallel Execution
Copyright (C) 2008-2021 Torbjorn Rognes, University of Oslo,
Oslo University Hospital and Sencel Bioinformatics AS
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: Torbjorn Rognes <[email protected]>,
Department of Informatics, University of Oslo,
PO Box 1080 Blindern, NO-0316 Oslo, Norway
--
SWIPE version 2 adds a number of features to the initial release, most
notably hit statistics, alignments, and support for nucleotide sequences,
large databases and MPI.
The software is distributed from http://github.com/torognes/swipe
Older versions are also available from http://dna.uio.no/swipe/
Binary executables are available for Linux and Mac (both 64-bit).
SWIPE is distributed under the GNU Affero General Public License, version 3.
See the file COPYING for details. SWIPE includes some public domain code for
computing alignment statistics extracted from the BLAST software by the
National Center for Biotechnology Information (NCBI).
Please cite: Rognes T (2011) Faster Smith-Waterman database searches
with inter-sequence SIMD parallelisation. BMC Bioinformatics 12, 221.
Usage: ./[mpi]swipe [OPTIONS]
  -h, --help                  show help
  -d, --db=FILE               sequence database base name (required)
  -i, --query=FILE            query sequence filename (stdin)
  -M, --matrix=NAME/FILE      score matrix name or filename (BLOSUM62)
  -q, --penalty=NUM           penalty for nucleotide mismatch (-3)
  -r, --reward=NUM            reward for nucleotide match (1)
  -G, --gapopen=NUM           gap open penalty (11)
  -E, --gapextend=NUM         gap extension penalty (1)
  -v, --num_descriptions=NUM  sequence descriptions to show (250)
  -b, --num_alignments=NUM    sequence alignments to show (100)
  -e, --evalue=REAL           maximum expect value of sequences to show (10.0)
  -k, --minevalue=REAL        minimum expect value of sequences to show (0.0)
  -c, --min_score=NUM         minimum score of sequences to show (1)
  -u, --max_score=NUM         maximum score of sequences to show (inf.)
  -a, --num_threads=NUM       number of threads to use [1-256] (1)
  -m, --outfmt=NUM            output format [0,7-9=plain,xml,tsv,tsv+] (0)
  -I, --show_gis              show gi numbers in results (no)
  -p, --symtype=NAME/NUM      symbol type/translation [0-4] (1)
  -S, --strand=NAME/NUM       query strands to search [1-3] (3)
  -Q, --query_gencode=NUM     query genetic code [1-23] (1)
  -D, --db_gencode=NUM        database genetic code [1-23] (1)
  -x, --taxidlist=FILE        taxid list filename (none)
  -N, --dump=NUM              dump database [0-2=no,yes,split headers] (0)
  -H, --show_taxid            show taxid etc in results (no)
  -o, --out=FILE              output file (stdout)
  -z, --dbsize=NUM            set effective database size (0)
Defaults are indicated in parentheses in the option list above. NUM means an
integer, while REAL means a floating point number (possibly in scientific
notation). NAME is a string or integer, while FILE is a filename. For databases,
FILE is the base name of the database files.
Most features and options are similar to, and compatible with, those of
NCBI BLAST.
All scores (e.g. for options -c and -u) are raw alignment scores
(not bit scores) unless otherwise noted.
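Because "-c" and "-u" take raw scores, a threshold expressed as a bit score
has to be converted first. The sketch below uses the standard Karlin-Altschul
relation, bits = (lambda * raw - ln K) / ln 2, solved for the raw score. The
function name bits_to_raw is hypothetical and is not part of SWIPE. The values
lambda = 0.267 and K = 0.041 are assumed here as the commonly quoted gapped
parameters for BLOSUM62 with gap open 11 and gap extension 1; please check
them against scoring.pdf for the scoring system actually in use.

```shell
# Convert a bit score to the smallest integer raw score that reaches it:
#   raw = (bits * ln 2 + ln K) / lambda, rounded up.
# Arguments: bit score, lambda, K (lambda and K are assumptions, see above).
bits_to_raw() {
    awk -v bits="$1" -v lambda="$2" -v k="$3" 'BEGIN {
        raw = (bits * log(2) + log(k)) / lambda
        r = int(raw)
        if (r < raw) r++      # round up to the next integer
        print r
    }'
}

# Raw score corresponding to 50 bits under the assumed parameters.
bits_to_raw 50 0.267 0.041
```

The printed value could then be passed to "-c" as the minimum raw score.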
Please note that SWIPE does not employ composition based statistics (BLAST
option "-C"), but uses the standard Karlin-Altschul statistics. Also, sequence
filtering of low complexity regions (BLAST option "-F") is not implemented.
Please pre-filter your query. SWIPE does not use the new sequence length
adjustment statistics (FSC) introduced in BLAST 2.2.26.
The BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM30, PAM70 and PAM250
matrices are "built-in". Other matrices may be specified with a file name.
The query and database sequence type is specified with the "-p" option using
either a numeric or text argument according to the table below:
Numeric  String   Query  Database  Comparisons
0        blastn   NT     NT        Direct + reverse complementary
1        blastp   AA     AA        Direct
2        blastx   NT     AA        Translated query (6 frames)
3        tblastn  AA     NT        Translated database (6 frames)
4        tblastx  NT     NT        Translated query and database (6x6 frames)
A plain text file with a list of taxid numbers, one per line, may be
specified with the "-x" option to search only sequences from the
specified organisms.
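As a small sketch, a list covering more than one organism can be written
with printf. The file name human_mouse.tax is hypothetical; 9606 and 10090
are the NCBI taxonomy identifiers for human and mouse.

```shell
# Write one NCBI taxid per line: 9606 = human, 10090 = mouse.
# The file name is hypothetical; any name can be used.
printf '%s\n' 9606 10090 > human_mouse.tax

# Show the resulting list.
cat human_mouse.tax
```

The file would then be passed to SWIPE as "-x human_mouse.tax".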
SWIPE accepts only databases prepared using the formatdb or makeblastdb
tools that are distributed together with NCBI BLAST and BLAST+. They
can be downloaded at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
from NCBI. For makeblastdb, please use the "-blastdb_version 4" option
to make databases that are compatible with SWIPE.
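A minimal sketch of building a compatible protein database with makeblastdb
follows. The input file proteins.fasta and the base name mydb are
hypothetical; "-in", "-dbtype" and "-out" are standard makeblastdb options.
Use "-dbtype nucl" for nucleotide sequences.

```shell
# Hypothetical input FASTA file and database base name; replace with your own.
fasta=proteins.fasta
dbname=mydb

# Build a version 4 protein database, but only if the tool and input exist.
if command -v makeblastdb >/dev/null 2>&1 && [ -f "$fasta" ]; then
    makeblastdb -in "$fasta" -dbtype prot -out "$dbname" -blastdb_version 4
else
    echo "makeblastdb or $fasta not found; nothing was built" >&2
fi
```

The base name given to "-out" is the value to pass to SWIPE with "-d".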
The SWIPE distribution includes executable binaries for 64-bit Linux and
Mac. SWIPE may be compiled from sources using either the GNU g++ compiler,
the LLVM Clang compiler or the Intel icpc compiler. SWIPE is marginally
faster when compiled with the Intel compiler than with the GNU compiler.
SWIPE will only run on processors with the SSE2 instruction set and
runs best on processors with the SSSE3 instruction set. Please see
http://en.wikipedia.org/wiki/SSSE3 for a list of processors with this
instruction set.
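On Linux, the instruction sets supported by the processor can also be checked
locally. The sketch below looks for the flag names sse2 and ssse3 in
/proc/cpuinfo; the helper has_cpu_flag is hypothetical, and the check does
not apply to systems without /proc/cpuinfo, such as macOS.

```shell
# Succeed if the named CPU flag appears as a whole word in a cpuinfo-style
# file. The second argument defaults to /proc/cpuinfo (Linux only).
has_cpu_flag() {
    grep -qw -- "$1" "${2:-/proc/cpuinfo}" 2>/dev/null
}

if has_cpu_flag sse2; then
    if has_cpu_flag ssse3; then
        echo "SSE2 and SSSE3 detected"
    else
        echo "SSE2 detected, SSSE3 not detected"
    fi
else
    echo "SSE2 not detected (or /proc/cpuinfo is unavailable)"
fi
```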
The enclosed file scoring.pdf contains a table of the scoring system
parameter combinations for which E-value statistics can be computed.
Examples of how to run SWIPE:

To run SWIPE with one or more protein sequence queries in the
FASTA-formatted file named "query.fsa" against the protein
database with base name "swissprot" using 8 threads and an expect
value threshold of 0.001:

  ./swipe -p 1 -i query.fsa -d swissprot -e 0.001 -a 8

To run SWIPE with a nucleotide sequence query in the file "primer.fsa"
against the nucleotide database with base name "ecoli" using 1
thread and the default expect value of 10:

  ./swipe -p 0 -i primer.fsa -d ecoli

To search only the human entries in the "est" database using a
translated search with the protein query sequence in the file
"prot.fsa":

  echo 9606 > human.tax
  ./swipe -p 3 -i prot.fsa -d est -x human.tax