Bioinformatics utilities for nucleotide sequences, written in Bash, Perl, or Python.
- Given a FASTA file in single-line format and a window size (integer), this script calculates the GC percentage of every non-overlapping window and writes the result in a BED-like format
Input: single-line FASTA (you can use a preprocessing script such as PAGIT's fasta2singleLine.pl)
Output: BED-like text file with the following columns: "chromosome startPos endPos GC%"
Used for: This script was created specifically for visualization in Circos, which reads its data tracks as plain-text files with these columns.
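
The windowing step described above can be sketched in Python as follows. This is a minimal illustration, not the script itself: the function names are hypothetical, the 0-based, end-exclusive coordinates are an assumption, and the GC percentage is taken over the full window length (any N or other non-ACGT characters count toward the denominator), which may differ from the actual script.

```python
def read_single_line_fasta(lines):
    """Yield (name, sequence) pairs from single-line FASTA text."""
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header is the chromosome name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
        elif name is not None:
            yield name, line
            name = None


def gc_windows(name, seq, window):
    """Yield (name, start, end, gc_percent) for each non-overlapping window.

    Assumption: start is 0-based and end is exclusive. The last window
    may be shorter than `window` when the sequence length is not a multiple.
    """
    if window <= 0:
        raise ValueError("window size must be a positive integer")
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield name, start, start + len(chunk), 100.0 * gc / len(chunk)


def format_rows(rows):
    """Format rows as 'chromosome startPos endPos GC%' text lines."""
    return ["%s %d %d %.2f" % row for row in rows]


if __name__ == "__main__":
    fasta = [">chr1 example", "GGCCAATTGC"]
    for name, seq in read_single_line_fasta(fasta):
        for row in format_rows(gc_windows(name, seq, 4)):
            print(row)
```

With the 10-base example sequence and a window of 4, this prints three rows: two full windows and one shorter trailing window.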
- Finds the positions in a multiFASTA alignment file that are constant across all sequences and removes them, leaving in the output only those nucleotides/amino acids that vary in at least one of the sequences
Input: MultiFASTA alignment file, such as the output of any MSA software
Output: Another multiFASTA file, containing only the variable positions
Used for: Downstream phylogenetic SNP analysis, for example. RAxML, in particular, requires that the input FASTA or PHYLIP file contain only variable sites if you use the ASC_ prefix in the -m flag (page 27 of this)
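
The column filtering described above can be sketched in Python as follows. This is a hedged illustration rather than the script itself: the function names are hypothetical, and it assumes that comparison is case-insensitive and that gaps and ambiguity codes are treated like any other character, so a column counts as variable whenever it holds more than one distinct symbol.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) pairs; sequences may span lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], []])
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def variable_columns(seqs):
    """Return 0-based indices of columns with more than one distinct symbol."""
    if len(set(len(s) for s in seqs)) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def keep_variable_sites(records):
    """Drop every alignment column that is identical in all sequences."""
    # Assumption: upper-case first so that 'a' and 'A' count as the same symbol.
    seqs = [seq.upper() for _, seq in records]
    columns = variable_columns(seqs)
    return [(header, "".join(seq[i] for i in columns))
            for (header, _), seq in zip(records, seqs)]


if __name__ == "__main__":
    alignment = [">a", "ACG", "T-A", ">b", "ACCT-A", ">c", "ATGT-A"]
    for header, seq in keep_variable_sites(read_fasta(alignment)):
        print(">" + header)
        print(seq)
```

In the example alignment only the second and third columns differ between sequences, so each output record keeps two characters.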