roadmap

IO

sequence formats

FASTA
FASTQ
GenBank
EMBL

Many more are listed at http://www.bioperl.org/wiki/HOWTO:SeqIO, but a lot of those formats are antiquated. I think we should prioritise by popularity: the sooner BioJulia is useful, the better for the community.
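To give a feel for how small a first-pass reader for the most popular format can be, here is a minimal sketch of a FASTA reader. It is written in Python purely for illustration and is not a proposed BioJulia API; the names `read_fasta` and `_record` are hypothetical, and the sketch assumes plain-text input with no validation of the sequence alphabet.

```python
import io


def _record(header, chunks):
    # Split the header into identifier (up to the first space) and description.
    ident, _, desc = header.partition(" ")
    return ident, desc.strip(), "".join(chunks)


def read_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a FASTA text stream.

    Hypothetical helper for illustration only.
    """
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            # Sequence lines may be wrapped; blank lines are skipped.
            chunks.append(line.strip())
    if header is not None:
        yield _record(header, chunks)


records = list(read_fasta(io.StringIO(">s1 first\nACGT\nAC\n\n>s2\nGG\n")))
```

Yielding records one at a time keeps memory use flat on large files, which matters more for FASTQ and genome-scale FASTA than for small inputs.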

annotation formats

GFF & GTF (this is messy in most languages - it would be great if we could cleanly handle all the quirks)
BED
VCF
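One of the quirks mentioned for GFF and GTF is the attribute column (column 9): GFF3 uses `key=value` pairs with percent-encoding and comma-separated multiple values, while GTF uses `key "value";` pairs. The sketch below shows one possible way to handle both in a single function. It is Python for illustration only, `parse_attributes` is a hypothetical name, and the format detection is a heuristic assumption rather than anything mandated by either specification.

```python
import re
from urllib.parse import unquote

# GTF pair: a key, whitespace, then a quoted or bare value, optionally followed by ';'.
_GTF_PAIR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;?')


def parse_attributes(field):
    """Parse column 9 of a GFF3 or GTF line into a dict of key -> list of values."""
    attrs = {}
    field = field.strip()
    if field in ("", "."):
        return attrs

    # Heuristic (an assumption of this sketch): GFF3 if the text before the
    # first '=' in the first ';'-delimited chunk is a single token.
    first_key, sep, _ = field.split(";", 1)[0].partition("=")
    is_gff3 = bool(sep) and not re.search(r"\s", first_key.strip())

    if is_gff3:
        for pair in field.split(";"):
            pair = pair.strip()
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Split on ',' before decoding so an encoded comma (%2C) survives.
            attrs.setdefault(unquote(key), []).extend(
                unquote(v) for v in value.split(",")
            )
    else:
        for key, quoted, bare in _GTF_PAIR.findall(field):
            attrs.setdefault(key, []).append(quoted or bare)
    return attrs
```

Returning a list for every key keeps the result shape uniform across both formats, at the cost of a little verbosity for single-valued attributes.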

alignment formats

BLAST (tabular/long form)
MultiFASTA aligned
CLUSTAL
BAM/SAM
Phylip
PFAM

tree formats

Newick (can be ported from Phylogenetics.jl)
Nexus
PhyloXML

Also database connectors, e.g. for BioSQL.

datastructures

We'll want to have representations of:

DNA, RNA and amino acid sequences
ranges and features of sequences (where the sequence may or may not be present)
alignments - pairwise and multiple
graph-derivative structures like phylogenetic trees, genetic networks and biochemical pathways
probabilistic models of sequences (e.g. motifs - perhaps this isn't a high priority)
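As a rough illustration of the second item, a feature can be modelled as a range on a named sequence that works whether or not the sequence itself is loaded. The sketch below is Python for illustration only and is not a proposed BioJulia design; the `Feature` class, its field names, and the choice of 1-based inclusive coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Feature:
    seqname: str        # name of the sequence the feature lives on
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str = "+"   # "+" or "-"
    kind: str = "region"

    def __len__(self):
        return self.end - self.start + 1

    def overlaps(self, other):
        """True if both features are on the same sequence and share a position."""
        return (
            self.seqname == other.seqname
            and self.start <= other.end
            and other.start <= self.end
        )

    def extract(self, sequence: Optional[str]) -> Optional[str]:
        """Return the feature's bases, or None when no sequence is available."""
        if sequence is None:
            return None
        sub = sequence[self.start - 1 : self.end]
        if self.strand == "-":
            # Reverse-complement for features on the minus strand.
            sub = sub.translate(_COMPLEMENT)[::-1]
        return sub


exon = Feature("chr1", 3, 6, strand="-", kind="exon")
```

Range operations such as `overlaps` need only coordinates, so annotation-only workflows (for example, intersecting BED intervals) never have to load sequence data.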

tool wrappers

BLAST
Blat
bowtie/2
bwa
HMMER
Primer3
Phylogenetic tools (clustal, mafft, PAML, phylip)
samtools (unless we can do something faster in our own SAM/BAM implementation)
signalP/targetP
assemblers: velvet/oases, trinity, soapdenovo

service APIs

BioMart
Ensembl
EMBL
NCBI
SRA

datasets

genome sequences
genome annotations
gene ontologies
