This repository has been archived by the owner on Aug 26, 2023. It is now read-only.
roadmap
Richard Smith edited this page Feb 4, 2014 · 9 revisions
- FASTA
- FASTQ
- GenBank
- EMBL
Many more formats are listed at http://www.bioperl.org/wiki/HOWTO:SeqIO, but a lot of them are antiquated. I think we should prioritise by popularity: the sooner BioJulia is useful, the better for the community.
- GFF & GTF (this is messy in most languages - it would be great if we could cleanly handle all the quirks)
- BED
- VCF
- BLAST (tabular/long form)
- MultiFASTA aligned
- CLUSTAL
- BAM/SAM
- Phylip
- PFAM
- Newick (can be ported from Phylogenetics.jl)
- Nexus
- PhyloXML
Also database connectors, e.g. for BioSQL.
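To give a feel for the work involved in even the simplest format on the list, here is a minimal sketch of a FASTA reader. It is written in Python purely for illustration and is not BioJulia code. The names `FastaRecord` and `read_fasta` are hypothetical, and the sketch assumes plain-text input with `>` header lines and ignores format quirks such as `;` comment lines.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastaRecord:
    name: str         # first word of the header line
    description: str  # remainder of the header line (may be empty)
    sequence: str     # residues with line breaks removed


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # Split "name description..." at the first space.
    name, _, description = header.partition(" ")
    return FastaRecord(name, description.strip(), "".join(chunks))


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one record per '>' header from an iterable of text lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    # Emit the final record, if any.
    if header is not None:
        yield _make_record(header, chunks)
```

Because `read_fasta` accepts any iterable of lines, it can be fed an open file handle directly and never needs the whole file in memory, which matters for the larger formats above.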
We'll want to have representations of:
- DNA, RNA and amino acid sequences
- ranges and features of sequences (where the sequence may or may not be present)
- alignments - pairwise and multiple
- graph-derivative structures like phylogenetic trees, genetic networks and biochemical pathways
- probabilistic models of sequences (e.g. motifs - perhaps this isn't a high priority)
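One way to picture the second item, features that exist whether or not their sequence is loaded, is sketched below. This is an illustrative Python sketch, not a proposed BioJulia design. `Interval`, `Feature` and `reverse_complement` are hypothetical names, and the 1-based inclusive coordinates are an assumption borrowed from GFF-style formats.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse-complement an uppercase DNA string (A, C, G, T, N only)."""
    return dna.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Interval:
    seqname: str
    start: int          # 1-based, inclusive (assumed convention)
    end: int            # inclusive
    strand: str = "+"   # "+", "-" or "." for unstranded

    def __post_init__(self):
        if self.start < 1 or self.end < self.start:
            raise ValueError("require 1 <= start <= end")
        if self.strand not in ("+", "-", "."):
            raise ValueError("strand must be '+', '-' or '.'")

    def __len__(self) -> int:
        return self.end - self.start + 1

    def overlaps(self, other: "Interval") -> bool:
        return (self.seqname == other.seqname
                and self.start <= other.end
                and other.start <= self.end)


@dataclass
class Feature:
    location: Interval
    kind: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def extract(self, sequences: Dict[str, str]) -> Optional[str]:
        """Return the feature's subsequence, or None if the sequence is absent."""
        seq = sequences.get(self.location.seqname)
        if seq is None:
            return None
        sub = seq[self.location.start - 1:self.location.end]
        if self.location.strand == "-":
            sub = reverse_complement(sub)
        return sub
```

Keeping the sequence outside the feature means annotations loaded from GFF or BED files can be queried for overlaps on their own, and only resolved to residues when a sequence happens to be available.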
- BLAST
- Blat
- bowtie/2
- bwa
- HMMER
- Primer3
- Phylogenetic tools (clustal, mafft, PAML, phylip)
- samtools (unless we can do something faster in our own sam/bam implementation)
- signalP/targetP
- assemblers: velvet/oases, trinity, soapdenovo
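Assuming the tools listed above are meant to be driven as external programs, the basic pattern could be sketched as follows, again in Python for illustration only. `blastn_command` and `run_tool` are hypothetical helper names. The flags shown (`-query`, `-db`, `-outfmt 6`, `-evalue`, `-out`) are those of the NCBI BLAST+ `blastn` command line, and the sketch assumes that executable is on the `PATH`.

```python
import subprocess
from typing import List, Optional


def blastn_command(query: str, db: str, evalue: float = 1e-5,
                   out: Optional[str] = None) -> List[str]:
    """Build an argument list for a tabular (outfmt 6) blastn search."""
    cmd = ["blastn", "-query", query, "-db", db,
           "-outfmt", "6", "-evalue", str(evalue)]
    if out is not None:
        cmd += ["-out", out]
    return cmd


def run_tool(cmd: List[str]) -> str:
    """Run an external command and return its standard output.

    Raises subprocess.CalledProcessError if the tool exits with a
    non-zero status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Separating command construction from execution keeps the wrapper testable without the tool installed, and passing an argument list rather than a shell string avoids quoting problems with file names.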
- BioMart
- Ensembl
- EMBL
- NCBI
- SRA
- genome sequences
- genome annotations
- gene ontologies