AliTV-project

Project

Members

| Name              | ID | email             |
|-------------------+----+-------------------|
| Markus Ankenbrand | MA | [email protected] |
| Thomas Hackl      | TH | [email protected] |
| Frank Foerster    | FF | [email protected] |

Objectives

Whole genome alignment and visualization of homologous regions are essential tools in comparative genomics. However, currently available software either performs poorly on large genomes (>100 Mbp) when based directly on alignments, or is aimed at synteny visualization that depends on comprehensive gene prediction/annotation prior to visualization. Additionally, visualization is usually integrated into some form of interactive viewer that carries lots of meta information. The actual graphics, however, are mostly “ugly”.

AliTV objectives:

  1. Generation of whole genome alignments
    • using established methods (lastz, mummer)
    • using alternative methods (e.g. daligner)
  2. Conversion of alignment data and visualization
    • using Circos (circular, 2 genomes)
    • d3.js
      • circular, multiple genomes
      • linear, multiple genomes

git

A script implementing the git-ff-merge strategy for pushing/pulling is available at the binf git base (132.187.22.105:common/git-scripts).

org

“TODO” keywords

Current suggestions - entirely open to discussion.

general tasks:

coding tasks:

;;; TH's TODO color scheme
(setq org-todo-keyword-faces
      '(("TODO" . "red1")
        ("BUGF" . "red1")
        ("FEAT" . "orange1")
        ("INPG" . "orange1")
        ("UINV" . "orange1")
        ("DISC" . "CornflowerBlue")
        ("HOLD" . "CornflowerBlue")
        ("DONE" . "ForestGreen")
        ("FIXD" . "ForestGreen")))

column mode

Gives a good overview of outline properties and easy access for modifying them.

on: C-c C-x C-c (for subtree)
off: q (on highlighted entry)
navigate: arrow keys
modify: S-arrow key or e

http://orgmode.org/worg/org-tutorials/org-column-view-tutorial.html

Log

[2/3] Coding

Actual development on the pipeline source code (features, bug fixes, etc.) goes here.

[2/3] alignment

lastz

mummer

daligner

[0/0] d3js

[0/5] Sandbox

Ideas, brainstorming, experimenting, etc …

linear visualization using d3.js

[0/4] Standards

chromosomes per genome

.tsv
  • simple tab separated format, defined columns
sample
SID: sequence id (chromosome)
GID: genome id (to which of multiple genomes does this sequence belong)
LEN: length of this sequence
SEQ: sequence as text (optional)

SID	GID	LEN[	SEQ]
PROS
  • simple
CONS
  • not standardized
  • not flexible
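
A minimal sketch of how such a file could be read, assuming one sequence per line and tab-separated columns as described above. The names `ChromosomeRecord` and `parse_chromosome_tsv` are hypothetical, and both the blank-line handling and the LEN/SEQ consistency check are assumptions that this draft does not specify.

```python
from typing import List, NamedTuple, Optional


class ChromosomeRecord(NamedTuple):
    sid: str            # SID: sequence id (chromosome)
    gid: str            # GID: genome id this sequence belongs to
    length: int         # LEN: length of this sequence
    seq: Optional[str]  # SEQ: sequence as text, None if the column is absent


def parse_chromosome_tsv(text: str) -> List[ChromosomeRecord]:
    """Parse lines of the form SID<TAB>GID<TAB>LEN[<TAB>SEQ]."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            raise ValueError(
                f"line {lineno}: expected 3 or 4 columns, got {len(fields)}"
            )
        length = int(fields[2])
        seq = fields[3] if len(fields) == 4 else None
        # Assumption: if SEQ is given, it should agree with LEN.
        if seq is not None and len(seq) != length:
            raise ValueError(
                f"line {lineno}: LEN is {length} but SEQ has {len(seq)} characters"
            )
        records.append(ChromosomeRecord(fields[0], fields[1], length, seq))
    return records


# Made-up example data: one record with SEQ, one without.
example = "chr1\tgenomeA\t8\tACGTACGT\nchr2\tgenomeA\t1200\n"
records = parse_chromosome_tsv(example)
```

Keeping SEQ optional lets the same reader handle both a compact length table and a file that carries the sequences inline.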

annotations per genome

.bed
sample
SID: sequence id (chromosome)
FID: feature id

SID	FROM	TO	FID	...
PROS
  • simple, standardized tsv format, with comprehensive tool box (bedtools) and conversion scripts to other formats
  • existing data sets of arbitrary feature annotations can usually be converted to bed very easily (gff, blast, sam …)
CONS
  • To use the features for links, the fourth column (feature id) has to be mandatory, whereas it is optional in the bed specification.
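
The mandatory fourth column could be enforced at load time roughly as follows. This is only a sketch: `Feature` and `parse_features_bed` are hypothetical names, lines starting with `#` are assumed to be comments, and any columns after the fourth are simply ignored.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    sid: str    # SID: sequence id (chromosome)
    start: int  # FROM column
    end: int    # TO column
    fid: str    # FID: feature id, treated as mandatory here


def parse_features_bed(text: str) -> List[Feature]:
    """Parse bed lines, rejecting any line without a feature id."""
    features = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and '#' comment lines carry no features.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4 or not fields[3].strip():
            raise ValueError(f"line {lineno}: feature id (column 4) is required")
        features.append(
            Feature(fields[0], int(fields[1]), int(fields[2]), fields[3])
        )
    return features


# Made-up example data: a 6-column and a 4-column bed line.
example = "chr1\t100\t200\tgeneA\t0\t+\nchr1\t300\t450\tgeneB\n"
features = parse_features_bed(example)
```

A plain three-column bed line, which the bed specification allows, raises a `ValueError` here because links could not refer to that feature.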

links

.sif
sample
FID_[AB]: feature id set A/B
LTYPE: link type

FID_A	LTYPE	FID_B
PROS
  • simple tsv, compatible with Cytoscape
CONS
  • no link attributes, e.g. identity, score, etc.
  • to add those attributes either an additional file is needed or the “link type” has to be abused
.tsv
  • simple tsv with header line
  • mandatory columns are fida and fidb, all other columns are (named) link properties
  • the header starts with a hash sign (#)
  • if no header is present, “#fida type fidb” is assumed, so plain .sif files are supported as well
sample
FID_[AB]: feature id set A/B
LTYPE: link type
IDY: link identity

#fida	type	fidb	identity
FID_A	LTYPE	FID_B	IDY
PROS
  • flexible
  • extensible
  • can be imported into Cytoscape (as edge properties)
CONS
  • not standardized
  • (useful) header lines have to be documented
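
The header rule for the link .tsv could be sketched like this, assuming tab-separated columns and exactly one link per line (so .sif lines that list several targets are not handled). `parse_links` and `DEFAULT_HEADER` are hypothetical names, and all values are kept as strings.

```python
from typing import Dict, List

# Header assumed when the file has none, which makes plain .sif readable.
DEFAULT_HEADER = ["fida", "type", "fidb"]


def parse_links(text: str) -> List[Dict[str, str]]:
    """Parse link lines into dicts keyed by the header's column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines and lines[0].startswith("#"):
        # The header line starts with '#'; the rest names the columns.
        header = [name.strip() for name in lines[0][1:].split("\t")]
        rows = lines[1:]
    else:
        header = DEFAULT_HEADER
        rows = lines
    for required in ("fida", "fidb"):
        if required not in header:
            raise ValueError(f"missing mandatory column: {required}")
    links = []
    for rowno, line in enumerate(rows, start=1):
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(
                f"data row {rowno}: expected {len(header)} columns, "
                f"got {len(fields)}"
            )
        links.append(dict(zip(header, fields)))
    return links


# Made-up example data: the same link with and without a header line.
with_header = "#fida\ttype\tfidb\tidentity\ngeneA\thomolog\tgeneB\t97.5\n"
plain_sif = "geneA\thomolog\tgeneB\n"
links_with_header = parse_links(with_header)
links_plain = parse_links(plain_sif)
```

Because every column beyond fida and fidb becomes a named property, further attributes such as a score need only a new header entry.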

[0/0] Data

Test data sets etc.

[0/0] Web

Paper

The ultimate goal.

__org__