AliTV-project

Project

Members

Name	ID	email
Markus Ankenbrand	MA	markus.ankenbrand@uni-wuerzburg.de
Thomas Hackl	TH	thomas.hackl@uni-wuerzburg.de
Frank Foerster	FF	frank.foerster@biozentrum.uni-wuerzburg.de

Objectives

Whole genome alignment and visualization of homologous regions is an essential tool in comparative genomics. However, currently available software either performs poorly on large genomes (>100 Mbp) when based directly on alignments, or is aimed at synteny visualisation that depends on comprehensive gene prediction/annotations prior to visualization. Additionally, visualization is usually integrated into some form of interactive viewer carrying lots of meta information etc. The actual graphics, however, are mostly “ugly”.

AliTV objectives:

generation of whole genome alignments
- using established methods (lastz, mummer)
- using alternative methods (e.g. daligner)
Conversion of alignment data and visualization
- using Circos (circular, 2 genomes)
- d3.js
  - circular, multiple genomes
  - linear, multiple genomes

git

A script implementing the git-ff-merge strategy for pushing/pulling is available at the binf git base (132.187.22.105:common/git-scripts).

org

“TODO” keywords

Current suggestions - entirely open to discussion.

general tasks:

coding tasks:

;;; TH's TODO color scheme
(setq org-todo-keyword-faces
      '(("TODO" . "red1")
        ("BUGF" . "red1")
        ("FEAT" . "orange1")
        ("INPG" . "orange1")
        ("UINV" . "orange1")
        ("DISC" . "CornflowerBlue")
        ("HOLD" . "CornflowerBlue")
        ("DONE" . "ForestGreen")
        ("FIXD" . "ForestGreen")))

column mode

Gives a good overview of outline properties and easy access to modifications.

on: C-c C-x C-c (for subtree)
off: q (on highlighted entry)
navigate: arrow keys
modify: S-arrow key or e

http://orgmode.org/worg/org-tutorials/org-column-view-tutorial.html

Log

[2/3] Coding

Actual development on the pipeline source code (features, bugfixes, etc.) goes here.

[2/3] alignment

lastz

mummer

daligner

[0/0] d3js

[0/5] Sandbox

Ideas, brainstorming, experimenting, etc …

linear visualization using d3.js

d3js
- http://d3js.org/
- http://www.tips-for-excel.com/tag/d3-js-tutorial (esp. fig 2)
Parallel linear diagrams
Sankey diagrams
- http://blog.ouseful.info/2012/05/24/f1-championship-points-as-a-d3-js-powered-sankey-diagram/
Tree layout
- http://blog.pixelingene.com/demos/d3_tree/

[0/4] Standards

chromosomes per genome

.tsv

simple tab-separated format with defined columns

sample

SID: sequence id (chromosome)
GID: genome id (to which of multiple genomes does this sequence belong)
LEN: length of this sequence
SEQ: sequence as text (optional)

SID	GID	LEN[	SEQ]
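A minimal Python sketch of a reader for this format follows. It rests on several assumptions that the format above does not fix. SIDs are unique across all genomes. Blank lines and lines starting with # may be skipped. When SEQ is present, LEN must match its length. The names read_chromosomes and genome_sizes are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Dict, Optional, TextIO


@dataclass
class Sequence:
    sid: str                   # SID: sequence id (chromosome)
    gid: str                   # GID: genome the sequence belongs to
    length: int                # LEN
    seq: Optional[str] = None  # SEQ (optional column)


def read_chromosomes(handle: TextIO) -> Dict[str, Sequence]:
    """Parse 'SID<TAB>GID<TAB>LEN[<TAB>SEQ]' lines into a dict keyed by SID."""
    sequences: Dict[str, Sequence] = {}
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are allowed
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            raise ValueError(f"line {lineno}: expected 3 or 4 columns, got {len(fields)}")
        sid, gid, length = fields[0], fields[1], int(fields[2])
        seq = fields[3] if len(fields) == 4 else None
        if seq is not None and len(seq) != length:
            raise ValueError(f"line {lineno}: LEN={length} but SEQ has {len(seq)} characters")
        if sid in sequences:
            # assumption: SIDs are unique across all genomes
            raise ValueError(f"line {lineno}: duplicate SID {sid!r}")
        sequences[sid] = Sequence(sid, gid, length, seq)
    return sequences


def genome_sizes(sequences: Dict[str, Sequence]) -> Dict[str, int]:
    """Sum the sequence lengths per genome (GID)."""
    sizes: Dict[str, int] = {}
    for s in sequences.values():
        sizes[s.gid] = sizes.get(s.gid, 0) + s.length
    return sizes


if __name__ == "__main__":
    example = "chr1\tgenomeA\t12\tACGTACGTACGT\nchr2\tgenomeA\t300\nctg1\tgenomeB\t5000\n"
    print(genome_sizes(read_chromosomes(io.StringIO(example))))
    # {'genomeA': 312, 'genomeB': 5000}
```

Per-genome totals like these are what a layout needs when several genomes are drawn side by side.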

PROS

simple

CONS

not standardized
not flexible

annotations per genome

.bed

http://genome.ucsc.edu/FAQ/FAQformat.html
http://bedtools.readthedocs.org/en/latest/

sample

SID: sequence id (chromosome)
FID: feature id

SID	FROM	TO	FID	...

PROS

simple, standardized tsv format with a comprehensive toolbox (bedtools) and conversion scripts to other formats
existing data sets of arbitrary feature annotations (gff, blast, sam …) can usually be converted to bed very easily

CONS

To use the features for links, the fourth column (feature id) has to be mandatory, in contrast to the bed specification, where it is optional.
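Because column 4 becomes mandatory, a reader has to enforce it explicitly. The following is a minimal Python sketch under three assumptions: input is tab-separated, feature ids are unique (links must resolve them unambiguously), and columns after the fourth can be ignored. read_features is a hypothetical name. Coordinates follow the BED convention (0-based start, exclusive end), so a feature's length is simply end - start.

```python
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Feature:
    sid: str    # column 1: sequence id (chrom)
    start: int  # column 2: 0-based start (BED convention)
    end: int    # column 3: end, exclusive (BED convention)
    fid: str    # column 4: feature id, mandatory here unlike plain BED


def read_features(handle: TextIO) -> List[Feature]:
    """Read BED lines, requiring a non-empty, unique feature id in column 4."""
    features: List[Feature] = []
    seen = set()
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # skip blank lines, '#' comments and UCSC 'track'/'browser' lines
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) < 4 or not fields[3]:
            raise ValueError(f"line {lineno}: feature id (column 4) is mandatory")
        start, end = int(fields[1]), int(fields[2])
        if not 0 <= start <= end:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        fid = fields[3]
        if fid in seen:
            # assumption: links refer to features by id, so ids must be unique
            raise ValueError(f"line {lineno}: duplicate feature id {fid!r}")
        seen.add(fid)
        # columns 5+ (score, strand, ...) are ignored in this sketch
        features.append(Feature(fields[0], start, end, fid))
    return features


if __name__ == "__main__":
    example = "chr1\t0\t100\tf1\nchr2\t50\t80\tf2\n"
    for feat in read_features(io.StringIO(example)):
        print(feat.fid, feat.end - feat.start)  # prints 'f1 100' and 'f2 30'
```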

links

.sif

http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats

sample

FID_[AB]: feature id set A/B
LTYPE: link type

FID_A	LTYPE	FID_B
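A minimal Python sketch of reading such a file into (FID_A, LTYPE, FID_B) triples follows. Cytoscape's SIF also allows several targets on one line (FID_A LTYPE FID_B FID_C ...), and the sketch expands those into separate links. It assumes tab separation as in the sample. It rejects lines with fewer than three fields, since a link needs both ends, even though Cytoscape accepts lone-node lines. read_sif is a hypothetical name.

```python
import io
from typing import List, TextIO, Tuple


def read_sif(handle: TextIO) -> List[Tuple[str, str, str]]:
    """Return (fid_a, link_type, fid_b) triples from a tab-separated .sif file."""
    links: List[Tuple[str, str, str]] = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected FID_A, LTYPE and at least one FID_B")
        fid_a, ltype, *targets = fields
        # one line may list several targets; emit one link per target
        links.extend((fid_a, ltype, fid_b) for fid_b in targets)
    return links


if __name__ == "__main__":
    print(read_sif(io.StringIO("f1\tsyn\tf2\tf3\n")))
    # [('f1', 'syn', 'f2'), ('f1', 'syn', 'f3')]
```

Only the link type survives the round trip. There is no slot for identity or score, which is the limitation listed under CONS.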

PROS

simple tsv, compatible with Cytoscape

CONS

no link attributes, e.g. identity, score, etc.
to add those attributes, either an additional file is needed or the “link type” column has to be abused

.tsv

simple tsv with a header line
mandatory columns are fida and fidb; all other columns are (named) link properties
the header line starts with a hash sign (#)
if no header is present, “#fida type fidb” is assumed, therefore supporting the .sif format

sample

FID_[AB]: feature id set A/B
LTYPE: link type
IDY: link identity

#fida	type	fidb	identity
FID_A	LTYPE	FID_B	IDY
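The header rules above can be sketched in Python as a minimal reader. It relies on a few assumptions that the rules do not fix:

- only a # line before any data is treated as the header, and later # lines are comments;
- every row must have as many fields as the header;
- all values are kept as strings.

read_links and DEFAULT_HEADER are hypothetical names.

```python
import io
from typing import Dict, List, Optional, TextIO

# assumed header when the file has none, which makes simple .sif files readable
DEFAULT_HEADER = ["fida", "type", "fidb"]


def read_links(handle: TextIO) -> List[Dict[str, str]]:
    """Read a link .tsv into one dict per link, keyed by the header names."""
    header: Optional[List[str]] = None
    links: List[Dict[str, str]] = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if header is None:
                header = line[1:].split("\t")
                if "fida" not in header or "fidb" not in header:
                    raise ValueError(f"line {lineno}: header lacks mandatory fida/fidb columns")
            continue  # any later '#' line is treated as a comment
        if header is None:
            header = DEFAULT_HEADER  # no header before the first data line
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"line {lineno}: expected {len(header)} fields, got {len(fields)}")
        links.append(dict(zip(header, fields)))
    return links


if __name__ == "__main__":
    example = "#fida\ttype\tfidb\tidentity\nf1\tsyn\tf2\t98.5\n"
    print(read_links(io.StringIO(example)))
    # [{'fida': 'f1', 'type': 'syn', 'fidb': 'f2', 'identity': '98.5'}]
```

A .sif line listing several targets would fail the field-count check here. Such lines would first have to be split into one link per line.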

PROS

flexible
extensible
can be imported into Cytoscape (as edge properties)

CONS

not standardized
(useful) header lines have to be documented
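As a worked example of how the proposed formats fit together, here is a minimal Python sketch that turns BLAST tabular hits (-outfmt 6) into the formats above. Each hit becomes one .bed feature per genome plus one row of the link .tsv. In -outfmt 6, the first two columns are the query and subject ids, followed by pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. BLAST coordinates are 1-based and inclusive, and sstart > send marks a hit on the minus strand. Both ends are therefore normalised to BED's 0-based, end-exclusive intervals. The function names, the hitN_a/hitN_b feature-id scheme and the "blast" link type are hypothetical choices for illustration.

```python
import io
from typing import List, TextIO, Tuple


def to_bed_interval(a: int, b: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive range (possibly reversed) to 0-based, end-exclusive."""
    lo, hi = min(a, b), max(a, b)
    return lo - 1, hi


def blast_to_alitv(handle: TextIO) -> Tuple[List[str], List[str], List[str]]:
    """Return (query .bed lines, subject .bed lines, link .tsv lines) for -outfmt 6 hits."""
    bed_query: List[str] = []
    bed_subject: List[str] = []
    links: List[str] = ["#fida\ttype\tfidb\tidentity"]
    n = 0
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = f[0], f[1], f[2]
        qstart, qend, sstart, send = (int(x) for x in f[6:10])
        n += 1
        fid_a, fid_b = f"hit{n}_a", f"hit{n}_b"  # hypothetical id scheme
        qs, qe = to_bed_interval(qstart, qend)
        ss, se = to_bed_interval(sstart, send)
        bed_query.append(f"{qseqid}\t{qs}\t{qe}\t{fid_a}")
        bed_subject.append(f"{sseqid}\t{ss}\t{se}\t{fid_b}")
        links.append(f"{fid_a}\tblast\t{fid_b}\t{pident}")  # pident kept as text
    return bed_query, bed_subject, links


if __name__ == "__main__":
    hits = "chr1\tctg7\t98.50\t100\t1\t0\t1\t100\t500\t401\t1e-50\t180\n"
    bed_q, bed_s, link_rows = blast_to_alitv(io.StringIO(hits))
    print("\n".join(bed_q + bed_s + link_rows))
```

The strand of each hit is dropped in this sketch. Because any extra column of the link .tsv is a named link property, it could be kept as an additional column (e.g. a hypothetical "strand").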

[0/0] Data

Test data sets etc.

[0/0] Web

Paper

The ultimate goal.

Name	ID	email
Markus Ankenbrand	MA	markus.ankenbrand@uni-wuerzburg.de
Thomas Hackl	TH	thomas.hackl@uni-wuerzburg.de
Frank Foerster	FF	frank.foerster@biozentrum.uni-wuerzburg.de
