graphinout: graph in → graph out

  • Convert between graph formats

  • Validate GraphML

Purpose

Graphinout reads many common graph file formats and converts them losslessly to a common format: GraphML.

There are many graph toolkits, graph databases and graph-based apps, each implementing its own importers and mapping incoming graphs to its internal API. Because there are so many variants of graphs and even more kinds of internal APIs, there is no "common Java graph API" in sight. Instead, graphinout reads all kinds of formats and converts them to a common graph file format, GraphML.

Benefits
App developers

of graph-related software can focus on core features and only need to read GraphML.

Converter developers

can easily add more input formats.

Users

can reuse their existing data in almost any format their data-producing software writes.

Features

  • Written in modern Java (17).

  • Parses input and either converts it to a valid GraphML file, or

  • Shows comprehensible validation errors.

  • Easy to integrate into existing software.

Non-Features
  • Java graph API

  • Convert relational data (CSV, Excel, databases) into graphs

Output Format

  • GraphML (← our notes about the format)

Writing other formats could be a nice future extension. This would turn graphinout into the 'pandoc of graph file formats'.

Input Formats

A great 2015 paper on graph file formats, including links to graph data repositories: https://arxiv.org/pdf/1503.02781.pdf. Time frames indicate when a format was or is used.

Text based

Diverse text syntaxes to encode graphs. Common formats:

  • TGF Trivial Graph Format Wikipedia

  • DOT language spec

  • Pajek, spec

  • Nested Network Format

  • BioGRID Tab 3.0 Format spec

  • PSI-MI Tab Version 2.5 Format spec

  • Matrix Market from the large 'Matrix Market' spec
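To give a feel for how simple some of these text syntaxes are, here is a minimal sketch of a TGF reader in Java. It assumes the usual TGF layout: one node per line (`id label`), a line containing only `#`, then one edge per line (`fromId toId label`). The class and record names (`TgfSketch`, `Node`, `Edge`, `Graph`) are hypothetical and are not part of graphinout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a TGF reader; not graphinout's actual importer. */
final class TgfSketch {

    record Node(String id, String label) {}
    record Edge(String source, String target, String label) {}
    record Graph(List<Node> nodes, List<Edge> edges) {}

    /** Parses TGF text: node lines, a separator line "#", then edge lines. */
    static Graph parse(String tgf) {
        List<Node> nodes = new ArrayList<>();
        List<Edge> edges = new ArrayList<>();
        boolean inEdges = false;
        for (String raw : tgf.split("\\R")) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            if (line.equals("#")) {
                inEdges = true; // everything after the separator is an edge
                continue;
            }
            if (!inEdges) {
                // Node line: id, then an optional label (rest of the line).
                String[] parts = line.split("\\s+", 2);
                nodes.add(new Node(parts[0], parts.length > 1 ? parts[1] : ""));
            } else {
                // Edge line: source id, target id, then an optional label.
                String[] parts = line.split("\\s+", 3);
                if (parts.length < 2) {
                    throw new IllegalArgumentException("Edge needs two node ids: " + line);
                }
                edges.add(new Edge(parts[0], parts[1], parts.length > 2 ? parts[2] : ""));
            }
        }
        return new Graph(nodes, edges);
    }

    public static void main(String[] args) {
        Graph g = parse("1 Alice\n2 Bob\n#\n1 2 knows\n");
        System.out.println(g.nodes());
        System.out.println(g.edges());
    }
}
```

A real importer would additionally report line numbers in validation errors and check that edge endpoints refer to declared nodes; the sketch leaves both out.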

XML based

Expressed as XML. Common formats:

JSON based

In JSON format. Common formats:

Triple based

Triples (semantic web) can themselves be encoded in XML or text-based syntaxes. They are used for knowledge graphs, vocabularies, linked open data and ontologies. Common file types:

  • Turtle (text-based, .ttl)

  • RDF/XML (XML-based, .rdf.xml)

  • N3 (text-based, .n3)

  • N-Triples (text-based, .nt)

  • Web Ontology Language (XML-based, .owl.xml)

  • BioPAX (Biological PAthways eXchange) Format (OWL-based) spec

Usage

Command line

graphinout can be used on the command line to read format X and write to GraphML.

Library

graphinout can be used as a Java library to read format X and write to a GraphML file or Writer.
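The library API is not documented here, so the following is only a sketch of what writing GraphML to a `java.io.Writer` can look like using the JDK's StAX API (`javax.xml.stream`). The class `GraphMlWriterSketch`, its `Edge` record and both methods are hypothetical names chosen for illustration; they are not graphinout's API. The sketch emits only the bare minimum: a `graphml` root in the GraphML namespace, one `graph` with `edgedefault`, and `node` and `edge` elements without attributes data.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch of a minimal GraphML writer; not graphinout's API. */
final class GraphMlWriterSketch {

    record Edge(String source, String target) {}

    /** Writes a single graph with the given nodes and edges to the Writer. */
    static void write(List<String> nodeIds, List<Edge> edges, boolean directed, Writer out)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("graphml");
        xml.writeDefaultNamespace("http://graphml.graphdrawing.org/xmlns");
        xml.writeStartElement("graph");
        // GraphML requires a default direction for edges of the graph.
        xml.writeAttribute("edgedefault", directed ? "directed" : "undirected");
        for (String id : nodeIds) {
            xml.writeEmptyElement("node");
            xml.writeAttribute("id", id);
        }
        for (Edge e : edges) {
            xml.writeEmptyElement("edge");
            xml.writeAttribute("source", e.source());
            xml.writeAttribute("target", e.target());
        }
        xml.writeEndElement(); // graph
        xml.writeEndElement(); // graphml
        xml.writeEndDocument();
        xml.flush();
    }

    /** Convenience wrapper that returns the GraphML document as a String. */
    static String toGraphMl(List<String> nodeIds, List<Edge> edges, boolean directed) {
        StringWriter out = new StringWriter();
        try {
            write(nodeIds, edges, directed, out);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write GraphML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toGraphMl(List.of("a", "b"), List.of(new Edge("a", "b")), true));
    }
}
```

Taking a `Writer` rather than a file path keeps the caller in control of the destination (file, HTTP response, in-memory buffer), which matches the "file or Writer" wording above.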

Introduction to Graphs

A graph is first a mathematical concept. As it turns out, it is rather a family of concepts. A good introduction to the general idea can be found in Wikipedia on Graph Theory. In computer science, a graph is an abstract data type, see Wikipedia. An exhaustive glossary of graph theory explains all terms with a special meaning in a graph context.

Note
Graph vs. Graph File Format
Don’t confuse graph file format features with graph features. Graph features, such as being cycle-free, do not depend on the file format; they depend on the data stored in the file. A graph file format needs to be able to represent, e.g., directed graphs. Every format that can do so can represent cycle-free directed graphs as well as graphs with cycles. There are many graph concepts that you do not need to understand, or even know about, when your job is converting graph input data.

Graph File Format Features

  • undirected graphs

  • directed graphs Wikipedia

  • mixed graph: mix of directed and undirected edges

  • self-loops: An edge from a node A to itself

  • parallel edges aka multi-edges: Multiple edges from a node A to another node B

  • edge attributes (e.g. type of edge or weight)

  • node attributes (e.g. type of node or weight)

  • hyper-graphs: edges with more than 2 endpoints
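The features listed above can be checked mechanically on an edge list, which is useful when deciding whether a target format can hold a given input. The following Java sketch is illustrative only: `GraphFeaturesSketch` and its `Edge` record are hypothetical and say nothing about graphinout's internal model. It assumes an edge is a list of endpoint node ids plus a directed flag, and it ignores hyperedges when looking for self-loops and parallel edges.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: detects which format features an edge list requires. */
final class GraphFeaturesSketch {

    /** Endpoints are node ids; for a directed 2-endpoint edge the order is source, target. */
    record Edge(List<String> endpoints, boolean directed) {}

    /** Mixed graph: at least one directed and at least one undirected edge. */
    static boolean isMixed(List<Edge> edges) {
        return edges.stream().anyMatch(Edge::directed)
            && edges.stream().anyMatch(e -> !e.directed());
    }

    /** Self-loop: an edge from a node to itself. */
    static boolean hasSelfLoops(List<Edge> edges) {
        return edges.stream().anyMatch(e -> e.endpoints().size() == 2
            && e.endpoints().get(0).equals(e.endpoints().get(1)));
    }

    /** Hyperedge: an edge with more than 2 endpoints. */
    static boolean hasHyperEdges(List<Edge> edges) {
        return edges.stream().anyMatch(e -> e.endpoints().size() > 2);
    }

    /** Parallel edges: more than one edge of the same kind between the same two nodes. */
    static boolean hasParallelEdges(List<Edge> edges) {
        Set<List<String>> seen = new HashSet<>();
        for (Edge e : edges) {
            if (e.endpoints().size() != 2) {
                continue; // hyperedges are ignored in this sketch
            }
            String a = e.endpoints().get(0);
            String b = e.endpoints().get(1);
            // For undirected edges A-B equals B-A, so order the pair before comparing.
            if (!e.directed() && a.compareTo(b) > 0) {
                String tmp = a;
                a = b;
                b = tmp;
            }
            List<String> key = List.of(e.directed() ? "directed" : "undirected", a, b);
            if (!seen.add(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge(List.of("A", "B"), true),
            new Edge(List.of("A", "B"), true),   // parallel to the first edge
            new Edge(List.of("C", "C"), false)); // undirected self-loop
        System.out.println("mixed: " + isMixed(edges));
        System.out.println("self-loops: " + hasSelfLoops(edges));
        System.out.println("parallel edges: " + hasParallelEdges(edges));
        System.out.println("hyperedges: " + hasHyperEdges(edges));
    }
}
```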

Ecosystem

Graph Toolkits

Graph Databases

Graph Drawing

  • GraphViz

  • Cytoscape.js

  • Gephi ←→ GraphML subset

  • yEd ←→ partial GraphML support

Road Map

Milestone 1 — Convert GEXF to GraphML on the command line
  1. [ ] Read GraphML, write GraphML as Java standalone app

    1. internal property graph model

    2. GraphML reader, based on XML

    3. GraphML writer, based on XML

  2. [ ] Add GEXF reader, based on XML

  3. [ ] Document usage as command line app

Milestone 2 — Convert TGF and DOT to GraphML on the command line and become extensible
  1. [ ] Import TGF to start parsing text-syntaxes

  2. [ ] Import DOT

  3. [ ] Generalize internal importer-API to make extending with more importers easy

  4. [ ] Document internal API

Milestone 3 — become a library
  1. [ ] Add API to allow app developers to embed graphinout as a library

  2. [ ] Document external API

  3. [ ] Import RDF N-Triples

  4. [ ] Add more relevant importers

Milestone 4 — become a web service
  1. [ ] Create RESTful API for converting graphs

  2. [ ] Add more relevant importers

Milestone 5 — become a web UI
  1. [ ] Create web app to let users convert graphs with a simple UI

  2. [ ] Add more relevant importers