Under the hood
json/ contains ~150MB of json files representing graphs (<hex hash>.json) and an index file (mrn2graphs.json) mapping each MRN to a list of the graph files that contain it.
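As a rough illustration of how the index could be used, here is a minimal sketch that looks up the graph files for one MRN. It assumes that mrn2graphs.json is a single JSON object whose keys are MRNs and whose values are lists of file names inside json/. The function name `graphs_for_mrn` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def graphs_for_mrn(json_dir, mrn):
    """Return the paths of the graph files that contain the given MRN.

    Assumption: mrn2graphs.json maps each MRN to a list of file names
    located in the same directory as the index.
    """
    json_dir = Path(json_dir)
    with open(json_dir / "mrn2graphs.json", encoding="utf-8") as f:
        index = json.load(f)
    # An unknown MRN simply yields an empty list.
    return [json_dir / name for name in index.get(mrn, [])]
```

Calling `graphs_for_mrn("json", some_mrn)` would then give the list of files to load for that cable, or an empty list if the MRN is not in the index.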
make-json/ contains the scripts needed in order to generate these graphs with cable2graph (see the HOWTO file for details). This enables you to experiment with various ways of generating json files for your CableWeaver fork.
Currently, there are 4 steps (see make-json.sh):
1. Generate a full graph of all cables and references
2. Find clusters
3. Find communities using the Blondel et al. multilevel algorithm
4. Split all graphs that are "too big to handle" via CableWeaver (rule of thumb: graphs with a graphml file >100KB) into sub-communities.
Note that both steps 2 and 3 happen at line 7 (that's what the --clusters and --multilevel switches mean).
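To make the cluster step and the "too big" rule more concrete, here is a minimal pure-Python sketch. It is not cable2graph's code: it assumes that "clusters" means connected components of the graph with edge direction ignored, it does not reproduce the multilevel community step, and the names `find_clusters`, `too_big` and `MAX_GRAPHML_BYTES` are made up for illustration.

```python
from collections import defaultdict


def find_clusters(edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    clusters = []
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search from a node we have not visited yet.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Assumption: the ">100KB" rule of thumb is read as 100 * 1024 bytes.
MAX_GRAPHML_BYTES = 100 * 1024


def too_big(graphml_size_bytes):
    """Return True if a graph should be split into sub-communities."""
    return graphml_size_bytes > MAX_GRAPHML_BYTES
```

For example, the edges `[("A", "B"), ("C", "B"), ("D", "E")]` fall into two clusters, one holding A, B and C and one holding D and E. The size passed to `too_big` could come from `os.path.getsize` on the generated graphml file.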
If you fork this, you can try other methods to generate the json/ folder. To see what algorithms are available, read Cable2Graph's observations page.
g2json is a downsized version of Cable2Graph's g2svg (not having to compute the layout makes it a hell of a lot faster). The main additions are:
- Fixing directionality (cluster and community algorithms treat the graph as non-directional and botch the directionality).
- Adding auxiliary information to the json files that is [at least for me] easier to compute in python.
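The directionality fix can be pictured roughly as follows. This is only a sketch of the idea, not g2json's actual code: it assumes you still have the edge list of the original directed graph at hand, and the function name `restore_direction` is hypothetical.

```python
def restore_direction(directed_edges, undirected_edges):
    """Re-orient undirected edges using the original directed graph.

    directed_edges: iterable of (source, target) pairs from the full graph.
    undirected_edges: iterable of (a, b) pairs whose order is unreliable.
    Pairs that do not appear in the original graph at all are dropped.
    """
    original = set(directed_edges)
    fixed = []
    for a, b in undirected_edges:
        if (a, b) in original:
            fixed.append((a, b))
        # Two nodes can reference each other, so check the reverse as well.
        if (b, a) in original and a != b:
            fixed.append((b, a))
    return fixed
```

With the original edges `[("A", "B"), ("C", "B"), ("B", "C")]` and a community that reports `[("B", "A"), ("B", "C")]`, the first pair is flipped back to A→B and the second is expanded into both B→C and C→B.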
Note that color and colorindex are 2 different ways to give a color to a node (at the moment we use color since it's globally consistent, but colorindex can produce more readable graphs (this is what the prototype uses) as long as your color palette contains enough colors).
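The difference between the two can be illustrated with a small sketch. Everything here is an assumption made for the sake of the example: this README does not say how g2json computes either value, the `group` attribute is hypothetical, and hashing the label is just one possible way to get a color that stays the same in every graph.

```python
import hashlib


def global_color(label):
    """Derive a hex color from the label alone, so it is identical in every graph."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return "#" + digest[:6]


def add_colors(nodes):
    """Attach both 'color' and 'colorindex' to each node dict (modified in place)."""
    # Indices are handed out in order of first appearance within this graph only.
    index_of = {}
    for node in nodes:
        label = node["group"]  # hypothetical attribute, e.g. a community label
        index_of.setdefault(label, len(index_of))
        node["color"] = global_color(label)
        node["colorindex"] = index_of[label]
    return nodes
```

In this sketch a given label always gets the same `color`, whichever graph it shows up in, while `colorindex` starts from 0 in each graph. A viewer can then map `colorindex` onto a small palette of well-separated colors, which only works if the palette has at least as many entries as the graph has distinct labels.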
CableWeaver uses D3's force layout feature, with various tweaks that are not necessarily optimal (I'm a D3 noob), so if you're a D3 expert, you can probably find ways to do things better, faster and more elegantly.
One thing I couldn't figure out is how to make the nodes anything more complex than a circle (e.g. add the MRN as text). As soon as I use a <g> containing a circle and text (or even only a circle), the force layout system becomes too slow to be practical. Perhaps it's only a matter of optimization, perhaps nothing can be done (and a circle with a mouse-over title is the best we can have). I'd love to get some input from D3 wizards about this.