Forked version of the Juxta Command Line tool created for eMOP by Performant Software Solutions. Will be official after eMOP is complete (10/1/14).
Juxta-CL is available open-source for use under the Apache Software License, v2.0.
=============================================================================== Juxta Command Line (Juxta CL)
JuxtaCL is a specialized form of Juxta. It is a command-line tool that accepts the path to two files a parameters. It will collate them and return their change index (degree of difference between the two files).
Java 1.6+ Maven 3.x
JuxtaCL is built by maven. Execute: mvn package
Once built, the binary distribution can be found in the target directory. It will be named: juxta-{version}-bin.tar.gz. Expand this archive and move into the top level directory. It contains a script for launching JuxtaCL named juxta.sh. It accepts the following command line arguments:
-help - displays usage information -version - prints the JuxtaCL version -strip - takes one XML file and strips out the tag content. This content is streamed to std:out -diff [options] - and are the two files to compare. [options] is the set of config options for the comparison.
Valid options include:
[+|-]case - toggles case
sensitivity.
Default: insensitive
[+|-]punct - toggles punctuation
sensitivity
Default: insensitive
-hyphen [all|linebreak|none] - sets hyphenation handling
Default: all
-algorithm - set the algorithm used
[ juxta | to determine percent
levenshtein | differnece betweenn the
jaro_winkler | files. Defaults to juxta.
dice_sorensen ]
Also included in the final package are helper scrips; strip.sh. diff.sh and all.sh.
strip.sh takes one XML file as a parameter. Flat text is dumped to std:out
diff.sh takes 2 files. The change index (as calculated with Juxta algorigthm) is returned. The -algorithm paramter is also accepted.
all.sh takes 2 file parameters. It will return a table of change indexes, one row for each algorithm.