This tool combines or updates sequence files: data from an update folder is added to (or overwrites) the data in an input folder.
sbt clean stage
- generates the unpacked, runnable application in the target/universal/stage/ folder.
sbt clean universal:packageBin
- generates an application ZIP file
Note: The application must be run on a JVM supported by Apache Spark (at the time of writing, Java 8 through Java 11).
combine-seq-files <version>
HathiTrust Research Center
  -l, --log-level <LEVEL>       (Optional) The application log level; one of
                                INFO, DEBUG, OFF (default = INFO)
  -n, --num-partitions <N>      (Optional) The number of partitions to split
                                the input set of HT IDs into, for increased
                                parallelism
  -o, --output <DIR>            Write the output to DIR
      --spark-log <FILE>        (Optional) Where to write logging output from
                                Spark to
  -h, --help                    Show help message
  -v, --version                 Show version of this program

 trailing arguments:
  input (required)    The path to the folder containing the input data
  update (required)   The path to the folder containing the updated data that
                      should be added to (or overwrite) the input
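
As an illustration of how the options and trailing arguments fit together, here is a minimal sketch of an invocation. All directory paths are hypothetical placeholders, and the launcher location is an assumption: it relies on the usual staged layout (a start script under bin/ named after the program), which this README does not confirm. The sketch only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical paths; substitute your own.
INPUT_DIR="/data/seq/input"      # folder with the existing input data
UPDATE_DIR="/data/seq/update"    # folder with the updated data
OUTPUT_DIR="/data/seq/combined"  # folder the output is written to

# Assumption: the staged build places a start script under bin/ that is
# named after the program shown in the help text.
APP="target/universal/stage/bin/combine-seq-files"

# Options first, then the two trailing arguments in order: input, update.
# The values DEBUG and 100 are illustrative only.
CMD="$APP --log-level DEBUG --num-partitions 100 --output $OUTPUT_DIR $INPUT_DIR $UPDATE_DIR"

# Print the assembled command for review.
echo "$CMD"
```

Once the printed command looks right, it could be run directly from the project root. The order of the trailing arguments matters: the first is the input, the second is the update that is added to (or overwrites) it.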