This tool extracts specific keys from sequence files.
sbt clean stage
- generates the unpacked, runnable application in the target/universal/stage/ folder.
sbt clean universal:packageBin
- generates an application ZIP file.
Note: You must use one of the JVMs supported by Apache Spark (at this time, Java 8 through Java 11).
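The JVM requirement in the note above can be checked before building or running. The following is a minimal sketch, not part of the tool: the helper names `java_major` and `is_supported_java` are hypothetical, and the range 8 through 11 is taken directly from the note, so adjust it if the supported range changes.

```shell
# Hypothetical helper: print the major Java version for a version string
# such as "1.8.0_281" (Java 8) or "11.0.2" (Java 11).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                # legacy scheme: 1.8.x -> 8
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;  # modern scheme: 11.0.2 -> 11
  esac
}

# Hypothetical helper: succeed only if the major version is in 8..11,
# the range stated in the note above.
is_supported_java() {
  major=$(java_major "$1")
  [ "$major" -ge 8 ] 2>/dev/null && [ "$major" -le 11 ] 2>/dev/null
}

# Check the JVM on the PATH, if any. "java -version" writes to stderr.
if command -v java >/dev/null 2>&1; then
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if is_supported_java "$version"; then
    echo "Java $version is within the supported range"
  else
    echo "Java $version is outside the supported range (8 through 11)" >&2
  fi
fi
```

If the check reports an unsupported version, pointing JAVA_HOME and PATH at a Java 8 or Java 11 installation before invoking sbt is the usual remedy.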
extract-seqfiles-key <version>
HathiTrust Research Center
  -l, --log-level <LEVEL>     (Optional) The application log level; one of INFO,
                              DEBUG, OFF (default = INFO)
  -n, --num-partitions <N>    (Optional) The number of partitions to split the
                              input set of HT IDs into, for increased
                              parallelism
  -o, --output <DIR>          Write the output to DIR
      --spark-log <FILE>      (Optional) Where to write logging output from
                              Spark to
  -h, --help                  Show help message
  -v, --version               Show version of this program
trailing arguments:
  input (required)            The path to the folder containing the input data
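As an illustration of how the options above combine, here is a hedged sketch of an invocation. It assumes the launcher produced by `sbt clean stage` is at target/universal/stage/bin/extract-seqfiles-key (the name is inferred from the help banner, not confirmed), and the input and output paths are placeholders. The wrapper function `run_extract` is hypothetical: it only prints the command when the launcher is not present, so the sketch is safe to try before building. How the keys to extract are supplied is not shown in the help text, so it is not covered here.

```shell
# Hypothetical wrapper: run the launcher if it exists and is executable,
# otherwise print the command that would have been run.
run_extract() {
  app="$1"
  shift
  if [ -x "$app" ]; then
    "$app" "$@"
  else
    echo "would run: $app $*"
  fi
}

# Assumed launcher location and placeholder paths; adjust to your setup.
run_extract target/universal/stage/bin/extract-seqfiles-key \
  --log-level DEBUG \
  --num-partitions 8 \
  --spark-log spark.log \
  --output /path/to/output \
  /path/to/input
```

Only flags listed in the help output are used; the partition count of 8 is an arbitrary example value.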