Metadata-extract-seqfiles-key

This tool extracts specific keys from sequence files.

Build

sbt clean stage - generates the unpacked, runnable application in the target/universal/stage/ folder.
sbt clean universal:packageBin - generates an application ZIP file.

Usage

Note: Run the tool on a JVM that Apache Spark supports (at the time of writing, Java 8 through Java 11).

extract-seqfiles-key <version>
HathiTrust Research Center
  -l, --log-level  <LEVEL>    (Optional) The application log level; one of INFO,
                              DEBUG, OFF (default = INFO)
  -n, --num-partitions  <N>   (Optional) The number of partitions to split the
                              input set of HT IDs into, for increased
                              parallelism
  -o, --output  <DIR>         Write the output to DIR
      --spark-log  <FILE>     (Optional) Where to write logging output from
                              Spark to
  -h, --help                  Show help message
  -v, --version               Show version of this program

 trailing arguments:
  input (required)   The path to the folder containing the input data
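
As an illustration of how the options above combine, here is a minimal sketch of an invocation. It assumes the launcher script ends up at target/universal/stage/bin/extract-seqfiles-key after running sbt clean stage (the script name is inferred from the help banner, not confirmed), and the input and output folders are hypothetical placeholders. The snippet only prints the assembled command so that it can be reviewed before running.

```shell
# Assumed launcher location after `sbt clean stage`; adjust if your build differs.
APP="target/universal/stage/bin/extract-seqfiles-key"

# Hypothetical folders, used here for illustration only.
INPUT_DIR="/data/seqfiles"
OUTPUT_DIR="/data/extracted"

# Flags are taken from the help text above: log level, partition count, output folder,
# followed by the required trailing input argument.
CMD="$APP --log-level DEBUG --num-partitions 8 --output $OUTPUT_DIR $INPUT_DIR"

# Dry run: show the command instead of executing it.
echo "$CMD"

# To actually run the tool, uncomment the next line:
# $CMD
```

The value passed to --num-partitions is arbitrary here; a suitable number depends on the size of the input and the resources available to Spark.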