
Metadata-combine-seq-files

This tool combines sequence files: it merges a folder of updated data into an existing input set, adding new entries and overwriting existing ones.

Build

sbt clean stage - generates the unpacked, runnable application in the target/universal/stage/ folder.
sbt clean universal:packageBin - generates an application ZIP file.

Usage

Note: Run the tool with a JVM that Apache Spark supports (at the time of writing, Java 8 through Java 11).
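A quick pre-flight check can catch an unsupported JVM before the job starts. The following is a minimal sketch, not part of the tool: the helper names `java_major` and `java_supported` are hypothetical, and the 8 to 11 range is taken from the note above.

```shell
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy scheme ("1.8.0_292" -> 8) and the modern one ("11.0.2" -> 11).
java_major() {
  case "$1" in
    1.*) major=${1#1.} ;;   # drop the leading "1." used up to Java 8
    *)   major=$1 ;;
  esac
  echo "${major%%[!0-9]*}"  # keep only the leading digits
}

# Hypothetical helper: succeed only when the major version is between 8 and 11.
java_supported() {
  major=$(java_major "$1")
  [ -n "$major" ] && [ "$major" -ge 8 ] && [ "$major" -le 11 ]
}

# Check the `java` found on PATH, if any (java -version prints to stderr).
if command -v java >/dev/null 2>&1; then
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if java_supported "$version"; then
    echo "Java $version is within the supported range"
  else
    echo "Java $version is outside the Java 8 through Java 11 range" >&2
  fi
fi
```

If the range supported by Spark changes, only the two bounds in `java_supported` need adjusting.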

combine-seq-files <version>
HathiTrust Research Center
  -l, --log-level  <LEVEL>    (Optional) The application log level; one of INFO,
                              DEBUG, OFF (default = INFO)
  -n, --num-partitions  <N>   (Optional) The number of partitions to split the
                              input set of HT IDs into, for increased
                              parallelism
  -o, --output  <DIR>         Write the output to DIR
      --spark-log  <FILE>     (Optional) Where to write logging output from
                              Spark to
  -h, --help                  Show help message
  -v, --version               Show version of this program

 trailing arguments:
  input (required)    The path to the folder containing the input data
  update (required)   The path to the folder containing the updated data that
                      should be added to (or overwrite) the input
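As an illustration of how the options and trailing arguments fit together, here is a sketch of one possible invocation. All paths are hypothetical, and the launcher location assumes the application was built with `sbt clean stage` and that the staged launcher script is named `combine-seq-files`; adjust both to your setup.

```shell
# Assumed launcher path after `sbt clean stage`; verify it in your build output.
LAUNCHER="target/universal/stage/bin/combine-seq-files"

# Hypothetical data locations; replace with your own.
INPUT_DIR="/data/seq/input"      # existing data (first trailing argument)
UPDATE_DIR="/data/seq/update"    # data to add to, or overwrite in, the input (second trailing argument)
OUTPUT_DIR="/data/seq/combined"  # destination passed to --output

# Options come first, followed by the two trailing arguments in order: input, update.
CMD="$LAUNCHER --log-level DEBUG --num-partitions 8 --output $OUTPUT_DIR $INPUT_DIR $UPDATE_DIR"

# Print the assembled command for review; run it with: eval "$CMD"
echo "$CMD"
```

Printing the command first makes it easy to confirm the argument order before starting a long-running Spark job.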