RefineOnSpark is a driver program for running OpenRefine jobs on a Spark cluster.
- Prerequisites on the cluster
- An instance of OpenRefine is running and bound to the default address, localhost:3333.
- Input files are served via HDFS. Local files are also accepted, but they must be located under the same path on all worker nodes.
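
The prerequisites above can be checked before submitting a job. The following is a minimal sketch, not part of RefineOnSpark itself: the function names `refine_is_listening` and `input_kind` are hypothetical, and the check only confirms that something accepts TCP connections on the given port, not that it is actually OpenRefine. It would need to be run on each node that is expected to host an OpenRefine instance.

```python
import socket
from urllib.parse import urlparse


def refine_is_listening(host="localhost", port=3333, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: localhost:3333 is the default address named
    in the prerequisites; this does not verify that the listener is
    really OpenRefine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def input_kind(path):
    """Classify an input path as 'hdfs' or 'local' by its URI scheme.

    Hypothetical helper: anything without an hdfs:// scheme is treated
    as a local file, which must then exist under the same path on
    every worker node.
    """
    return "hdfs" if urlparse(path).scheme == "hdfs" else "local"


if __name__ == "__main__":
    print("OpenRefine reachable:", refine_is_listening())
    print(input_kind("hdfs://namenode/data/input.csv"))
    print(input_kind("/data/input.csv"))
```

A path classified as `local` is only a reminder to verify that the file is present on all workers; the sketch does not check that itself.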
- Application taxonomy
TODO