Cascading/CoPA

CMU Workshop on Cascading + City of Palo Alto Open Data

We have built an example app in Cascading and Apache Hadoop, based on the City of Palo Alto open data provided via Junar: http://paloalto.opendata.junar.com/dashboards/7576/geographic-information/

Students can extend the example workflow to build derivative apps, or use it as a starting point for other ways to leverage this data.

We will also draw some introductory material from these two previous talks:

For more details, please read the accompanying wiki page.

Build Instructions

To build the sample app from the command line use:

gradle clean jar

Note that this depends on Gradle 1.3 or later, JVM 1.6, and Apache Hadoop 1.x.

Before running this sample app, be sure to set your HADOOP_HOME environment variable. Then clear the out directory. To run on a desktop/laptop with Apache Hadoop in standalone mode:

rm -rf out
hadoop jar ./build/libs/copa.jar data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv \
  out/trap out/tsv out/tree out/road out/park out/shade out/reco
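The prerequisites above (HADOOP_HOME set, input files present) can be checked before launching the job. The following is a minimal sketch, not part of the repository: the function name copa_preflight is hypothetical, and it assumes the command is run from the project root so that the data/ paths from the run command resolve.

```shell
# Hypothetical helper (not in the repo): verify prerequisites before a run.
copa_preflight() {
  # HADOOP_HOME must be set and must point at an existing directory.
  if [ -z "${HADOOP_HOME:-}" ] || [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is not set or is not a directory" >&2
    return 1
  fi
  # Each input file named in the run command must exist.
  for f in data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv; do
    if [ ! -f "$f" ]; then
      echo "missing input file: $f" >&2
      return 1
    fi
  done
  return 0
}
```

One way to use it is to run copa_preflight first and continue with rm -rf out and the hadoop jar command only if it succeeds.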

To view the results, for example the output recommendations in reco:

ls out
more out/reco/part-00000
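To get a quick overview of all outputs at once, the line counts per output directory can be listed. This is a sketch under assumptions: the function name copa_summarize is hypothetical, and it assumes each output directory holds its results in files named part-*, as in the out/reco/part-00000 example above.

```shell
# Hypothetical helper (not in the repo): print "<directory> <line count>"
# for every output directory under the given root (default: out).
copa_summarize() {
  out_dir="${1:-out}"
  for d in "$out_dir"/*/; do
    # Skip the unexpanded pattern when there are no subdirectories.
    [ -d "$d" ] || continue
    # Count lines across all part files; a directory without any counts as 0.
    n=$(cat "$d"part-* 2>/dev/null | wc -l | tr -d ' ')
    echo "$(basename "$d") $n"
  done
}
```

Calling copa_summarize with no argument reads the out directory; a different root can be passed as the first argument.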

An example log captured from a successful build and run is at https://gist.github.com/3660888

To run the R script, load src/scripts/copa.R into RStudio or from the command line run:

R --vanilla --slave < src/scripts/copa.R

...and then check output in the file Rplots.pdf

Cascalog Build

See the Leiningen build script in project.clj and Cascalog source in the src/main/clj/copa directory.

Note that this depends on Cascalog 1.9 or later, Leiningen 2.0 or later, JVM 1.6, and Apache Hadoop 1.x.

To build and run:

lein clean
lein uberjar
rm -rf out/
hadoop jar ./target/copa.jar data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv \
  out/trap out/park out/tree out/road out/shade out/gps out/reco

About Cascading

There is a tutorial about getting started with Cascading in the blog post series called Cascading for the Impatient. Other documentation is available at http://www.cascading.org/documentation/.

For more discussion, see the cascading-user email forum or check out one of our meetups.
