ceteri edited this page Aug 2, 2012 · 28 revisions

Cascading for the Impatient

Welcome to Cascading for the Impatient, a series of blog posts and Cascading 2.0 code examples to get you started. Quickly. Like, yesterday.

Part 1
  • Implements the simplest Cascading app possible
  • Copies each TSV line from a source tap to a sink tap
  • In roughly a dozen lines of code
  • Physical plan: 1 Mapper
  • https://gist.github.com/2911686

Part 2
  • Implements a simple example of WordCount
  • Uses a regex to split the input text lines into a token stream
  • Generates a DOT file to show the Cascading flow graphically
  • Physical plan: 1 Mapper, 1 Reducer
  • https://gist.github.com/3020297

Part 3
  • Uses a custom Function to scrub the token stream
  • Discusses when to use standard Operations vs. creating custom ones
  • Physical plan: 1 Mapper, 1 Reducer
  • https://gist.github.com/3021655

Part 4
  • Shows how to use a HashJoin on two pipes
  • Filters a list of stop words out of the token stream
  • Physical plan: 1 Mapper, 1 Reducer
  • https://gist.github.com/3043745

Part 5
  • Calculates TF-IDF using an ExpressionFunction
  • Shows how to use a SumBy and a CoGroup
  • Physical plan: 10 Mappers, 10 Reducers
  • https://gist.github.com/3043791

Part 6
  • Implements a switch to run the example in local mode (without Apache Hadoop)
  • Uses an R script to analyze/visualize the results
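
To make the WordCount steps listed above concrete (regex split into a token stream, a scrub step, stop-word filtering, then counting), here is a minimal plain-Java sketch of the same logic. It is not Cascading code: in the tutorial these steps are expressed as Cascading pipes and operations and run across the Mapper and Reducer. The delimiter regex, the scrub rule (trim plus lowercase), and the class and method names are illustrative assumptions, not taken from the tutorial.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Plain-Java sketch of tokenize -> scrub -> filter stop words -> count (hypothetical, not Cascading code). */
public class WordCountSketch {
    // Assumed delimiters: whitespace, square brackets, parentheses, commas, and periods.
    private static final Pattern SPLIT = Pattern.compile("[\\s\\[\\]\\(\\),.]+");

    // Hypothetical scrub rule: trim and lowercase; returns null for tokens that should be dropped.
    static String scrub(String token) {
        String t = token.trim().toLowerCase();
        return t.isEmpty() ? null : t;
    }

    // Counts scrubbed tokens across all lines, skipping empty tokens and stop words.
    static Map<String, Integer> countWords(List<String> lines, Set<String> stopWords) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like a GroupBy on the token
        for (String line : lines) {
            for (String raw : SPLIT.split(line)) {
                String token = scrub(raw);
                if (token == null || stopWords.contains(token)) {
                    continue; // drop empty tokens and stop words
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A rain shadow is a dry area.", "Rain, rain (go away)");
        Set<String> stop = Set.of("a", "is", "go");
        System.out.println(countWords(lines, stop)); // {area=1, away=1, dry=1, rain=3, shadow=1}
    }
}
```

The stop-word check here is an in-memory set lookup; a HashJoin achieves a similar effect in a distributed flow by replicating the small stop-word list to every mapper, which is why no extra Reducer is needed for that step.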

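The TF-IDF calculation mentioned in the list can likewise be sketched in plain Java to show which counts feed the formula: term frequency per document, document frequency per term, and the total number of documents. This sketch assumes the smoothed variant tf × ln(N / (1 + df)); the exact expression used by the tutorial's ExpressionFunction may differ, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Plain-Java TF-IDF sketch (hypothetical, not Cascading code), assuming tf * ln(N / (1 + df)). */
public class TfIdfSketch {
    /** Returns docId -> (term -> tf-idf score). */
    static Map<String, Map<String, Double>> tfIdf(Map<String, List<String>> docs) {
        int n = docs.size();
        // Document frequency: number of documents that contain each term at least once.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : docs.values()) {
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            // Term frequency: raw count of each term within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc.getValue()) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / (1.0 + df.get(e.getKey())));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            result.put(doc.getKey(), scores);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new HashMap<>();
        docs.put("doc01", List.of("rain", "shadow", "rain"));
        docs.put("doc02", List.of("dry", "area"));
        docs.put("doc03", List.of("rain", "forest"));
        docs.put("doc04", List.of("dry", "lake"));
        System.out.println(tfIdf(docs).get("doc01"));
    }
}
```

Computing df requires seeing every document, which is why a distributed version needs a grouping step (such as the SumBy and CoGroup named in the list) before df can be joined back to each term. Note that with the +1 smoothing, a term appearing in every document gets a slightly negative score.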
For more detail on the Cascading API classes used in these examples, see the Cascading 2.0 User Guide and JavaDoc.

For more discussion, see the cascading-user email forum.
