Home
ceteri edited this page Aug 2, 2012
Welcome to Cascading for the Impatient, a series of blog posts and Cascading 2.0 code examples to get you started. Quickly. Like, yesterday.
Part 1
- Implements the simplest Cascading app possible
- Copies each TSV line from a source tap to a sink tap
- Does this in roughly a dozen lines of code
- Physical plan: 1 Mapper
- https://gist.github.com/2911686
Part 2
- Implements a simple example of WordCount
- Uses a regex to split the input text lines into a token stream
- Generates a DOT file that shows the Cascading flow graphically
- Physical plan: 1 Mapper, 1 Reducer
- https://gist.github.com/3020297
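The WordCount example splits each line into tokens (map side), then groups and counts identical tokens (reduce side), which is why the physical plan has one Mapper and one Reducer. The following is a plain-Java sketch of that logic, not Cascading API code; the class name `WordCountSketch` and the delimiter pattern are assumptions for illustration and may not match the gist.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of WordCount logic (hypothetical; not the Cascading API). */
class WordCountSketch {
    // Assumed delimiter pattern: space, square brackets, parentheses, comma, period.
    static final String SPLIT_PATTERN = "[ \\[\\]\\(\\),.]";

    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps tokens sorted, roughly like grouping on the "token" field.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Map side: split each text line into a stream of tokens.
            for (String token : line.split(SPLIT_PATTERN)) {
                // Adjacent delimiters produce empty strings; skip them.
                if (token.isEmpty()) continue;
                // Reduce side: count occurrences of each distinct token.
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("A rain shadow is a dry area.", "(rain, rain)")));
    }
}
```

Note that no lowercasing happens here, so "A" and "a" are counted separately; scrubbing tokens is a separate step.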
Part 3
- Uses a custom Function to scrub the token stream
- Discusses when to use standard Operations vs. creating custom ones
- Physical plan: 1 Mapper, 1 Reducer
- https://gist.github.com/3021655
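Scrubbing normalizes each token before counting so that variants such as "Rain" and " rain" collapse into one key. The sketch below shows one plausible scrubbing rule (trim whitespace, lowercase, drop empties) in plain Java; the rule and the name `ScrubSketch.scrub` are assumptions, not the gist's code. In Cascading, logic like this would sit inside a custom Function's `operate` method, which is not shown here.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical token scrubber illustrating what a custom Function might do. */
class ScrubSketch {
    /** Returns the normalized token, or empty if nothing useful remains. */
    static Optional<String> scrub(String token) {
        if (token == null) return Optional.empty();
        // Assumed rule: strip surrounding whitespace and lowercase (locale-neutral).
        String cleaned = token.trim().toLowerCase(Locale.ROOT);
        // Emit nothing for blank tokens, so they never reach the count.
        return cleaned.isEmpty() ? Optional.empty() : Optional.of(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(scrub("  Rain "));
        System.out.println(scrub("   "));
    }
}
```

Putting several normalization rules behind one function keeps the pipe assembly short; when a single built-in Operation already does the job, a custom one is usually unnecessary.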
Part 4
- Shows how to use a HashJoin on two pipes
- Filters a list of stop words out of the token stream
- Physical plan: 1 Mapper, 1 Reducer
- https://gist.github.com/3043745
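A HashJoin keeps the smaller side (here, the stop-word list) in memory and streams the larger token stream past it, so the join runs inside the mapper and the physical plan stays at 1 Mapper, 1 Reducer. This plain-Java sketch mimics that build-and-probe pattern with a left-join-then-filter; `StopWordJoinSketch` and `removeStopWords` are hypothetical names, not Cascading API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of a map-side hash join used as a stop-word filter. */
class StopWordJoinSketch {
    static List<String> removeStopWords(List<String> tokens, List<String> stopWords) {
        // Build side: the small stop-word list is held entirely in memory.
        Set<String> stop = new HashSet<>(stopWords);
        List<String> kept = new ArrayList<>();
        // Probe side: stream every token past the in-memory set.
        for (String token : tokens) {
            // Left-join semantics: a token with no stop-word match survives the filter.
            if (!stop.contains(token)) kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopWords(
            List.of("a", "rain", "shadow", "is", "a", "dry", "area"),
            List.of("a", "is")));
    }
}
```

The design only works when one side fits in memory; when both sides are large, a CoGroup (which requires a reduce step) is the usual choice.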
Part 5
- Calculates TF-IDF using an ExpressionFunction
- Shows how to use SumBy and CoGroup
- Physical plan: 10 Mappers, 10 Reducers
- https://gist.github.com/3043791
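TF-IDF combines three counts: how often a token appears in a document (tf), how many documents contain it (df), and the total number of documents (N). The sketch below computes them in plain Java using one common smoothed form, tf × ln(N / (1 + df)); that formula is an assumption and may differ from the gist's ExpressionFunction. In the Cascading version these counts come from grouping and aggregation steps such as SumBy and CoGroup, which accounts for the larger physical plan. `TfIdfSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java TF-IDF; assumes score = tf * ln(N / (1 + df)). */
class TfIdfSketch {
    /** docs: doc id -> tokens. Returns doc id -> (token -> tf-idf score). */
    static Map<String, Map<String, Double>> tfidf(Map<String, List<String>> docs) {
        int nDocs = docs.size();
        // df: number of distinct documents containing each token.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : docs.values()) {
            for (String t : new HashSet<>(tokens)) df.merge(t, 1, Integer::sum);
        }
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            // tf: raw count of each token within this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String t : doc.getValue()) tf.merge(t, 1, Integer::sum);
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                // Smoothed idf; a token in every document gets a negative idf.
                double idf = Math.log((double) nDocs / (1.0 + df.get(e.getKey())));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            result.put(doc.getKey(), scores);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tfidf(Map.of(
            "d1", List.of("rain", "shadow", "rain"),
            "d2", List.of("rain", "dry"),
            "d3", List.of("dry", "area"))));
    }
}
```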
Part 6
- Includes unit tests in the build
- Shows how to use other TDD features: checkpoints, assertions, traps, and debugging
- https://gist.github.com/3044049
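Of the TDD features listed, traps are the easiest to picture: a record that makes an operation fail is diverted to a separate trap sink instead of failing the whole job, so the bad data can be inspected afterward. The plain-Java sketch below imitates that behavior; it is not the Cascading API, and `TrapSketch`, `Outcome`, and `runWithTrap` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of trap semantics; not Cascading API code. */
class TrapSketch {
    /** Holds both the normal output and the records diverted to the "trap". */
    static final class Outcome<R> {
        final List<R> output = new ArrayList<>();
        final List<String> trapped = new ArrayList<>();
    }

    static <R> Outcome<R> runWithTrap(List<String> records, Function<String, R> op) {
        Outcome<R> outcome = new Outcome<>();
        for (String record : records) {
            try {
                outcome.output.add(op.apply(record));
            } catch (RuntimeException e) {
                // Divert the failing record rather than aborting the whole run.
                outcome.trapped.add(record);
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        Outcome<Integer> o = runWithTrap(List.of("1", "2", "oops", "4"), Integer::parseInt);
        System.out.println(o.output + " trapped=" + o.trapped);
    }
}
```

A unit test can then assert both on the good output and on exactly which records ended up in the trap.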
Part 7
- Implements a switch to run the example in local mode (without Apache Hadoop)
- Uses an R script to analyze and visualize the results
For more detail about the Cascading API classes used in these examples, see the Cascading 2.0 User Guide and JavaDoc.
For more discussion, see the cascading-user email forum.