tlkthp/mapreduce-algo


MapReduce Algorithms

Calculating relative frequencies using:

  1. Pair Approach
  2. Stripe Approach
  3. Hybrid Approach
  4. Driver Programs
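As a rough illustration of how the pairs and stripes approaches differ, the following is a minimal in-memory sketch in plain Java, without any Hadoop dependency. It is not the code from this repository. The class name `RelativeFrequencySketch`, its method names, and the neighborhood rule (every other token in the same record counts as a co-occurrence) are assumptions made for the example. The relative frequency of `b` given `a` is taken to be count(a, b) divided by the total count of all pairs starting with `a`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of the pairs and stripes approaches. */
class RelativeFrequencySketch {

    // Assumed neighborhood rule: each token co-occurs with every other token in its record.
    static List<String[]> cooccurrences(List<String[]> records) {
        List<String[]> out = new ArrayList<>();
        for (String[] rec : records)
            for (int i = 0; i < rec.length; i++)
                for (int j = 0; j < rec.length; j++)
                    if (i != j) out.add(new String[] {rec[i], rec[j]});
        return out;
    }

    // Pairs: one counter per (a, b) pair plus one marginal counter per a.
    // In a real job the marginal is commonly emitted under a special key such as (a, *).
    static Map<List<String>, Double> pairs(List<String[]> records) {
        Map<List<String>, Integer> pairCounts = new HashMap<>();
        Map<String, Integer> marginals = new HashMap<>();
        for (String[] p : cooccurrences(records)) {
            pairCounts.merge(Arrays.asList(p[0], p[1]), 1, Integer::sum);
            marginals.merge(p[0], 1, Integer::sum);
        }
        Map<List<String>, Double> rf = new HashMap<>();
        for (Map.Entry<List<String>, Integer> e : pairCounts.entrySet()) {
            int total = marginals.get(e.getKey().get(0));
            rf.put(e.getKey(), e.getValue() / (double) total);
        }
        return rf;
    }

    // Stripes: one map ("stripe") of neighbor counts per a; the marginal is the stripe's sum.
    static Map<List<String>, Double> stripes(List<String[]> records) {
        Map<String, Map<String, Integer>> stripes = new HashMap<>();
        for (String[] p : cooccurrences(records))
            stripes.computeIfAbsent(p[0], k -> new HashMap<>()).merge(p[1], 1, Integer::sum);
        Map<List<String>, Double> rf = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> s : stripes.entrySet()) {
            int total = 0;
            for (int c : s.getValue().values()) total += c;
            for (Map.Entry<String, Integer> e : s.getValue().entrySet())
                rf.put(Arrays.asList(s.getKey(), e.getKey()), e.getValue() / (double) total);
        }
        return rf;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"a", "b", "c"},
                new String[] {"a", "b"});
        System.out.println("pairs:   " + pairs(records));
        System.out.println("stripes: " + stripes(records));
    }
}
```

Both methods should produce the same relative frequencies. The difference is in the intermediate data: pairs keeps many small keys, while stripes keeps fewer but larger values. A hybrid variant would typically emit pairs on the map side and assemble stripes on the reduce side.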

Developed and tested with:

  1. Java 8, and
  2. Hadoop 2.7.0

Executing Jobs:

  1. Install and configure Java 8, Hadoop 2.7.0, and Maven.
  2. Clone this repository. It is a Maven project.
  3. Create the jar file by executing:
% mvn clean package

This generates the jar file under the target folder.

  4. Start HDFS and YARN:
% start-dfs.sh
% start-yarn.sh
  5. Run the submitjobs.sh script in the project's root folder. First make sure the script is executable:
% chmod +x submitjobs.sh
% ./submitjobs.sh

The MapReduce jobs' outputs will be available in the output folder under the project's root folder.

Demo Jobs Details:

What does submitjobs.sh do?

  1. Deletes and recreates the HDFS input folder, then moves the sample input data from the local input folder to HDFS.
  2. Deletes the HDFS output folder.
  3. Deletes the input and output folders in HDFS.
  4. Submits RF_PairsJob, RF_StripesJob, and RF_HybridJob sequentially.
  5. Once the jobs complete, deletes and recreates the local output folder.
  6. Copies the MapReduce outputs to the local output folder.

NOTE: The script deletes and recreates the input and output folders (under the project's root path) on every execution to avoid ... If the folders do not exist, the script prints error messages; these can be ignored.
