Calculating relative frequencies using:
- Pair Approach
- Stripe Approach
- Hybrid Approach
- Driver Programs

Built with:
- Java 8
- Hadoop 2.7.0
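All three approaches compute the same quantity, the relative frequency f(u | w) = count(w, u) / count(w, *), where count(w, *) is the total number of neighbors observed for w. The following is a minimal in-memory sketch, not the project's MapReduce code. It simulates the map, shuffle/sort and reduce steps with Java collections so the three strategies can be compared side by side. The neighbor definition (the words following w on the same line, up to the next occurrence of w) and all class and method names are assumptions made for illustration.

```java
import java.util.*;

// In-memory sketch (no Hadoop) of three ways to compute the relative frequency
// f(u | w) = count(w, u) / count(w, *).
// ASSUMPTION: the neighbors of w are the words after it on the same line,
// up to the next occurrence of w. The project's own definition may differ.
public class RelativeFrequencySketch {

    // Sort by left word, then (w, *) before every (w, u), then by right word.
    static final Comparator<String[]> ORDER = Comparator
            .<String[], String>comparing(k -> k[0])
            .thenComparing(k -> !k[1].equals("*"))
            .thenComparing(k -> k[1]);

    static List<String[]> tokenize(List<String> lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) result.add(line.trim().split("\\s+"));
        }
        return result;
    }

    static List<String> neighbors(String[] words, int i) {
        List<String> result = new ArrayList<>();
        for (int j = i + 1; j < words.length && !words[j].equals(words[i]); j++) result.add(words[j]);
        return result;
    }

    // Map side of pairs/hybrid: emit ((w,u),1) and optionally ((w,*),1);
    // the sorted TreeMap stands in for Hadoop's shuffle and sort.
    static TreeMap<String[], Integer> emitPairs(List<String> lines, boolean withMarginal) {
        TreeMap<String[], Integer> shuffled = new TreeMap<>(ORDER);
        for (String[] words : tokenize(lines)) {
            for (int i = 0; i < words.length; i++) {
                for (String u : neighbors(words, i)) {
                    shuffled.merge(new String[] {words[i], u}, 1, Integer::sum);
                    if (withMarginal) shuffled.merge(new String[] {words[i], "*"}, 1, Integer::sum);
                }
            }
        }
        return shuffled;
    }

    // Pairs reducer: (w, *) arrives first, so count(w, *) is known before any (w, u).
    static Map<String, Double> pairs(List<String> lines) {
        Map<String, Double> out = new TreeMap<>();
        int marginal = 0;
        for (Map.Entry<String[], Integer> e : emitPairs(lines, true).entrySet()) {
            String[] k = e.getKey();
            if (k[1].equals("*")) marginal = e.getValue();
            else out.put(k[0] + "," + k[1], (double) e.getValue() / marginal);
        }
        return out;
    }

    // Stripes: one {u -> count} map per word; the reducer normalizes each stripe.
    static Map<String, Double> stripes(List<String> lines) {
        Map<String, Map<String, Integer>> merged = new TreeMap<>();
        for (String[] words : tokenize(lines)) {
            for (int i = 0; i < words.length; i++) {
                Map<String, Integer> stripe = merged.computeIfAbsent(words[i], w -> new TreeMap<>());
                for (String u : neighbors(words, i)) stripe.merge(u, 1, Integer::sum);
            }
        }
        Map<String, Double> out = new TreeMap<>();
        merged.forEach((w, stripe) -> normalize(w, stripe, out));
        return out;
    }

    // Hybrid: pairs on the map side, one stripe per left word on the reduce side.
    static Map<String, Double> hybrid(List<String> lines) {
        Map<String, Double> out = new TreeMap<>();
        Map<String, Integer> stripe = new TreeMap<>();
        String current = null;
        for (Map.Entry<String[], Integer> e : emitPairs(lines, false).entrySet()) {
            String w = e.getKey()[0];
            if (current != null && !current.equals(w)) {
                normalize(current, stripe, out); // left word changed: flush its stripe
                stripe.clear();
            }
            current = w;
            stripe.put(e.getKey()[1], e.getValue());
        }
        if (current != null) normalize(current, stripe, out);
        return out;
    }

    // Divides each count in w's stripe by the stripe total, i.e. count(w, *).
    static void normalize(String w, Map<String, Integer> stripe, Map<String, Double> out) {
        int total = stripe.values().stream().mapToInt(Integer::intValue).sum();
        stripe.forEach((u, c) -> out.put(w + "," + u, (double) c / total));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b c a b", "b c a");
        System.out.println(pairs(lines) + "\n" + stripes(lines) + "\n" + hybrid(lines));
    }
}
```

Running main on the two sample lines prints the same six relative frequencies three times (for example f(b | a) = 2/3). The approaches differ in what travels from mapper to reducer, not in the result: pairs relies on the special (w, *) key sorting before every (w, u), stripes ships whole associative arrays, and this hybrid variant emits pairs but folds them into one stripe per left word in the reducer.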
To build and run:
- Install and configure Java 8, Hadoop 2.7.0 and Maven.
- Clone this project repo. It's a Maven project.
- Create the jar file by executing:
  % mvn clean package
  This generates the jar file under the target folder.
- Start HDFS and YARN:
  % start-dfs.sh
  % start-yarn.sh
- Run the submitjobs.sh script available in the project's root folder. First make sure that the script is executable:
  % chmod +x submitjobs.sh
  % ./submitjobs.sh
  The outputs of the MapReduce jobs will be available in the output folder under the project's root folder.
This project includes:
- 3 sample input files
- One job per approach
- Each job is configured to have 3 reducers
- Outputs of a demo run in pseudo-distributed mode
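Running with more than one reducer matters for the pairs and hybrid approaches. With Hadoop's default HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks over the whole key, the composite keys (w, *) and (w, u) can land on different reducers, so no single reducer would see all of w's counts. The usual fix is a custom partitioner that hashes only the left word. In a real job this would be a subclass of org.apache.hadoop.mapreduce.Partitioner overriding getPartition(key, value, numPartitions); the sketch below shows only the arithmetic in plain Java. The class and method names are hypothetical, and it is an assumption that the project's jobs partition this way.

```java
// Plain-Java sketch (no Hadoop dependency) of partitioning pair keys by their left word.
// Hypothetical names; the project's actual partitioner, if any, may differ.
public class PairPartitionSketch {

    // Assumed to match the 3 reducers configured for each job.
    static final int NUM_REDUCERS = 3;

    // Same arithmetic as Hadoop's default HashPartitioner, but applied to the
    // left word only, so (w, *) and every (w, u) reach the same reducer.
    static int partitionFor(String leftWord, int numReducers) {
        return (leftWord.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[][] keys = {{"a", "*"}, {"a", "b"}, {"a", "c"}, {"b", "*"}, {"b", "a"}};
        for (String[] k : keys) {
            System.out.println("(" + k[0] + ", " + k[1] + ") -> reducer "
                    + partitionFor(k[0], NUM_REDUCERS));
        }
    }
}
```

Masking with Integer.MAX_VALUE keeps the result non-negative even when hashCode() is negative, which a plain % would not guarantee.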
What does submitjobs.sh do?
- It deletes and recreates the HDFS input folder, then moves the sample input data available in the local input folder to HDFS.
- Deletes the HDFS output folder.
- Deletes the input and output folders in HDFS.
- Submits RF_PairsJob, RF_StripesJob and RF_HybridJob sequentially.
- Once the jobs are completed, deletes and recreates the local output folder.
- Copies the MapReduce outputs to the local output folder.
NOTE: The script deletes and recreates the input and output folders (under the project's root path) on every execution to avoid ... If these folders do not exist, the script prints error messages; you can ignore them.