Large Scale Data Management

This repository consists of two projects:

A Java Hadoop map-reduce application which:
- Computes the number of occurrences of each word in a large file
- Computes Spotify song statistics for each country and month
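
As an illustration of the word-count job described above, here is a minimal sketch of the map and reduce phases in plain Java. It is not the repository's code: the class name `WordCountSketch` and its methods are hypothetical, the Hadoop `Mapper`/`Reducer` classes are replaced with ordinary static methods so the example runs without a cluster, and the tokenization rule (lowercase, split on non-word characters) is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Hadoop word-count job, written without Hadoop dependencies.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    // Assumption: tokens are lowercased and split on runs of non-word characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map phase over every line, then a single reduce over all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "The lazy dog and the fox");
        System.out.println(countWords(lines));
    }
}
```

In a real Hadoop job the framework would shuffle the emitted pairs by key between the two phases and run many mappers and reducers in parallel; the sketch collapses that into a single in-memory pass.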
A Hadoop Spark-Cassandra application which:
- Generates a configurable stream of test data and posts it to a Kafka cluster
- Reads, preprocesses and combines the stream data with static data using Spark
- Periodically writes the combined data to a Cassandra cluster
- Performs queries using CQL on the Cassandra cluster

