Distributed computing for data science tasks, executed on an Ubuntu server.

dimits-ts/large-scale-data

Large Scale Data Management

This repository consists of two projects:

  • A Java Hadoop MapReduce application which:

    • Counts the occurrences of each word in a large file
    • Computes Spotify song statistics for each country and month
  • A Hadoop Spark-Cassandra application which:

    • Generates a configurable stream of test data and posts it to a Kafka cluster
    • Reads, preprocesses and combines the stream data with static data using Spark
    • Periodically writes the combined data to a Cassandra cluster
    • Queries the Cassandra cluster using CQL
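
The word-count task in the first project follows the usual map, shuffle, reduce pattern. The sketch below illustrates that pattern in plain Java with no Hadoop dependency, so it can be run locally; it is not the repository's code. The class name `WordCountSketch`, its method names, and the ASCII-only, lowercasing tokenization are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the map-reduce word-count pattern.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    // Assumption: tokens are lowercased and split on non-word (ASCII) characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted, as a framework would).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "The fox");
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(countWords(lines));
    }
}
```

The same shape would plausibly carry over to the per-country, per-month song statistics by emitting a composite key (for example country plus month) in the map phase and aggregating in the reduce phase, though the actual keys and statistics used in the project are not stated here.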
