
gody7334/StreamingKNORA


StreamingKNORA

This repository contains Streaming-KNORA (S-KNORA), an algorithm designed to analyse streaming data in a distributed environment. S-KNORA is implemented on Spark and runs across multiple Hadoop nodes.

This project was developed in the School of Computer Science at the University of Manchester as part of my MSc dissertation, under the supervision of Professor John A. Keane ([email protected]). Dr. Firat Tekiner ([email protected]) was also involved in the project.

This project was a proof of concept aimed at demonstrating the feasibility of applying KNORA ensemble learning to high-throughput streaming data. The results show that:

  1. S-KNORA can learn concepts from disjoint streaming data and achieves higher accuracy than the single streaming learning mode;
  2. with a large batch size, the pipeline's throughput is up to 6.82 times that of the same pipeline running on a single thread;
  3. to capture severe concept drift, batch-incremental learning needs more frequent model updates on small batches, which causes high overhead in a distributed environment.
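KNORA (k-nearest oracles) is a dynamic ensemble selection method: for each incoming instance, it selects the base classifiers that are correct on that instance's nearest neighbours in a validation set, then lets only those classifiers vote. The sketch below is a minimal pure-Python illustration, not the code in this repository. It assumes the KNORA-Eliminate variant and a batch-incremental, test-then-train loop in which one base model is trained per batch, only the newest `pool_size` models are kept, and the previous batch serves as the validation set. `NearestCentroid`, `knora_e_predict`, `prequential_accuracy`, the pool-pruning rule, and the whole-pool fallback are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class NearestCentroid:
    """Hypothetical base learner: predicts the class whose mean vector is closest."""

    def __init__(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(self, x):
        return min(self.centroids, key=lambda c: euclidean(x, self.centroids[c]))


def knora_e_predict(pool, X_val, y_val, x, k=5):
    """KNORA-Eliminate (assumed variant): keep only classifiers that are correct
    on all k nearest validation samples, shrinking k until at least one remains."""
    order = sorted(range(len(X_val)), key=lambda i: euclidean(x, X_val[i]))
    neighbours = order[:k]
    selected = []
    for size in range(len(neighbours), 0, -1):
        region = neighbours[:size]
        selected = [clf for clf in pool
                    if all(clf.predict(X_val[i]) == y_val[i] for i in region)]
        if selected:
            break
    if not selected:
        selected = pool  # assumed fallback: no oracle found, use the whole pool
    votes = Counter(clf.predict(x) for clf in selected)
    return votes.most_common(1)[0][0]


def prequential_accuracy(stream, batch_size, pool_size=10, k=5):
    """Batch-incremental test-then-train over a list of (features, label) pairs.
    A trailing partial batch is ignored."""
    pool, X_val, y_val = [], [], []
    correct = total = 0
    for start in range(0, len(stream) - batch_size + 1, batch_size):
        batch = stream[start:start + batch_size]
        X = [x for x, _ in batch]
        Y = [y for _, y in batch]
        if pool:
            # Test first: predict with models trained on earlier batches,
            # using the previous batch as the validation set.
            for x, y in batch:
                correct += knora_e_predict(pool, X_val, y_val, x, k) == y
                total += 1
        # Then train: one new model per batch; keep only the newest pool_size models.
        pool = (pool + [NearestCentroid(X, Y)])[-pool_size:]
        X_val, y_val = X, Y
    return correct / total if total else 0.0
```

In this sketch, a smaller `batch_size` refreshes the pool more often, which helps track concept drift; in a distributed setting, those more frequent model updates are the source of the overhead described in point 3.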

Repository layout

./Dataset_Single: the datasets used in the StreamingKNORA_Single experiments.
./Dataset_Spark: the datasets used in the StreamingKNORA_Spark experiments.
./StreamingKNORA_Single: a Java implementation used for batch size selection; it also serves as an ideal reference program without distribution overhead.
./StreamingKNORA_Spark: a Spark implementation for performance evaluation that measures throughput and monitors resource utilization on different datasets.
