nimbo3/Keenbo

Keenbo

An easy-to-use search engine for crawled web pages.

The project consists of several modules:

  • crawler: crawls pages and saves them in the database
  • search-engine: provides a search API over the stored data
  • forward-extractor: runs a MapReduce job that counts the incoming links of each website to improve search results
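
As a rough illustration of what the forward-extractor computes, the sketch below counts incoming links per website in plain Java. It is not taken from the repository: the class IncomingLinkCounter, the Link type, and the choice to ignore links that stay on the same host are all assumptions made for the example, and the real module runs as a distributed job rather than an in-memory loop.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not code from the Keenbo repository. */
final class IncomingLinkCounter {

    /** A crawled hyperlink from one page to another (hypothetical type). */
    static final class Link {
        final String fromUrl;
        final String toUrl;

        Link(String fromUrl, String toUrl) {
            this.fromUrl = fromUrl;
            this.toUrl = toUrl;
        }
    }

    /** Returns the host of an absolute URL, or null if it has none. */
    static String host(String url) {
        return URI.create(url).getHost();
    }

    /**
     * "Map" step: reduce every link to its target host.
     * "Reduce" step: sum the occurrences per host.
     * Assumption: links within the same host are not counted.
     */
    static Map<String, Long> countIncoming(List<Link> links) {
        Map<String, Long> counts = new TreeMap<>();
        for (Link link : links) {
            String from = host(link.fromUrl);
            String to = host(link.toUrl);
            if (from == null || to == null || from.equals(to)) {
                continue; // skip malformed or same-site links
            }
            counts.merge(to, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("https://a.com/1", "https://b.com/"),
                new Link("https://c.com/", "https://b.com/x"),
                new Link("https://b.com/", "https://b.com/y"),
                new Link("https://a.com/", "https://c.com/"));
        System.out.println(countIncoming(links)); // prints {b.com=2, c.com=1}
    }
}
```

A site with a higher count could then be ranked above otherwise similar results, which is one plausible reading of "better results" here.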

Getting Started

To run this project on your local machine or on a server, install ZooKeeper, HBase, Hadoop, ElasticSearch, and Kafka.

Prerequisites

To install the dependencies for this project, read the wikis and configure each dependency to suit your servers.

Running

To run the application, use the .sh scripts in the bin folder.

Running the tests

Run the tests for all modules with the command below:

mvn test

Built With

  • Spark - Used to run MapReduce jobs
  • Kafka - Used to handle the link queue
  • ElasticSearch - Used to run search queries
  • Redis - Used to detect duplicate pages
  • HBase - Used to store data
  • DropWizard - Used for monitoring
  • JSoup - Used to parse pages
  • Caffeine - Used to cache requested URLs so that requests are sent politely
  • Jackson - Used to serialize objects
  • Maven - Used for dependency management
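
To make the "polite requests" idea in the list above concrete, here is a minimal sketch of a per-host delay check. It is not the project's code: the class PolitenessGate, the per-host granularity, and the fixed delay are assumptions, and a plain ConcurrentHashMap stands in for Caffeine so the example runs without dependencies. Unlike a Caffeine cache with an expiry policy, this map never evicts old entries.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical illustration; not code from the Keenbo repository. */
final class PolitenessGate {
    // Time of the last permitted request per host, in milliseconds.
    private final Map<String, Long> lastRequestMillis = new ConcurrentHashMap<>();
    private final long delayMillis;
    private final LongSupplier clock; // injectable so the behaviour is testable

    PolitenessGate(long delayMillis, LongSupplier clock) {
        this.delayMillis = delayMillis;
        this.clock = clock;
    }

    /**
     * Returns true and records the request if the URL's host has not been
     * requested within the last delayMillis; otherwise returns false.
     */
    boolean tryAcquire(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // not an absolute URL with a host
        }
        long now = clock.getAsLong();
        boolean[] allowed = {false};
        // compute() runs atomically per key on ConcurrentHashMap.
        lastRequestMillis.compute(host, (h, last) -> {
            if (last == null || now - last >= delayMillis) {
                allowed[0] = true;
                return now;
            }
            return last;
        });
        return allowed[0];
    }

    public static void main(String[] args) {
        PolitenessGate gate = new PolitenessGate(1000, System::currentTimeMillis);
        System.out.println(gate.tryAcquire("https://a.com/1")); // first request to a.com
        System.out.println(gate.tryAcquire("https://a.com/2")); // same host, issued immediately after
    }
}
```

A crawler using such a gate would typically push a rejected URL back onto its queue and try again later instead of dropping it.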

Authors

See also the list of contributors who participated in this project.
