A web crawler/scraper that finds broken links on a target seed URL, based on keywords matched around those broken links.

kodekracker/Rotto-Links-Scraper

Rotto-Links-Scraper

A web crawler/scraper that finds broken links on a target seed URL, based on keywords matched in the pages that contain the broken links.
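
The idea above can be sketched for a single page as follows. This is a minimal illustration, not the project's actual implementation: `LinkCollector`, `find_broken_links`, and the `is_broken` callable are hypothetical names, and the sketch uses the Python 3 standard library. It assumes a page is only reported when its text mentions at least one keyword.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor tag that has an href.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def find_broken_links(html, keywords, is_broken):
    """Return the broken links on a page that mentions any keyword.

    `is_broken` is a hypothetical callable (url -> bool); a real crawler
    would issue an HTTP request there and inspect the response status.
    """
    collector = LinkCollector()
    collector.feed(html)
    page_text = " ".join(collector.text_parts).lower()
    # Skip pages that match none of the keywords.
    if not any(keyword.lower() in page_text for keyword in keywords):
        return []
    return [link for link in collector.links if is_broken(link)]
```

Passing the link check in as a callable keeps the sketch free of network access, so it can be tried against a fixed HTML string and a set of known-dead URLs.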

##Installation

  1. Redis
  2. Fabric
  3. Python 2.7+

##Instructions

  1. First, install all dependencies listed in requirements.txt using pip:
    $ pip install -r requirements.txt
  1. Set the DATABASE_PATH environment variable in your shell config file (e.g. .bashrc or .zshrc):
    # your shell config file
    export DATABASE_PATH='/path/to/database/'
  1. Set the two environment variables required by the SMTP server, which is used to send email to users, in your shell config file:
    # your shell config file
    export SMTP_USER='smtp-username'
    export SMTP_PASSWORD='smtp-password'
  1. Set one more environment variable to save the app's logs in a defined location:
    # your shell config file
    export LOGS_DIR='path/to/logs'
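
Before starting the app, it can help to confirm that all four variables are visible to Python. The check below is a small sketch under stated assumptions: `REQUIRED_VARS` and `load_settings` are hypothetical names, and the README does not show how the app itself reads these values.

```python
import os

# The variable names come from the instructions above.
REQUIRED_VARS = ("DATABASE_PATH", "SMTP_USER", "SMTP_PASSWORD", "LOGS_DIR")


def load_settings(environ=None):
    """Return the required settings as a dict, or raise if any is unset."""
    environ = os.environ if environ is None else environ
    # Treat empty strings the same as missing variables.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no argument reads from the current process environment, so it should be run from a shell that has already sourced the config file.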

##Commands

Note: First install Fabric to run the commands below.

To run the GUI app:

    $ fab app

To run a dispatcher:

    $ fab dispatcher

To run a worker:

    $ fab worker

##Developers

  1. Akshay Pratap Singh
  2. Sunny Gupta
