WebScanner

A simple webcrawler to detect broken links in websites recursively using Python.

How to use

Use and extend the tool to crawl a website and check for broken links and other errors.
Download webscanner.py and call it from the terminal:

python3 webscanner.py URL_TO_CRAWL

List all available options with:

python3 webscanner.py --help
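
If you plan to extend the tool, it helps to see the two basic steps of any link checker: collecting the links on a page and deciding which responses count as broken. The sketch below uses only the Python standard library. The names `LinkCollector`, `extract_links` and `is_broken` are hypothetical and are not taken from the WebScanner source, and treating every 4xx or 5xx status as broken is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def is_broken(status_code):
    # Assumption: any 4xx or 5xx response counts as a broken link.
    return status_code >= 400
```

A crawler would fetch each extracted link, pass the response status to a check like `is_broken`, and repeat the extraction on every page it is allowed to follow.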

Run using Docker

Run the WebScanner straight from Docker Hub with this one-liner:

docker run --rm emeraldit/webscanner:1.0.1 URL_TO_CRAWL

To send the output to Slack, pass your bot token and a channel ID:

docker run --rm -e SLACK_BOT_TOKEN='your-slack-token' emeraldit/webscanner:1.0.1 --channel_id SLACK_ID URL_TO_CRAWL

You can also build the image locally instead of pulling it from Docker Hub.
Build the image:

docker build --tag emeraldit/webscanner:1.0.1 .

Run the image:

docker run --rm \
  --name webscanner.container \
  emeraldit/webscanner:1.0.1 \
  --help

Or, in PowerShell:

docker run --rm `
  --name webscanner.container `
  emeraldit/webscanner:1.0.1 `
  --help

Use in your own project

You can also use the class directly in your own Python code:

from webscanner import WebScanner

crawler = WebScanner(
    URL_TO_CRAWL,
    prefix=OPTIONAL_PREFIX,
    max_depth=2,
    test_external_urls=True,
    verbose=2,
)
crawler.crawl()

The prefix determines which links are crawled. For example, you can limit the crawl to a specific subdirectory of a domain, such as https://domain.com/dir/. With the prefix set to this URL, everything above /dir/ is ignored.

The max_depth determines how deep the crawler goes. If set to 1, only the links on the initial page are followed, then the process stops. If set to 2, the crawler follows the links on the initial page and then the links on each of those pages.
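
To make the interaction between prefix and max_depth concrete, here is a minimal sketch of a depth-limited crawl over an in-memory link graph. It is not the WebScanner implementation: the `crawl` function, the `SITE` dictionary and its URLs are hypothetical, and the sketch simply skips links outside the prefix, whereas the real tool may still test them when test_external_urls is enabled.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "https://domain.com/dir/": ["https://domain.com/dir/a", "https://domain.com/other"],
    "https://domain.com/dir/a": ["https://domain.com/dir/b"],
    "https://domain.com/dir/b": ["https://domain.com/dir/c"],
}


def crawl(start, prefix, max_depth, links_on=SITE.get):
    """Return every URL reached from start, following at most max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # deep enough: keep the page, but do not follow its links
        for link in links_on(url) or []:
            # Only links under the prefix are followed; everything else is skipped.
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen
```

With the prefix set to https://domain.com/dir/, a max_depth of 1 reaches the start page and /dir/a, while a max_depth of 2 also reaches /dir/b. The page https://domain.com/other is never followed because it lies outside the prefix.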

slackHandler

The WebScanner can send the results of a run directly to a Slack channel.

Pass the --channel_id (or -cid) flag with the ID of the Slack channel where the results should be posted.

To allow the WebScanner to send messages to your Slack workspace, set the SLACK_BOT_TOKEN environment variable. Slack's documentation explains where to find this token.
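
If you start the scanner from a Python script, a small helper can verify that the token is present before building the docker run command shown earlier. This is a sketch under assumptions: `docker_command` and its `environ` parameter are hypothetical helpers, not part of the WebScanner. It relies on Docker's documented behaviour that `-e NAME` without a value forwards the variable from the calling environment, which keeps the token out of the command line.

```python
import os


def docker_command(url, channel_id=None, image="emeraldit/webscanner:1.0.1", environ=os.environ):
    """Build the argument list for a docker run of the WebScanner image."""
    command = ["docker", "run", "--rm"]
    if channel_id is not None:
        if not environ.get("SLACK_BOT_TOKEN"):
            raise RuntimeError("SLACK_BOT_TOKEN must be set to post results to Slack")
        # "-e NAME" without a value forwards the variable from the calling environment.
        command += ["-e", "SLACK_BOT_TOKEN"]
    command.append(image)
    if channel_id is not None:
        command += ["--channel_id", channel_id]
    command.append(url)
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. The token has to be set in the environment of the process that launches Docker for the forwarding to work.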

Further development

Clone this repository:

git clone git@github.com:F3licity/WebScanner.git

Create a new virtual environment in the root folder of this project using Python 3.8, and activate it:
```
pip3.8 install virtualenv
virtualenv venv38
source venv38/bin/activate
```
Install pip and pip-tools, then install the required libraries:
```
pip install -r requirements.txt
```

Read CONTRIBUTING.md for the house rules, then start developing!

Documentation

If you contribute, please update the documentation as well.

Install the documentation requirements:

pip install -r docs-requirements.txt

Build the documentation:

gendocs --config mkgendocs.yml

Then run mkdocs serve and open http://127.0.0.1:8000/ in your browser.
