WebScrapper

WebScrapper is a Python-based web scraping tool that uses Selenium and BeautifulSoup to extract data from web pages. It provides a Flask API for easy integration into other projects.

Features

Scrape specific elements from web pages
Scrape entire web pages
Configurable retry mechanism for handling network issues
Dockerized for easy deployment
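To make the retry feature concrete, here is a minimal sketch of how two separate retry budgets could interact. It assumes, based only on the `except_retries`, `none_retries` and `wait_time` fields in the request examples under Usage, that one budget covers attempts that raise an exception, the other covers attempts that return nothing, and `wait_time` is a pause in seconds between attempts. The helper name `call_with_retries` is hypothetical; this is an illustration, not the code in `webscrapepr.py`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], Optional[T]],
    except_retries: int = 5,
    none_retries: int = 5,
    wait_time: float = 3,
) -> Optional[T]:
    """Call `fetch` until it returns a non-None result.

    Hypothetical semantics: exceptions (e.g. network errors) and None results
    (e.g. an element that has not rendered yet) are counted against separate
    retry budgets, with `wait_time` seconds of sleep between attempts.
    """
    exc_left, none_left = except_retries, none_retries
    while True:
        try:
            result = fetch()
        except Exception:
            if exc_left <= 0:
                raise  # exception budget exhausted: surface the last error
            exc_left -= 1
        else:
            if result is not None:
                return result
            if none_left <= 0:
                return None  # give up quietly after repeated empty results
            none_left -= 1
        time.sleep(wait_time)
```

`fetch` can be any zero-argument callable, for instance a function that loads a page and returns the matching elements, or `None` if nothing was found yet. Keeping the two budgets separate lets a flaky network and a slow-rendering page be tuned independently.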

Requirements

Docker
Python 3.x
Flask
Selenium
BeautifulSoup
PyVirtualDisplay

Installation

Clone the repository:

```
git clone https://github.com/yourusername/webscrapper.git
cd webscrapper
```

Build the Docker image:
```
docker build -t webscrapper .
```

Usage

Start the WebScrapper service:

```
docker run -p 5000:5000 -e FLASK_PORT=5000 webscrapper
```

Use the API endpoints:

Scrape specific elements:

```
POST /scrape_elements
Content-Type: application/json

{
  "url": "https://example.com",
  "name": "div",
  "attrs": {"class": "example-class"},
  "except_retries": 5,
  "none_retries": 5,
  "wait_time": 3
}
```
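A Python client can build the same request with the standard library alone. This sketch assumes the service is reachable at `http://localhost:5000`, the port published by the `docker run` command above. The function name `build_scrape_elements_request` is hypothetical, and its defaults simply mirror the example values rather than any documented server defaults.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_scrape_elements_request(
    url: str,
    name: str,
    attrs: Optional[Dict[str, str]] = None,
    except_retries: int = 5,
    none_retries: int = 5,
    wait_time: float = 3,
    base_url: str = "http://localhost:5000",  # assumption: local container on port 5000
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /scrape_elements endpoint."""
    payload = {
        "url": url,
        "name": name,          # tag name to look for, e.g. "div"
        "attrs": attrs or {},  # attribute filters, e.g. {"class": "example-class"}
        "except_retries": except_retries,
        "none_retries": none_retries,
        "wait_time": wait_time,
    }
    return urllib.request.Request(
        f"{base_url}/scrape_elements",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=120)` and read the body. The response format is not documented here, so treat the body as raw text until you have checked what `app.py` returns.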

Scrape the whole page:

```
POST /scrape_whole_page
Content-Type: application/json

{
  "url": "https://example.com",
  "retries": 5,
  "wait_time": 3
}
```
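The whole-page endpoint can be called in a similar way. This sketch, again assuming the API is on `http://localhost:5000`, sends the request and returns the response body as raw text, since its format is not documented here. The function name `scrape_whole_page` and the `opener` parameter are hypothetical: `opener` defaults to `urllib.request.urlopen` and exists only so the function can be exercised without a running server. The 120-second timeout is an arbitrary choice that leaves room for server-side retries and waits.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumption: the container was started with "-p 5000:5000".
DEFAULT_BASE_URL = "http://localhost:5000"


def scrape_whole_page(
    url: str,
    retries: int = 5,
    wait_time: float = 3,
    base_url: str = DEFAULT_BASE_URL,
    timeout: float = 120,
    opener: Optional[Callable] = None,
) -> str:
    """POST to /scrape_whole_page and return the raw response body as text."""
    payload = {"url": url, "retries": retries, "wait_time": wait_time}
    request = urllib.request.Request(
        f"{base_url}/scrape_whole_page",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    send = opener or urllib.request.urlopen
    # Generous timeout: the server may retry and wait several times before answering.
    with send(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, `body = scrape_whole_page("https://example.com", retries=3, wait_time=2)` returns whatever the service sends back.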

Project Structure

dockerfile: Defines the Docker image for the project
webscrapepr.py: Contains the main WebScrapper class
app.py: Flask application that exposes the API endpoints
.gitignore: Specifies files to be ignored by Git

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.

Repository: DenisDiachkov/webscrapper