AI Web Scraper is a Streamlit-based application that scrapes websites using Selenium and performs intelligent content extraction. The scraper also handles captcha solving and supports custom parsing of content using AI models.
- Scrape website content using Selenium WebDriver (headless Chrome).
- Intelligent content extraction and cleaning using BeautifulSoup.
- Supports handling captchas.
- Custom content parsing using AI models (e.g., Ollama).
- Easy-to-use web interface powered by Streamlit.
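
The scraping and cleaning features above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the helper names `fetch_html` and `clean_body_text` are hypothetical, and it assumes `selenium` and `beautifulsoup4` are installed and a local Chrome is available.

```python
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Load a page in headless Chrome and return its HTML (hypothetical helper)."""
    # Imported lazily so the cleaning step below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")  # commonly needed inside containers
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always release the browser, even if the page load fails.
        driver.quit()


def clean_body_text(html: str) -> str:
    """Drop scripts and styles, then return the visible text, one chunk per line."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.extract()
    text = soup.get_text(separator="\n")
    # Remove blank lines and surrounding whitespace.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```

A typical call would be `clean_body_text(fetch_html("https://example.com"))`. Keeping the two steps separate means the cleaning logic can be checked without starting a browser.
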
- Python
- Selenium - For web scraping and captcha handling.
- BeautifulSoup - For DOM content extraction and cleaning.
- Streamlit - For the user interface.
- Docker - For containerized execution with `docker-compose`.
- Ollama - For AI-based content parsing.
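
As a rough sketch of how AI-based parsing with Ollama might look, the snippet below sends cleaned page text and a user instruction to Ollama's HTTP `generate` endpoint. The prompt wording, the helper names, and the `llama3` model name are assumptions made for illustration; the endpoint URL is Ollama's default local address and may differ inside a Docker Compose network.

```python
import json
import urllib.request

# Default local Ollama endpoint; the host is an assumption and may be a
# service name instead of localhost when running under Docker Compose.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(dom_text: str, instruction: str) -> str:
    """Combine the scraped text and the user's request into one prompt (hypothetical wording)."""
    return (
        "Extract only the information that matches the instruction.\n"
        f"Instruction: {instruction}\n"
        f"Content:\n{dom_text}\n"
        "Return an empty string if nothing matches."
    )


def build_payload(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def parse_with_ollama(dom_text: str, instruction: str, model: str = "llama3") -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, build_prompt(dom_text, instruction)),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full answer, which keeps the client code short.
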
Make sure you have the following installed on your system:
- Docker
- Docker Compose
- Git
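
A quick way to confirm these prerequisites is to check that each command is on your `PATH`. The helper below is an illustrative sketch, not part of the project; note that recent Docker releases may provide Compose as the `docker compose` plugin, in which case a standalone `docker-compose` binary can be reported missing even though Compose is available.

```python
import shutil


def missing_tools(tools=("docker", "docker-compose", "git")):
    """Return the names of the given commands that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```
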
- Clone the repository:

      git clone https://github.com/trinhcaokhoa/AI-Web-Scraper.git
      cd AI-Web-Scraper

- Build and run the project using Docker Compose:

      docker-compose up --build

Open your browser and go to http://localhost:8501.
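
The first build can take a while, so the page may not respond immediately. The following sketch polls the URL above until the app answers; the function names, retry count, and delay are arbitrary choices made for illustration.

```python
import time
import urllib.error
import urllib.request


def is_app_up(url: str = "http://localhost:8501", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_app(url: str = "http://localhost:8501", attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_app_up(url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("App is up." if wait_for_app() else "App did not respond in time.")
```
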