This project uses BeautifulSoup and Pandas to crawl data from the Wikipedia Movies List and serve it as an API response. Currently the system takes ~1 minute to crawl the full movies dataset.
For a machine learning or deep learning model, the dataset is crucial to achieving good accuracy, so almost every AI-related task needs a customized dataset. Wikipedia is a great source of data, and this project acts as a data-preparation pipeline for machine learning models.
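As a rough illustration of what the parse step below does, here is a minimal sketch of scraping one Wikipedia film-list page with BeautifulSoup and pandas. The URL and the `wikitable` class are assumptions about typical Wikipedia markup, not details taken from this project's code:

```python
import requests
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Illustrative source page; the project's actual list URL may differ.
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2019"
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Wikipedia list pages usually keep their data in "wikitable" tables.
tables = soup.find_all("table", class_="wikitable")

# pandas turns each HTML table into a DataFrame (needs lxml or html5lib).
frames = [pd.read_html(StringIO(str(t)))[0] for t in tables]
movies = pd.concat(frames, ignore_index=True)
print(movies.head())
```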
- Python >= 3.6
- Dependencies:
pip install -r requirements.txt
For parsing:
python application.py -i parse
For getting the API response:
python application.py -i serve
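Below is a minimal sketch of what the serve mode might look like, using Flask purely for illustration; the project's real server, routes, and storage layer may differ, and `movies.json` is a hypothetical output of the parse step:

```python
from flask import Flask, abort, jsonify
import pandas as pd

app = Flask(__name__)
movies = pd.read_json("movies.json")  # hypothetical parse-step output

@app.route("/movie/<int:movie_id>/")
def movie_detail(movie_id):
    # Return one crawled record as a JSON API response; 404 if out of range.
    if movie_id < 0 or movie_id >= len(movies):
        abort(404)
    return jsonify(movies.iloc[movie_id].to_dict())

if __name__ == "__main__":
    app.run(port=8000)
```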
For a particular movie's details, paste this URL in a browser:
http://localhost:8000/movie/123/
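The same endpoint can also be queried from Python; the payload is whatever fields the crawler stored for that movie:

```python
import requests

# Fetch one movie record from the locally running server.
resp = requests.get("http://localhost:8000/movie/123/", timeout=5)
resp.raise_for_status()
print(resp.json())  # a dict of the movie's fields
```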
For a list of 10 movies, paste this URL in a browser:
localhost:8000/movies/count=10/page_size=100/page_no=1
Here page_size=100 chunks the database into pages of 100 records each, page_no=1 selects the first of those pages, and count=10 returns a list of 10 movies from that page.
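In other words, the list endpoint's parameters plausibly combine like this. This is a sketch of the arithmetic only, assuming 1-indexed pages, and is not the project's actual code:

```python
def paginate(rows, count, page_size, page_no):
    # page_size rows per page; page_no selects one page (1-indexed);
    # count caps how many rows from that page are returned.
    start = (page_no - 1) * page_size
    return rows[start:start + page_size][:count]

db = [f"movie_{i}" for i in range(1000)]
print(paginate(db, count=10, page_size=100, page_no=1))  # 10 movies from page 1
```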