This project uses BeautifulSoup and Pandas to crawl data from the Wikipedia Movies List and serve it as an API response. Currently the system takes ~1 minute to crawl the full movies dataset.
For a machine learning or deep learning model, the dataset is crucial to achieving good accuracy, so almost every AI-related task needs a customized dataset. Wikipedia is a great source of data, and this project acts as a data-preparation pipeline for machine learning models.
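As a rough illustration of what the parse step below does, here is a minimal sketch of scraping one Wikipedia film-list page with BeautifulSoup and pandas. The URL and the `wikitable` class are assumptions about typical Wikipedia markup, not details taken from this project's code:

```python
import requests
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Illustrative source page; the project's actual list URL may differ.
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2019"
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Wikipedia list pages usually keep their data in "wikitable" tables.
tables = soup.find_all("table", class_="wikitable")

# pandas turns each HTML table into a DataFrame (needs lxml or html5lib).
frames = [pd.read_html(StringIO(str(t)))[0] for t in tables]
movies = pd.concat(frames, ignore_index=True)
print(movies.head())
```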
- Python >= 3.6
- Dependencies:
pip install -r requirements.txt
For parsing:
python application.py -i parse
For getting the API response:
python application.py -i serve
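Below is a minimal sketch of what the serve mode might look like, using Flask purely for illustration; the project's real server, routes, and storage layer may differ, and `movies.json` is a hypothetical output of the parse step:

```python
from flask import Flask, abort, jsonify
import pandas as pd

app = Flask(__name__)
movies = pd.read_json("movies.json")  # hypothetical parse-step output

@app.route("/movie/<int:movie_id>/")
def movie_detail(movie_id):
    # Return one crawled record as a JSON API response; 404 if out of range.
    if movie_id < 0 or movie_id >= len(movies):
        abort(404)
    return jsonify(movies.iloc[movie_id].to_dict())

if __name__ == "__main__":
    app.run(port=8000)
```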
For a particular movie's details, paste this URL in a browser:
http://localhost:8000/movie/123/
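The same endpoint can also be queried from Python; the payload is whatever fields the crawler stored for that movie:

```python
import requests

# Fetch one movie record from the locally running server.
resp = requests.get("http://localhost:8000/movie/123/", timeout=5)
resp.raise_for_status()
print(resp.json())  # a dict of the movie's fields
```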
For a list of 10 movies, paste this URL in a browser:
localhost:8000/movies/count=10/page_size=100/page_no=1
Here page_size=100 chunks the database into pages of 100 records each, page_no=1 selects the first of those pages, and count=10 returns a list of 10 movies from that page.
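In other words, the list endpoint's parameters plausibly combine like this. This is a sketch of the arithmetic only, assuming 1-indexed pages, and is not the project's actual code:

```python
def paginate(rows, count, page_size, page_no):
    # page_size rows per page; page_no selects one page (1-indexed);
    # count caps how many rows from that page are returned.
    start = (page_no - 1) * page_size
    return rows[start:start + page_size][:count]

db = [f"movie_{i}" for i in range(1000)]
print(paginate(db, count=10, page_size=100, page_no=1))  # 10 movies from page 1
```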