```
.
├── Makefile
├── README.md
├── config
│   ├── config.yaml
│   └── google-credentials.json   (not available; create your own if you want to export to Google Sheets)
├── data
│   └── data.csv                  (not available as the file is too big; store the scraped data locally here)
├── notebooks
│   └── web_scraping.ipynb
└── scripts
    ├── gebiz_scraping.py
    └── googlesheets.py
```
The aim of this project is to use Selenium to interact with websites that require clicking and navigation through elements hidden behind AJAX or JavaScript.
Once the navigation is complete, the rendered page is handed over to BeautifulSoup4, which does the actual parsing, as sketched below.
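A minimal sketch of this Selenium-to-BeautifulSoup handoff; the link text and CSS selector here are hypothetical placeholders, and the real logic lives in `scripts/gebiz_scraping.py`:

```python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.gebiz.gov.sg/")

# Hypothetical navigation step: click an element that only becomes
# clickable after the page's JavaScript has run.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.LINK_TEXT, "Opportunities"))
).click()

# Once the AJAX-driven content has loaded, hand the rendered HTML
# to BeautifulSoup for the actual parsing.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "table"))
)
soup = BeautifulSoup(driver.page_source, "html.parser")
for row in soup.select("table tr"):  # hypothetical selector
    print(row.get_text(" ", strip=True))

driver.quit()
```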
```
make scrape
```
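The `scrape` target is assumed to look roughly like the sketch below; check the repository's actual Makefile for the real recipe:

```makefile
# Hypothetical target: run the scraper script.
scrape:
	python scripts/gebiz_scraping.py
```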
To export to Google Sheets, follow the gspread documentation to download your credentials: https://gspread.readthedocs.io/en/latest/oauth2.html
Then run the script googlesheets.py.
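For reference, a minimal sketch of what an upload step like `googlesheets.py` might do with gspread; the spreadsheet name `gebiz-data` is a hypothetical placeholder, so adapt the paths and name to your own setup:

```python
import csv
import gspread

# Authenticate with the service-account credentials from config/.
gc = gspread.service_account(filename="config/google-credentials.json")
worksheet = gc.open("gebiz-data").sheet1  # hypothetical spreadsheet name

# Read the locally stored CSV and push its contents to the sheet.
with open("data/data.csv", newline="") as f:
    rows = list(csv.reader(f))

worksheet.clear()
worksheet.update(rows)  # write the CSV contents starting at cell A1
```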
This script can be uploaded to an orchestration framework such as Airflow on Kubernetes and run periodically.
It is also possible to automate the upload of the CSV to a user-friendly destination such as Google Sheets.
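A hypothetical sketch of how such a periodic pipeline could be wired up as an Airflow DAG (assuming Airflow 2.4+); the DAG id, schedule, and task layout are all assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: scrape daily, then push the fresh CSV to Google Sheets.
with DAG(
    dag_id="gebiz_scrape",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    scrape = BashOperator(task_id="scrape", bash_command="make scrape")
    upload = BashOperator(
        task_id="upload", bash_command="python scripts/googlesheets.py"
    )
    scrape >> upload
```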
Note: if you want to watch the browser interaction while debugging, comment out the headless option. However, to use the headless option properly, uncomment the display settings on the next line.
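In other words, the driver setup is assumed to look something like the sketch below (assuming Chrome; the repo's script may configure its driver differently):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")               # comment out to watch the browser while debugging
options.add_argument("--window-size=1920,1080")  # display settings needed for a proper headless run
driver = webdriver.Chrome(options=options)
```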