d-saikrishna/WebScraping

WebScraping

All my projects and learnings on WebScraping.

This code crawls every district of a given state on the Mission Antyodaya 2020 website and downloads the Gram Panchayat (GP) level score cards in PDF format. Each downloaded PDF is then read, and a GP-level dataset is built from the results.
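As a rough illustration of the last step (turning score cards into a dataset), here is a minimal sketch. It assumes the text of each PDF has already been extracted by a PDF library of your choice and that each line looks like `Label: value`. The labels, sample values, and the function names `parse_scorecard_text` and `build_dataset` are hypothetical and are not taken from the repository or from the real score card layout.

```python
import csv
import io
import re

# Hypothetical layout: one "Label: value" pair per line of extracted PDF text.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_scorecard_text(text):
    """Turn the extracted text of one score card into a {label: value} row."""
    row = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            row[match.group(1)] = match.group(2)
    return row


def build_dataset(texts):
    """Combine many score cards into CSV text, one row per Gram Panchayat."""
    rows = [parse_scorecard_text(t) for t in texts]
    # Use the union of all labels, in first-seen order, as the columns.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample text standing in for two extracted PDFs.
sample_texts = [
    "Gram Panchayat: Example GP A\nDistrict: Example District\nTotal Score: 62",
    "Gram Panchayat: Example GP B\nDistrict: Example District\nTotal Score: 48",
]
csv_text = build_dataset(sample_texts)
```

Keeping the parsing separate from the download step means the dataset can be rebuilt from PDFs already on disk without crawling the website again.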

This code automates web scraping across all the job domains (accounting, interior designing, etc.). It loops over every page of each job link and extracts information about all the job postings. Because of computing and internet limitations, the results presented are only a sample. Nevertheless, the code is scalable: given a good internet connection and enough computing power, it can extract lakhs of jobs posted on the website.
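The nested loop described above (domains, then pages, then postings) could be sketched roughly as follows. This is not the repository's code: `crawl_jobs`, `fetch_page`, `max_pages`, and the fake site data are all hypothetical, and the fetch function is passed in so the loop can be tried without any network access.

```python
def crawl_jobs(domains, fetch_page, max_pages=100):
    """Yield (domain, posting) for every posting on every page of every domain.

    fetch_page(domain, page) is assumed to return a list of postings,
    and an empty list once the page number runs past the last page.
    """
    for domain in domains:
        for page in range(1, max_pages + 1):
            postings = fetch_page(domain, page)
            if not postings:
                break  # no more pages for this domain
            for posting in postings:
                yield domain, posting


# Made-up stand-in for the real website: (domain, page) -> postings.
FAKE_SITE = {
    ("accounting", 1): ["Accountant", "Auditor"],
    ("accounting", 2): ["Tax Analyst"],
    ("interior-designing", 1): ["Interior Designer"],
}


def fake_fetch_page(domain, page):
    """Return the postings for one page, or [] past the last page."""
    return FAKE_SITE.get((domain, page), [])


jobs = list(crawl_jobs(["accounting", "interior-designing"], fake_fetch_page))
```

Because `crawl_jobs` is a generator, a small sample can be taken by stopping early (for example with `itertools.islice`), which fits the sample-only results mentioned above.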

This is a simple script that scrapes candidate details from the Election Commission website.
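One way such a scrape might work, assuming the candidate details sit in a plain HTML table, is sketched below with only the standard library. The table columns, sample values, and the names `TableRowParser` and `parse_candidates` are hypothetical; the real page structure is not described in this README and may differ.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_candidates(html):
    """Return one dict per table row, keyed by the header row's labels."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; the real column names are an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Candidate</th><th>Party</th><th>Constituency</th></tr>
  <tr><td>Candidate One</td><td>Party A</td><td>Seat 1</td></tr>
  <tr><td>Candidate Two</td><td>Party B</td><td>Seat 2</td></tr>
</table>
"""
candidates = parse_candidates(SAMPLE_HTML)
```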

This code scrapes the GST compliance reports of all firms on the Clear Tax website (the reports are in JSON format) and extracts the firms' addresses. These addresses can then be geocoded to estimate, for example, firm density by state.
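Since the reports are JSON, pulling out the addresses can be as simple as walking the parsed structure. The sketch below assumes the address is stored as a string under a key named `address`; that key name, the sample report, and the function `find_addresses` are hypothetical and not taken from the actual report format.

```python
import json


def find_addresses(node, key="address"):
    """Recursively collect every string stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_addresses(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_addresses(item, key))
    return found


# Made-up report; the real JSON layout is an assumption.
SAMPLE_REPORT = """
{
  "firm": {"name": "Example Firm", "address": "1 Example Road, Pune"},
  "branches": [{"address": "2 Sample Street, Mumbai"}]
}
"""
addresses = find_addresses(json.loads(SAMPLE_REPORT))
```

Searching by key rather than by a fixed path keeps the sketch working if the address appears at different depths in different reports.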
