
This repo is for web data scraping. Feel free to show your ❤️ by giving a star ⭐

duttashi/scrapers

Scrapers

This repository contains web data scrapers written in the Python and R programming languages. The scripts target Python 3.7 and R 3.6.1.

General information

Summary

The World Wide Web is full of data of great interest to scientists and businesses alike. Firms, public institutions, and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. But how can we efficiently collect data from the Internet, retrieve information from social networks, search engines, and dynamic web pages, tap web services, and, finally, process the collected data with statistical software? In this repository, I aim to answer these questions with practical, working scrapers.
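As a minimal illustration of the first step, collecting data from a web page, the sketch below uses only the Python standard library (`urllib.request` and `html.parser`) rather than the Scrapy-based scrapers under `scripts/python`. The names `LinkExtractor`, `extract_links`, `fetch`, and the `User-Agent` string are hypothetical and chosen for this example. It assumes static HTML and does not execute JavaScript, so dynamic pages would need a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it as text (falls back to UTF-8)."""
    # Hypothetical User-Agent string; some sites reject requests without one.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    sample = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
    print(extract_links(sample, "https://example.com/"))
```

Because `fetch()` makes a real network request, the parsing step is kept separate so it can be checked against a literal HTML string; `extract_links(fetch(url), url)` combines the two.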

Result reproducibility

Please read the requirements.txt file, which lists the packages needed to reproduce the results in this repository.
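To check whether the current environment matches requirements.txt, a sketch along these lines could compare exact `name==version` pins against the installed versions. It assumes `importlib.metadata`, which is part of the standard library from Python 3.8 onward (on Python 3.7 the `importlib_metadata` backport from PyPI offers the same API). The helper names `parse_requirements` and `check_pins` are hypothetical, and the comparison is a plain string match that skips version ranges, editable installs, and environment markers.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def parse_requirements(text):
    """Map package name -> pinned version for lines of the form 'name==version'.

    Blank lines, comments, and lines without an exact '==' pin are skipped.
    """
    pins = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*(\S+)", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins):
    """Return (name, pinned, installed) tuples for packages that do not match.

    'installed' is None when the package is not installed at all.
    """
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

For example, `check_pins(parse_requirements(open("requirements.txt").read()))` returns an empty list when every pinned package is installed at exactly the pinned version.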

Helpful commands

Execute the following commands in a command prompt window:

  • To list the installed Python packages: > pip list
  • To list the outdated Python packages: > pip list --outdated
  • To upgrade a particular Python package: > pip install [package] --upgrade, substituting the package name for [package].
  • To generate the requirements.txt file automatically, open a terminal window in the repository and run pip3 freeze > requirements.txt. See this helpful SO post on the subject.
  • To print the repository navigation structure, open a command prompt window in the repository and run tree /f (a Windows command). See this SO post.
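Since `tree /f` is specific to the Windows command prompt, a portable alternative can be sketched with `pathlib`. The function name `tree` and its `ignore` parameter are assumptions for this example. Like `/f`, it lists files as well as directories; symbolic links to directories are shown but not descended into, which avoids infinite recursion on link cycles.

```python
from pathlib import Path


def tree(root, prefix="", ignore=frozenset({".git", "__pycache__"})):
    """Yield the lines of a text directory tree rooted at `root`.

    Directories come before files; each group is sorted case-insensitively.
    Entries whose names are in `ignore` are skipped.
    """
    entries = sorted(
        (p for p in Path(root).iterdir() if p.name not in ignore),
        key=lambda p: (not p.is_dir(), p.name.lower()),
    )
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        yield prefix + ("└── " if last else "├── ") + entry.name
        # Do not follow symlinked directories, so link cycles cannot recurse forever.
        if entry.is_dir() and not entry.is_symlink():
            # Children are indented under their parent; a vertical bar continues
            # the line only if more siblings follow at this level.
            yield from tree(entry, prefix + ("    " if last else "│   "), ignore)
```

Running `print("\n".join(tree(".")))` from the repository root prints the structure on Windows, macOS, or Linux.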

Repository navigation structure

├───data
├───figures
├───resources
│   └───XPATH_Tutorials
└───scripts
    ├───python
    │   └───scrapy_based_scrapers
    │       │
    │       ├───tutorial
    │       │
    │       └───web_crawl_automation
    └───R

Contact

If you'd like to contact me regarding bugs, questions, or general consulting, feel free to drop me a line at [email protected]

Donate

If this project helps you reduce development time, you can buy me a cup of coffee :)
