This repository contains web data scrapers written in the Python and R programming languages. The scripts were developed with Python 3.7 and R 3.6.1.
Summary
The world wide web is full of data that are of great interest to scientists and businesses alike. Firms, public institutions, and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. But how do you efficiently collect data from the Internet; retrieve information from social networks, search engines, and dynamic web pages; tap web services; and, finally, process the collected data with statistical software? This repository answers these questions with effective, working solutions.
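As a minimal illustration of the kind of task these scrapers perform, the sketch below extracts every hyperlink from an HTML page using only the Python standard library. The HTML document here is a made-up stand-in for a page you would normally fetch over the network; the repository's actual scrapers rely on the packages listed in requirements.txt.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every anchor tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for a page fetched with urllib.request or a similar client.
html_doc = """
<html><body>
  <a href="https://example.com/data.csv">data</a>
  <a href="https://example.com/about">about</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(html_doc)
print(parser.links)
```

Running this prints the two href values in document order; the same pattern scales to real pages once the HTML is downloaded.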
Please read the requirements.txt file, which lists the packages used in this repository; you can install them all at once with pip install -r requirements.txt.
Helpful commands
Execute the following commands in a command prompt window:
- To see the list of installed Python packages:
> pip list
- To see the list of outdated Python packages:
> pip list --outdated
- To upgrade a particular Python package (substitute [package] with the package name):
> pip install [package] --upgrade
- To automatically generate the requirements.txt file, open a terminal window in the repository and run (see this helpful SO post on the same):
> pip3 freeze > requirements.txt
- To generate the repository navigation structure, open a terminal window in the repository and run (see this SO post):
> tree /f
├───data
├───figures
├───resources
│   └───XPATH_Tutorials
└───scripts
    ├───python
    │   └───scrapy_based_scrapers
    │       ├───tutorial
    │       └───web_crawl_automation
    └───R
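The pip commands listed above can also be driven from Python itself, which is convenient when a script needs to check its own environment. A small sketch, assuming pip is available for the running interpreter:

```python
import json
import subprocess
import sys

# Run "pip list" for the current interpreter and parse its JSON output.
result = subprocess.run(
    [sys.executable, "-m", "pip", "list", "--format=json"],
    capture_output=True,
    text=True,
    check=True,
)
packages = {pkg["name"]: pkg["version"] for pkg in json.loads(result.stdout)}
print(f"{len(packages)} packages installed")
```

Using sys.executable ensures the command inspects the same interpreter the script is running under, which matters when multiple Python installations or virtual environments are present.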
If you'd like to contact me regarding bugs, questions, or general consulting, feel free to drop me a line at [email protected].
If this project helps you save development time, you can buy me a cup of coffee :)