The goal of the R package gpack is to provide tools for web scraping
G**gle Services (Scholar, Pictures, Trends, Search). As G**gle does
not provide any API and does not allow web scraping, the user's public IP
address can be banned. This package relies on the software OpenVPN to
periodically change the IP address and the user-agent (i.e. the
technical information about your system).
Before using the package gpack, you must follow these instructions:
The package gpack has been developed only for Unix platforms
(macOS and GNU/Linux). If you are on Windows, you can use Docker to
start a GNU/Linux container.
Important: the package gpack must be run outside RStudio
(e.g. in a terminal).
The package gpack uses OpenVPN. This software is a Virtual Private
Network (VPN) system. It creates a secure connection to a VPN server.
To install this software, please follow these instructions.
You also need to store your Unix user password (controlling openvpn
requires superuser rights). Under R, run the following command:
usethis::edit_r_environ()
Then add the following line:
UNIX_PASSWD='xxx99_999xXxx'
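R reads the .Renviron file at startup, so the new variable is only visible after restarting the session or reloading the file. The following is a minimal sketch for confirming that the variable is available; `env_is_set()` is a hypothetical helper written for this example and is not part of gpack.

```r
# Hypothetical helper (not part of gpack): TRUE when the environment
# variable `name` is set to a non-empty value in the current R session.
env_is_set <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

if (!env_is_set("UNIX_PASSWD")) {
  # Assumes the user-level file opened by usethis::edit_r_environ()
  message("UNIX_PASSWD is not set: restart R or run readRenviron(\"~/.Renviron\")")
}
```

The check only reports whether a value is present and never prints the password itself.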
Docker must be installed and running, because Selenium runs inside a Docker container.
The Docker image selenium/standalone-firefox must be installed. This image
contains Selenium running a Firefox browser.
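Before going further, it can help to confirm from R that the external tools listed above are reachable. The sketch below assumes the executables are named `docker` and `openvpn` on your system; `has_tool()` is a hypothetical helper, not a gpack function. It only checks that the programs are on the PATH, not that the Docker daemon is actually running.

```r
# Hypothetical helper (not part of gpack): TRUE when `tool` is found on the PATH.
has_tool <- function(tool) {
  nzchar(unname(Sys.which(tool)))
}

# Assumed executable names; adjust them if your installation differs.
required <- c("docker", "openvpn")
available <- vapply(required, has_tool, logical(1))
print(available)

# If Docker is available, the image named above can be fetched with:
# system2("docker", c("pull", "selenium/standalone-firefox"))
```

The pull command is left commented out so that the check itself has no side effects.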
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("ahasverus/gpack")
Then you can attach the package gpack:
library("gpack")
The package gpack provides two main functions:
check_system(): must be run first to check the integrity of the system
scrap_gscholar(): gets reference metadata from G**gle Scholar
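The order of the two functions can be sketched as follows. This is only an illustration: it assumes that `check_system()` can be called without arguments, as written above, and it does not guess the arguments of `scrap_gscholar()`, which this README does not describe.

```r
# Run only when gpack is installed, so the snippet is safe to source anywhere.
gpack_installed <- requireNamespace("gpack", quietly = TRUE)

if (gpack_installed) {
  library("gpack")

  # Assumption: check_system() takes no mandatory arguments.
  check_system()

  # The arguments of scrap_gscholar() are not given in this README;
  # open its help page before calling it.
  help("scrap_gscholar", package = "gpack")
} else {
  message("gpack is not installed; see the installation step above.")
}
```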
Please cite this package as:
Casajus N (2022) gpack: An R package to web scrap G**gle Services (Scholar, Pictures, Trends, Search). R package version 0.0.1.
Please note that the gpack project is released with a Contributor Code of Conduct.
By contributing to this project, you agree to abide by its terms.