📦 R package to web scrap G**gle services


ahasverus/gpack

gpack

R CMD Check Website CRAN status License: MIT

The goal of the R package gpack is to provide tools for web scraping G**gle Services (Scholar, Pictures, Trends, Search). As G**gle does not provide any API and does not allow web scraping, the user's public IP address can be banned. To limit this risk, the package relies on the software OpenVPN to periodically change the IP address, and it also periodically changes the user-agent (i.e. the technical information about your system sent with each request).

System requirements

Before using the package gpack you must follow these instructions:

Operating system

The package gpack has been developed only for Unix platforms (macOS and GNU/Linux). If you are on Windows, you can use Docker to start a GNU/Linux container.

Important: the package gpack must be run outside RStudio (e.g. under a terminal).
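
As a rough illustration of the two constraints above, the following sketch checks them from within an R session. The helper session_is_supported() is hypothetical and not part of gpack; it also assumes that RStudio sets the environment variable RSTUDIO to "1" in its sessions.

```r
# Hypothetical helper (NOT part of gpack): returns TRUE when the session runs
# on a Unix-like platform and does not appear to be an RStudio session.
# Assumption: RStudio sets the environment variable RSTUDIO to "1".
session_is_supported <- function(os_type = .Platform$OS.type,
                                 rstudio = Sys.getenv("RSTUDIO")) {
  is_unix    <- identical(os_type, "unix")  # TRUE on macOS and GNU/Linux
  in_rstudio <- identical(rstudio, "1")     # TRUE inside RStudio (assumed)
  is_unix && !in_rstudio
}

# Check the current session
session_is_supported()
```

If this returns FALSE on macOS or GNU/Linux, the session was probably started from RStudio; start R from a terminal instead.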

OpenVPN

The package gpack uses OpenVPN. This software is a Virtual Private Network (VPN) system that creates a secure connection to a VPN server. To install this software, please follow these instructions.

You also need to store your Unix user password, because openvpn requires superuser rights to be controlled. Under R, run the command usethis::edit_r_environ() to open your .Renviron file, add the following line (replacing the placeholder with your own password), then restart R: UNIX_PASSWD='xxx99_999xXxx'
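
To confirm that the variable is visible to R after a restart, a minimal sketch such as the following can help. The helper has_unix_passwd() is hypothetical and not part of gpack; it only tests that the variable is defined and never prints its value.

```r
# Hypothetical helper (NOT part of gpack): TRUE when the environment variable
# is defined and non-empty in the current R session.
has_unix_passwd <- function(name = "UNIX_PASSWD") {
  nzchar(Sys.getenv(name, unset = ""))
}

# Run this after editing .Renviron and restarting R
has_unix_passwd()
```

A result of FALSE usually means that R was not restarted after the file was edited, or that the line was added to a different .Renviron file than the one R reads.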

Docker engine

The software Docker must be installed and running. The technology Selenium will be run inside a Docker container.

Selenium image

The Docker image selenium/standalone-firefox must be installed. This image contains the Selenium technology running a Firefox browser.
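
One possible way to fetch the image without leaving R is sketched below. The helpers docker_available() and pull_selenium_image() are hypothetical and not part of gpack; the sketch assumes that the docker command-line client is on the PATH and that the Docker engine is running.

```r
# Hypothetical helpers (NOT part of gpack).

# TRUE when a `docker` executable is found on the PATH
docker_available <- function() {
  nzchar(Sys.which("docker")[[1]])
}

# Runs `docker pull <image>` and returns the exit status (0 on success)
pull_selenium_image <- function(image = "selenium/standalone-firefox") {
  if (!docker_available()) {
    stop("The 'docker' command was not found on the PATH.")
  }
  system2("docker", c("pull", image))
}

# Usage (downloads the image, so it is left commented out here):
# pull_selenium_image()
```

Running docker pull selenium/standalone-firefox directly in a terminal is equivalent.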

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ahasverus/gpack")

Then you can attach the package gpack:

library("gpack")

Overview

The package gpack provides two main functions:

  • check_system(): must be run first to check the integrity of the system
  • scrap_gscholar(): gets reference metadata from G**gle Scholar

Citation

Please cite this package as:

Casajus N (2022) gpack: An R package to web scrap G**gle Services (Scholar, Pictures, Trends, Search). R package version 0.0.1.

Code of Conduct

Please note that the gpack project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.