pickURL


Extract URLs and email addresses from text using R. In fact, all kinds of URIs are supported, not just URLs, and the set of accepted URI schemes can be adjusted.

Leading and trailing punctuation is examined. If punctuation appears to delimit a URI, or a URI appears to be the last part of a sentence, some trailing punctuation may be removed. Comma-separated URI lists are split, but the heuristics used for this may fail because the comma is a valid character in some parts of a URI. Any technically valid URI is protected from being cut if it is surrounded by angle brackets (<http://www.example.org/>) or double quotes ("http://www.example.org/"). Whitespace is allowed (and removed) within angle brackets, as long as the URI scheme and the following : are not interrupted by whitespace.
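The angle-bracket rule can be illustrated with a short base R sketch. This is not the package's implementation: the function name extract_bracketed and the regular expression are assumptions made for illustration, and the sketch ignores the other heuristics described above.

```r
# Illustrative only: not how pickURL works internally.
# Find <...> spans that start with a URI scheme followed by ":" and
# remove any whitespace found between the brackets.
extract_bracketed <- function(text) {
  # A scheme is a letter followed by letters, digits, "+", "." or "-".
  # Whitespace before the ":" prevents a match.
  pattern <- "<[A-Za-z][A-Za-z0-9+.-]*:[^<>]*>"
  hits <- regmatches(text, gregexpr(pattern, text))[[1]]
  # Drop the surrounding angle brackets.
  inner <- substr(hits, 2, nchar(hits) - 1)
  # Remove spaces, tabs and line breaks inside the brackets.
  gsub("[[:space:]]+", "", inner)
}

extract_bracketed("See <http://www.example.org/a/\n  long/path>, or <not a uri>.")
# "http://www.example.org/a/long/path"
```

The second bracketed span is skipped because its first word is not followed by a colon, so it does not look like a URI scheme.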

Some (approximate) validation against the URI specification is performed, for example in the host part of the URI. The program also catches illegal ASCII characters and uses of the % character for purposes other than percent-encoding; the offending character and everything after it are not considered part of the URL. The program is generally not aware of additional rules that may apply to URIs following a particular URI scheme. As an exception, the program knows about the structure of mailto URIs.
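The truncation rule for illegal characters and stray % signs can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: the function name truncate_at_invalid is hypothetical, the character set is taken from the generic URI syntax (RFC 3986), and the sketch also stops at non-ASCII characters, which may differ from what the package does.

```r
# Illustrative only: keep the longest prefix made of characters that may
# appear in a URI, where "%" is accepted only when it is followed by two
# hexadecimal digits (percent-encoding).
truncate_at_invalid <- function(candidate) {
  valid_prefix <- "^([A-Za-z0-9._~:/?#@!$&'()*+,;=\\[\\]-]|%[0-9A-Fa-f]{2})*"
  regmatches(candidate, regexpr(valid_prefix, candidate, perl = TRUE))
}

truncate_at_invalid("http://example.org/a%20b")     # "http://example.org/a%20b"
truncate_at_invalid("http://example.org/100%sure")  # "http://example.org/100"
truncate_at_invalid("http://example.org/a{b}")      # "http://example.org/a"
```

In the second call the % is not followed by two hexadecimal digits, so the result ends just before it. In the third call the cut happens at the curly bracket, which is not allowed in a URI.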

Installation

With devtools already installed, run the following command in the R console:

devtools::install_github("mvkorpel/pickURL")

Usage

After installing the package, see the help page of function pick_urls.
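A minimal way to do that from the R console is sketched below. It relies only on the package name and the function name pick_urls given above, and it prints the argument list instead of assuming what the arguments are.

```r
# Run the package-specific lines only if pickURL is installed.
has_pickurl <- requireNamespace("pickURL", quietly = TRUE)
if (has_pickurl) {
  # Show the arguments of pick_urls without assuming what they are.
  print(args(pickURL::pick_urls))
  # Open the help page of pick_urls.
  print(help("pick_urls", package = "pickURL"))
}
```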
