Parse identifiers (e.g., DOIs)

sckott/parseids

parseids

Parsers for Digital Object Identifiers (DOIs) and Other Identifiers

Uses the R package piton, which gives access to PEGTL (the Parsing Expression Grammar Template Library), a C++ implementation of parsing expression grammars (PEGs).

Documentation for various identifiers

DOI

Example rules

Match one or more letters

struct name
  : plus< alpha >
{};

Match one or more digits

struct numbers
  : plus< digit >
{};

Grammar

Rules are combined to form a grammar. For example, the grammar below requires the input to match name, then exactly one comma, then a single whitespace character, then numbers, and finally the end of the input (eof).

struct grammar
  : must< name, one< ',' >, space, numbers, eof >
{};

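The grammar is then applied to parse user input strings. Because the sequence is wrapped in must< >, a rule that fails raises a parse error instead of simply returning false.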
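To make the grammar's behavior concrete without needing PEGTL installed, here is a minimal sketch that hand-codes the same rules in standard C++. It is not PEGTL and not how piton or parseids work internally: the helper names (match_plus, match_one, require, parse_grammar, accepts) are hypothetical, alpha, digit and space are approximated with the <cctype> classifiers in the default "C" locale, and a thrown std::runtime_error stands in for the parse error raised by must< >.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical plain-C++ stand-in for the PEGTL rules (no PEGTL involved):
//   name    = plus< alpha >
//   numbers = plus< digit >
//   grammar = must< name, one< ',' >, space, numbers, eof >

namespace {

bool is_alpha(unsigned char c) { return std::isalpha(c) != 0; }
bool is_digit(unsigned char c) { return std::isdigit(c) != 0; }
bool is_space(unsigned char c) { return std::isspace(c) != 0; }

// plus< P >: consume one or more characters satisfying P.
bool match_plus(const std::string& in, std::size_t& pos, bool (*pred)(unsigned char)) {
    const std::size_t start = pos;
    while (pos < in.size() && pred(static_cast<unsigned char>(in[pos]))) ++pos;
    return pos > start;
}

// Consume exactly one character satisfying P (covers one< ',' > and space).
bool match_one(const std::string& in, std::size_t& pos, bool (*pred)(unsigned char)) {
    if (pos < in.size() && pred(static_cast<unsigned char>(in[pos]))) {
        ++pos;
        return true;
    }
    return false;
}

// Mimic must< >: a failed step throws instead of returning false.
void require(bool ok, const std::string& what) {
    if (!ok) throw std::runtime_error("parse error: expected " + what);
}

}  // namespace

// Throws std::runtime_error at the first rule that fails.
void parse_grammar(const std::string& in) {
    std::size_t pos = 0;
    require(match_plus(in, pos, is_alpha), "name");
    require(match_one(in, pos, [](unsigned char c) { return c == ','; }), "','");
    require(match_one(in, pos, is_space), "space");
    require(match_plus(in, pos, is_digit), "numbers");
    require(pos == in.size(), "eof");
}

// Convenience wrapper: true if the whole input matches the grammar.
bool accepts(const std::string& in) {
    try {
        parse_grammar(in);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

With this sketch, accepts("abc, 123") is true, while "abc,123" (no whitespace after the comma), "abc, 123x" (input left over before eof) and "123, abc" (name must come first) are all rejected.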

parseids API

  • pid_dois
  • pid_dois_prefixes
  • pid_dois_split
  • pid_dois_suffixes

Install

devtools::install_github("ropenscilabs/parseids")
library("parseids")

Pull DOIs out of text strings

pid_dois("Foo 10.1094/PHYTO-04-17-0144-R")
#> [1] "10.1094/PHYTO-04-17-0144-R"
pid_dois(c("Foo 10.1094/PHYTO-04-17-0144-R", "adsfljadfa dflj fjas fljasf 10.1094/PHYTO-04-17-0144-R"))
#> [1] "10.1094/PHYTO-04-17-0144-R" "10.1094/PHYTO-04-17-0144-R"
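This README does not show the DOI grammar parseids uses. As an illustration only, the sketch below extracts DOI-like tokens from free text using a deliberately simplified, hypothetical rule: a whitespace-delimited token made of "10.", a registrant code of digits (dots allowed only between digits), a "/", and a non-empty run of non-whitespace characters. The function name extract_dois is made up, and the scan is hand-written rather than PEG-based.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified DOI rule (NOT parseids' actual grammar):
//   a token starting "10." digits ("." digits)* "/" non-whitespace+
// Returns every match found in the text, left to right.
std::vector<std::string> extract_dois(const std::string& text) {
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };

    std::vector<std::string> found;
    std::size_t i = 0;
    while ((i = text.find("10.", i)) != std::string::npos) {
        // Only start a match at the beginning of a whitespace-delimited token.
        if (i > 0 && !is_space(text[i - 1])) { i += 3; continue; }

        // Registrant code: digits, with '.' allowed only between digits.
        std::size_t pos = i + 3;
        const std::size_t code_start = pos;
        while (pos < text.size() &&
               (is_digit(text[pos]) ||
                (text[pos] == '.' && pos > code_start &&
                 pos + 1 < text.size() && is_digit(text[pos + 1])))) {
            ++pos;
        }

        // Require '/' and a non-empty suffix running to the next whitespace.
        if (pos > code_start && pos < text.size() && text[pos] == '/') {
            const std::size_t suffix_start = ++pos;
            while (pos < text.size() && !is_space(text[pos])) ++pos;
            if (pos > suffix_start) {
                found.push_back(text.substr(i, pos - i));
                i = pos;
                continue;
            }
        }
        i += 3;
    }
    return found;
}
```

Because the suffix simply runs to the next whitespace, trailing punctuation such as a sentence-final period would be kept in the match; a real grammar needs explicit rules for cases like that.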

DOI prefixes

pid_dois_prefixes(c("10.1094/PHYTO-04-17-0144-R", "10.5150/cmcm.2011.086"))
#> [1] "10.1094" "10.5150"
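A DOI prefix is the part before the first "/": "10." followed by a registrant code. A minimal sketch of that split in plain C++17, assuming the input is already a bare DOI; doi_prefix is a hypothetical name, and this is not how pid_dois_prefixes is implemented.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not parseids' implementation): the prefix is
// "10." plus a registrant code, i.e. everything before the first '/'.
// Returns std::nullopt if the input does not look like a DOI.
std::optional<std::string> doi_prefix(const std::string& doi) {
    const std::size_t slash = doi.find('/');
    const bool starts_with_10 = doi.rfind("10.", 0) == 0;
    // slash <= 3 would leave an empty registrant code, e.g. "10./x".
    if (!starts_with_10 || slash == std::string::npos || slash <= 3) {
        return std::nullopt;
    }
    return doi.substr(0, slash);
}
```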

DOI suffixes

pid_dois_suffixes(c("10.1094/PHYTO-04-17-0144-R", "10.5150/cmcm.2011.086"))
#> [1] "PHYTO-04-17-0144-R" "cmcm.2011.086"
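The suffix is everything after the first "/", and it may itself contain further slashes, so splitting at the first slash rather than the last matters. A minimal plain C++17 sketch under that assumption; doi_suffix and the example DOI 10.1000/abc/def are hypothetical, and this README does not show how pid_dois_suffixes handles suffixes that contain slashes.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not parseids' implementation): the suffix is
// everything after the FIRST '/', so any later slashes stay in the suffix.
std::optional<std::string> doi_suffix(const std::string& doi) {
    const auto slash = doi.find('/');
    // No separator, or nothing after it: there is no suffix to return.
    if (slash == std::string::npos || slash + 1 == doi.size()) {
        return std::nullopt;
    }
    return doi.substr(slash + 1);
}
```

Using find('/') rather than rfind('/') is the design choice that matters: doi_suffix("10.1000/abc/def") yields "abc/def", whereas splitting at the last slash would give only "def".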

Timing

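# `dois` is not defined in this README; the output below implies a character vector of 1,000 DOIs
dois_long <- unlist(replicate(100, dois, simplify = FALSE), recursive = TRUE)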
length(dois_long)
#> [1] 100000
library(microbenchmark)
microbenchmark::microbenchmark(
  pid_dois = pid_dois(dois_long),
  prefixes = pid_dois_prefixes(dois_long),
  suffixes = pid_dois_suffixes(dois_long),
  times = 100
)
#> Unit: milliseconds
#>      expr      min        lq      mean    median        uq      max neval
#>  pid_dois 356.9053 367.39554 378.38806 373.42769 383.85699 463.5051   100
#>  prefixes  85.5466  86.98456  91.13909  88.44211  92.90492 137.7983   100
#>  suffixes 157.2990 162.26911 170.07435 167.44599 172.70201 217.5943   100

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for parseids: citation(package = 'parseids')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
