Parsers for Digital Object Identifiers (DOIs) and Other Identifiers
Uses the R pkg piton which gives access to the C++ PEG implementation PEGTL.
- https://en.wikipedia.org/wiki/Digital_object_identifier
- https://www.doi.org/overview/DOI_article_ELIS3.pdf
Capture any letter
struct name
: plus< alpha >
{};
Capture any digit
struct numbers
: plus< digit >
{};
Rules are combined to form a grammar,
e.g., string must match name, then have one comma, then one space,
then match numbers.
struct grammar
: must< name, one< ',' >, space, numbers, eof >
{};
Which is then applied to parsing user input strings
- pid_dois
- pid_dois_prefixes
- pid_dois_split
- pid_dois_suffixes
devtools::install_github("ropenscilabs/parseids")library("parseids")pid_dois("Foo 10.1094/PHYTO-04-17-0144-R")
#> [1] "10.1094/PHYTO-04-17-0144-R"pid_dois(c("Foo 10.1094/PHYTO-04-17-0144-R", "adsfljadfa dflj fjas fljasf 10.1094/PHYTO-04-17-0144-R"))
#> [1] "10.1094/PHYTO-04-17-0144-R" "10.1094/PHYTO-04-17-0144-R"pid_dois_prefixes(c("10.1094/PHYTO-04-17-0144-R", "10.5150/cmcm.2011.086"))
#> [1] "10.1094" "10.5150"pid_dois_suffixes(c("10.1094/PHYTO-04-17-0144-R", "10.5150/cmcm.2011.086"))
#> [1] "PHYTO-04-17-0144-R" "cmcm.2011.086"dois_long <- unlist(replicate(100, dois, simplify = FALSE), TRUE)
length(dois_long)
#> [1] 100000library(microbenchmark)
microbenchmark::microbenchmark(
pid_dois = pid_dois(dois_long),
prefixes = pid_dois_prefixes(dois_long),
suffixes = pid_dois_suffixes(dois_long),
times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> pid_dois 356.9053 367.39554 378.38806 373.42769 383.85699 463.5051 100
#> prefixes 85.5466 86.98456 91.13909 88.44211 92.90492 137.7983 100
#> suffixes 157.2990 162.26911 170.07435 167.44599 172.70201 217.5943 100- Please report any issues or bugs.
- License: MIT
- Get citation information for
parseids:citation(package = 'parseids') - Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
