Fastest way to parse regex in R
I need to match around 1.6k regular expressions, such as the pair shown below.
I also have around 7k documents (about half a page each on average) that need to be searched with these regular expressions.
Right now I am using
library(rebus)
library(stringr)
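# or1() takes a character vector of alternatives and joins them with "|".
# The second pattern is assumed to be robot\\w* (robot, robots, robotics, ...).
regex_exp <- rebus::or1(c("(?i-mx:\\b(?:actroid\\b))", "(?i-mx:\\b(?:robot\\w*\\b))"))
regex_exp <- BOUNDARY %R% regex_exp %R% BOUNDARY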
stringr::str_extract_all("This is my text talking about technology, but also about the actroid", regex_exp)
to find matches, but it takes approx. 3.5 minutes per file, which of course does not scale.
Is there a more efficient library or method for matching regular expressions in R? I also do not know whether using reticulate to do the matching in Python and pass the results back to R would be faster.
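For reference, here is a minimal sketch of one variant I could try: joining all patterns into a single alternation and running one vectorised call over every document with stringi (the engine stringr is built on). The `patterns` and `docs` vectors are hypothetical stand-ins for my real 1.6k expressions and 7k documents, and I have not measured whether this is actually faster on the full data.

```r
library(stringi)

# Hypothetical stand-ins for the real ~1.6k patterns and ~7k documents.
patterns <- c("\\bactroid\\b", "\\brobot\\w*\\b")
docs <- c(
  "This is my text talking about technology, but also about the actroid",
  "Robots and robotics are covered here",
  "Nothing relevant in this one"
)

# Join all patterns into one alternation so each document is scanned once.
combined <- paste0("(?:", paste(patterns, collapse = "|"), ")")

# One vectorised call over all documents; returns a list with one
# character vector of matches per document (empty when nothing matches).
matches <- stri_extract_all_regex(
  docs, combined,
  omit_no_match = TRUE,
  opts_regex = stri_opts_regex(case_insensitive = TRUE)
)

print(matches)
```

Note that a single alternation returns the matched text but not which of the original expressions produced it, so this only fits if the matched strings themselves are what I need.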