This is a convenience function to capture substrings from textual data. It uses str_match_all internally, but instead of returning everything it always returns only a single part of the match, depending on the parameters capture_n and capture_bracket.
extract_substring(
string,
pattern,
capture_n = 1,
capture_bracket = 0,
missing = NA_character_
)
string: string to extract from
pattern: regular expression pattern to search for
capture_n: within each string, which match of the pattern should be extracted? E.g. if the pattern searches for words, should the first, second or third word be captured?
capture_bracket: for the captured match, which capture group should be extracted, i.e. which parentheses-enclosed segment of the pattern? By default the whole pattern is captured (capture_bracket = 0).
missing: what to replace missing values with? Note that values can be missing because there are not enough captured matches or because the requested capture group is empty.
Returns a character vector of the same length as string, containing the extracted substrings.
Other data extraction functions: extract_data, extract_word()
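To illustrate how capture_n and capture_bracket select one cell from the result of str_match_all, here is a minimal sketch. It is not the package's actual implementation: extract_substring_sketch is a hypothetical stand-in written for this example, it assumes the stringr package is installed, and it assumes that both NA and empty-string capture groups count as missing.

```r
library(stringr)

# Hypothetical re-implementation for illustration only; the real
# extract_substring() may differ in its details.
extract_substring_sketch <- function(string, pattern, capture_n = 1,
                                     capture_bracket = 0,
                                     missing = NA_character_) {
  # str_match_all() returns one character matrix per input string:
  # each row is one match, column 1 is the whole match, and
  # columns 2, 3, ... hold the capture groups.
  matches <- str_match_all(string, pattern)
  vapply(matches, function(m) {
    col <- capture_bracket + 1
    # Not enough matches, or no such capture group in the pattern.
    if (nrow(m) < capture_n || ncol(m) < col) return(missing)
    value <- m[capture_n, col]
    # Assumption: NA and "" are both treated as an empty capture group.
    if (is.na(value) || value == "") missing else value
  }, character(1))
}

x <- c("width=10 height=20", "width=5", "no numbers here")
pattern <- "([a-z]+)=([0-9]+)"

# Whole first match per string.
first_whole <- extract_substring_sketch(x, pattern)
# Whole second match per string.
second_whole <- extract_substring_sketch(x, pattern, capture_n = 2)
# Second capture group (the digits) of the second match.
second_digits <- extract_substring_sketch(x, pattern, capture_n = 2,
                                          capture_bracket = 2)
# Same as second_whole, but with "" instead of NA for missing values.
second_or_blank <- extract_substring_sketch(x, pattern, capture_n = 2,
                                            missing = "")
```

In this sketch only the first string contains a second match, so second_whole and second_digits fall back to the missing value for the other two strings.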