This is a convenience function to capture substrings from textual data. It uses str_match_all internally, but instead of returning every match it always returns a single part of one match, selected by the parameters capture_n and capture_bracket.
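To illustrate what the function selects from, the following sketch shows the raw output of stringr::str_match_all for one string. The input string and pattern are made up for illustration and do not come from the package.

```r
library(stringr)

# Illustrative input (an assumption, not taken from the package docs)
x <- "a1 b2 c3"

# str_match_all returns a list with one character matrix per input string:
# one row per match, column 1 = whole match, columns 2+ = capture groups
m <- str_match_all(x, "([a-z])(\\d)")[[1]]

m[2, 1]  # second match, whole pattern: "b2"
m[2, 3]  # second match, second capture group: "2"
```

In these terms, capture_n picks the row and capture_bracket picks the column (offset by one, since column 1 holds the whole match).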

extract_substring(
  string,
  pattern,
  capture_n = 1,
  capture_bracket = 0,
  missing = NA_character_
)

Arguments

string

string (character vector) to extract substrings from

pattern

regular expression pattern to search for

capture_n

within each string, which match of the pattern should be extracted? e.g. if the pattern searches for words, should the first, second or third word be captured?

capture_bracket

for the captured match, which capture group should be extracted, i.e. which parentheses-enclosed segment of the pattern? By default, the whole match is returned (capture_bracket = 0).

missing

what to replace missing values with? Note that a value can be missing either because the string has fewer than capture_n matches or because the selected capture group is empty.
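How the arguments interact can be sketched with a small stand-in function. The name extract_substring_sketch is hypothetical and this is not the package source; it is only a minimal approximation built on stringr::str_match_all, assuming that both NA and empty-string captures count as missing.

```r
library(stringr)

# Hypothetical re-implementation for illustration; not the package source
extract_substring_sketch <- function(string, pattern, capture_n = 1,
                                     capture_bracket = 0,
                                     missing = NA_character_) {
  matches <- str_match_all(string, pattern)
  vapply(matches, function(m) {
    # column 1 is the whole match, so capture group k sits in column k + 1
    col <- capture_bracket + 1
    # not enough matches (or no such capture group) -> missing
    if (nrow(m) < capture_n || ncol(m) < col) return(missing)
    value <- m[capture_n, col]
    # assumption: NA and "" are both treated as an empty capture
    if (is.na(value) || value == "") missing else value
  }, character(1))
}

# Illustrative data (made up for this example)
x <- c("a1 b2", "c3", "none")

# defaults: first match, whole pattern
extract_substring_sketch(x, "([a-z])(\\d)")
# "a1" "c3" NA

# second match, second capture group, custom replacement for missing values
extract_substring_sketch(x, "([a-z])(\\d)", capture_n = 2,
                         capture_bracket = 2, missing = "?")
# "2" "?" "?"
```

The result always has one element per input string, which matches the return value described for the actual function.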

Value

character vector of same length as string with the extracted substrings

See also

Other data extraction functions: extract_data, extract_word()