Extract a substring from text

This is a convenience function to capture substrings from textual data. It uses str_match_all internally, but instead of returning every match and every capture group, it always returns a single part of the match, selected by the parameters capture_n and capture_bracket.

extract_substring(
  string,
  pattern,
  capture_n = 1,
  capture_bracket = 0,
  missing = NA_character_
)

Arguments

string: character vector of strings to extract from
pattern: regular expression pattern to search for
capture_n: within each string, which match of the pattern should be extracted? For example, if the pattern searches for words, should the first, second or third word be captured?
capture_bracket: for the captured match, which capture group should be extracted, i.e. which parentheses-enclosed segment of the pattern? By default the whole match is returned (capture_bracket = 0).
missing: the value to return where no substring could be extracted. A value can be missing because the string has fewer than capture_n matches or because the requested capture group is empty.
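
As an illustration of how capture_n and capture_bracket relate to the underlying matches, the snippet below calls stringr::str_match_all directly. The input string and pattern are made up for this example, and the indexing shown is an assumption based on the argument descriptions above, not the function's actual source.

```r
library(stringr)

# str_match_all returns one character matrix per input string:
# rows are successive matches, column 1 is the whole match,
# columns 2, 3, ... are the capture groups of the pattern.
m <- str_match_all("sample_01 run_7", "([a-z]+)_(\\d+)")[[1]]

# Row index plays the role of capture_n,
# column index the role of capture_bracket + 1.
whole_first  <- m[1, 1]  # capture_n = 1, capture_bracket = 0 -> "sample_01"
second_digit <- m[2, 3]  # capture_n = 2, capture_bracket = 2 -> "7"
```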

Value

character vector of same length as string with the extracted substrings
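
To make the return value concrete, here is a minimal sketch of a function with the behaviour described on this page. The name extract_substring_sketch is hypothetical, and treating an empty string as missing is an assumption about what "empty" means here; the real implementation may differ.

```r
library(stringr)

# Hypothetical re-implementation for illustration only.
extract_substring_sketch <- function(string, pattern, capture_n = 1,
                                     capture_bracket = 0,
                                     missing = NA_character_) {
  # one match matrix per element of `string`
  matches <- str_match_all(string, pattern)
  vapply(matches, function(m) {
    # fewer than capture_n matches, or no such capture group in the pattern
    if (nrow(m) < capture_n || ncol(m) < capture_bracket + 1) return(missing)
    value <- m[capture_n, capture_bracket + 1]
    # assumption: an unmatched (NA) or empty ("") group counts as missing
    if (is.na(value) || value == "") missing else value
  }, character(1))
}

x <- c("sample_01 run_7", "sample_22", "blank")

# defaults: whole first match, NA where nothing matches
first_match <- extract_substring_sketch(x, "([a-z]+)_(\\d+)")
# "sample_01" "sample_22" NA

# second match, second capture group, custom placeholder
second_number <- extract_substring_sketch(
  x, "([a-z]+)_(\\d+)",
  capture_n = 2, capture_bracket = 2, missing = "none"
)
# "7" "none" "none"
```

In both calls the result has one element per element of x, which is the property the Value section describes.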
