Extract words from text

Extracts words from text. By default, a word is a continuous sequence of letters and/or numbers. You can adjust whether characters such as "_", "-", " ", and "." count as part of a word or act as word separators, and whether numbers are included in words.

extract_word(
  string,
  capture_n = 1,
  include_numbers = TRUE,
  include_underscore = FALSE,
  include_dash = FALSE,
  include_space = FALSE,
  include_colon = FALSE,
  missing = NA_character_
)

Arguments

string: string (or character vector) to extract words from
capture_n: index of the word to extract (1 for the first, 2 for the second, and so on)
include_numbers: whether to include numbers (0-9) as part of the word (if FALSE, numbers will work as a word separator)
include_underscore: whether to include the underscore character (_) as part of a word (if FALSE, it will work as a word separator)
include_dash: whether to include the dash character (-) as part of a word (if FALSE, it will work as a word separator)
include_space: whether to include the space character ( ) as part of a word (if FALSE, it will work as a word separator)
include_colon: whether to include the period character (.) as part of a word (if FALSE, it will work as a word separator). Note that, despite the argument name, the character concerned is the period, not the colon.
missing: value to use in place of missing results. A result can be missing because the string contains fewer than capture_n words or because the captured match itself is empty.

Examples

x_text <- extract_word(c("sample number16.2", "sample number7b"),
                       capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"
x_num <- readr::parse_number(x_text)  # parse_number() comes from the readr package
# 16.2 7.0
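
The way the include_* arguments decide what counts as a word can be pictured as building a regular-expression character class. The following is a minimal base R sketch under stated assumptions, not the package's actual implementation: extract_word_sketch() is a hypothetical name, letters are assumed to be ASCII only, and the handling of too few matches is a guess based on the description of the missing argument.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
extract_word_sketch <- function(string, capture_n = 1,
                                include_numbers = TRUE,
                                include_underscore = FALSE,
                                include_dash = FALSE,
                                include_space = FALSE,
                                include_colon = FALSE,
                                missing = NA_character_) {
  # Build a character class from the flags; ASCII letters are always allowed
  # (an assumption of this sketch).
  chars <- c(
    "A-Za-z",
    if (include_numbers) "0-9",
    if (include_underscore) "_",
    if (include_space) " ",
    if (include_colon) ".",  # the period, as in the argument description
    if (include_dash) "-"    # kept last so the dash is read literally
  )
  pattern <- paste0("[", paste(chars, collapse = ""), "]+")

  # Find every run of allowed characters in each input string.
  words <- regmatches(string, gregexpr(pattern, string, perl = TRUE))

  # Return the capture_n-th run, or `missing` if there are too few runs.
  vapply(words, function(w) {
    if (length(w) >= capture_n) w[[capture_n]] else missing
  }, character(1))
}

extract_word_sketch(c("sample number16.2", "sample number7b"),
                    capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"
extract_word_sketch("a_b-c", capture_n = 2)
# "b"
```

Any character left out of the class acts as a separator, which is why "_" and "-" split "a_b-c" into three words in the second call.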
