Extracts words from text. By default, a word is a continuous sequence of letters and/or numbers. The arguments control whether the characters "_", "-", " ", and "." count as part of a word or act as word separators, and whether numbers are included.

extract_word(
  string,
  capture_n = 1,
  include_numbers = TRUE,
  include_underscore = FALSE,
  include_dash = FALSE,
  include_space = FALSE,
  include_colon = FALSE,
  missing = NA_character_
)

Arguments

string

string (or character vector) to extract words from

capture_n

which word to extract (1 for the first word, 2 for the second, and so on)

include_numbers

whether to include numbers (0-9) as part of the word (if FALSE, numbers will work as a word separator)

include_underscore

whether to include the underscore character (_) as part of a word (if FALSE, it will work as a word separator)

include_dash

whether to include the dash character (-) as part of a word (if FALSE, it will work as a word separator)

include_space

whether to include the space character ( ) as part of a word (if FALSE, it will work as a word separator)

include_colon

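whether to include the "." character as part of a word (if FALSE, it will work as a word separator). Note that, despite the argument name, the character in question is the period (.), as the example below shows.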

missing

what to replace missing values with. Note that a value can be missing because there are fewer words in the string than capture_n, or because the captured match is empty.

See also

Other data extraction functions: extract_data, extract_substring()

Examples

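# extract the 2nd word, treating "." as part of the word
x_text <- extract_word(c("sample number16.2", "sample number7b"),
                       capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"

# pull the numeric part out of each word (parse_number() is from readr)
x_num <- readr::parse_number(x_text)
# 16.2 7.0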
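
To illustrate how the include_* arguments interact, here is a minimal base R sketch that approximates the documented behaviour with a regular expression character class. The helper sketch_extract_word() is hypothetical and is not the package's implementation; it only assumes that each enabled flag adds its character to the set of word characters and that everything else acts as a separator.

```r
# Hypothetical helper (not part of the package): approximates the documented
# behaviour of extract_word() using base R regular expressions only.
sketch_extract_word <- function(string,
                                capture_n = 1,
                                include_numbers = TRUE,
                                include_underscore = FALSE,
                                include_dash = FALSE,
                                include_space = FALSE,
                                include_colon = FALSE,
                                missing = NA_character_) {
  # Build the set of characters that count as part of a word
  chars <- "A-Za-z"
  if (include_numbers) chars <- paste0(chars, "0-9")
  if (include_underscore) chars <- paste0(chars, "_")
  if (include_space) chars <- paste0(chars, " ")
  if (include_colon) chars <- paste0(chars, ".")  # "." is literal inside [...]
  if (include_dash) chars <- paste0(chars, "-")   # dash goes last so it stays literal
  pattern <- paste0("[", chars, "]+")

  # Find every run of word characters in each string
  matches <- regmatches(string, gregexpr(pattern, string))

  # Pick the capture_n-th run, or fall back to `missing` if there are too few
  vapply(matches, function(m) {
    if (length(m) >= capture_n) m[[capture_n]] else missing
  }, character(1))
}

sketch_extract_word(c("sample number16.2", "sample number7b"),
                    capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"

# With the default include_colon = FALSE, "." splits the word
sketch_extract_word("sample number16.2", capture_n = 2)
# "number16"

# Asking for a word that does not exist returns the `missing` value
sketch_extract_word("sample number16.2", capture_n = 5, missing = "none")
# "none"
```

In this sketch, any character not enabled by a flag ends the current word, which is why turning a flag off makes that character behave as a separator.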