Extracts words from text. By default, a word is a continuous sequence of letters and/or numbers. Optional arguments control whether the characters "_", "-", " ", and "." count as part of a word or act as word separators, and whether numbers are included.
extract_word(
string,
capture_n = 1,
include_numbers = TRUE,
include_underscore = FALSE,
include_dash = FALSE,
include_space = FALSE,
include_colon = FALSE,
missing = NA_character_
)
string: string to extract the word from
capture_n: which word to extract (1 for the 1st, 2 for the 2nd, 3 for the 3rd, and so on)
include_numbers: whether to include numbers (0-9) as part of a word (if FALSE, numbers will work as a word separator)
include_underscore: whether to include the underscore character (_) as part of a word (if FALSE, it will work as a word separator)
include_dash: whether to include the dash character (-) as part of a word (if FALSE, it will work as a word separator)
include_space: whether to include the space character ( ) as part of a word (if FALSE, it will work as a word separator)
include_colon: whether to include the period character (.) as part of a word (if FALSE, it will work as a word separator). Despite its name, this argument governs ".", as the example with "number16.2" shows.
missing: what to replace missing values with. A value can be missing because there are fewer than capture_n captured words or because the captured match itself is empty.
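To make the argument descriptions above concrete, here is a minimal base R sketch of how such switches could translate into a regular-expression character class. It is an illustration only, not the package's implementation: the helper name sketch_extract_word is hypothetical, and the choice of regmatches() with gregexpr() is an assumption made for the example.

```r
# Hypothetical helper for illustration; NOT the package's own extract_word().
# Each include_* switch adds characters to the class that defines a "word";
# every character outside that class acts as a word separator.
sketch_extract_word <- function(string, capture_n = 1,
                                include_numbers = TRUE,
                                include_underscore = FALSE,
                                include_dash = FALSE,
                                include_space = FALSE,
                                include_colon = FALSE,
                                missing = NA_character_) {
  chars <- "A-Za-z"
  if (include_numbers) chars <- paste0(chars, "0-9")
  if (include_underscore) chars <- paste0(chars, "_")
  if (include_space) chars <- paste0(chars, " ")
  if (include_colon) chars <- paste0(chars, ".")  # "." is literal inside [ ]
  # a literal dash has to come last inside a bracket expression
  if (include_dash) chars <- paste0(chars, "-")
  pattern <- paste0("[", chars, "]+")

  # regmatches() returns one character vector of matches per input element
  words <- regmatches(string, gregexpr(pattern, string))

  # pick the capture_n-th word, or fall back to `missing` if there are too few
  vapply(words, function(w) {
    if (length(w) >= capture_n) w[capture_n] else missing
  }, character(1), USE.NAMES = FALSE)
}

sketch_extract_word("sample number16.2", capture_n = 2)
# "number16"   ("." separates words by default)
sketch_extract_word("sample number16.2", capture_n = 2, include_colon = TRUE)
# "number16.2"
sketch_extract_word("sample number16.2", capture_n = 4)
# NA           (fewer than 4 words, so `missing` is returned)
```

In this sketch, turning a switch off needs no extra logic: a character that is not in the class simply ends the current match, which is what makes it behave as a separator.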
Other data extraction functions: extract_data, extract_substring()
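# extract the 2nd word of each string; include_colon = TRUE keeps "."
# inside the word so that the decimal part of 16.2 is not split off
x_text <- extract_word(c("sample number16.2", "sample number7b"),
                       capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"

# pull the numeric value out of each extracted word
x_num <- parse_number(x_text)
# 16.2 7.0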