The following functions are intended to make it easy to extract relevant information from textual data.
These functions are primarily intended for use in iso_mutate_file_info and inside the filtering conditions passed to iso_filter_files. However, they can also be used stand-alone and in regular mutate or filter calls on the data frames returned by the data retrieval functions (iso_get_raw_data, iso_get_file_info, iso_get_vendor_data_table, etc.). Note that all the parse_ functions are used in iso_parse_file_info for easy type conversions.
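As a rough illustration of the stand-alone use described above, the sketch below applies parse_number inside regular dplyr mutate and filter calls. The data frame is a made-up stand-in for what a data retrieval function might return; its column names and values are assumptions, not taken from any real file.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for a file info data frame (columns and values are made up)
file_info <- tibble(
  file_id = c("a.dxf", "b.dxf", "c.dxf"),
  amount = c("4.2 mg", "3.7 mg", "0.5 mg")
)

# Turn the text column into a number, then filter on the numeric value
result <- file_info %>%
  mutate(amount_mg = parse_number(amount)) %>%
  filter(amount_mg > 1)
```

The same expressions should carry over to iso_mutate_file_info and iso_filter_files, which the text above names as the primary use case.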
For simultaneous extraction of pure text data into multiple columns, please see the extract function from the tidyr package.
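A minimal sketch of that approach with tidyr's extract, assuming a hypothetical file naming scheme of the form run_type_mass. The column names, values, and regular expression are illustrative only.

```r
library(tidyr)

# Hypothetical file identifiers; the naming scheme is an assumption
file_info <- data.frame(
  file_id = c("run1_std_4.2mg", "run2_sample_3.7mg"),
  stringsAsFactors = FALSE
)

# Each capture group in the regex fills one of the new columns
extracted <- extract(
  file_info,
  col = file_id,
  into = c("run", "type", "mass_mg"),
  regex = "^(run\\d+)_([a-z]+)_([0-9.]+)mg$",
  remove = FALSE,  # keep the original file_id column
  convert = TRUE   # let tidyr convert mass_mg to a numeric column
)
```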
extract_substring is a generic convenience function to extract parts of textual data (based on regular expression matches). It can be used in combination with the parsing functions to turn extracted substrings into numerical or logical data.
extract_word is a more specific convenience function to extract the 1st/2nd/3rd word from textual data.
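To make the idea of regular-expression-based extraction concrete, here is a base R sketch of the general technique. It is not the package's implementation: nth_match and nth_word are hypothetical helper names, and the real extract_substring and extract_word may take different arguments and handle edge cases differently.

```r
# Hypothetical helper: return the n-th regex match in each string, NA if there is none
nth_match <- function(string, pattern, n = 1) {
  matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))
  vapply(
    matches,
    function(m) if (length(m) >= n) m[n] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical helper: a "word" is taken here to be a run of letters and digits
nth_word <- function(string, n = 1) nth_match(string, "[A-Za-z0-9]+", n)

labels <- c("std CO2 4.2mg", "sample")

amounts <- nth_match(labels, "[0-9]+\\.[0-9]+")  # first decimal number per label
second_words <- nth_word(labels, 2)              # second word per label

# Combine with a parsing step to get numerical data
amounts_num <- as.numeric(amounts)
```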
parse_number is a convenience function to extract a number even if it is surrounded by text (re-exported from the readr package).
parse_double parses text that holds double (decimal) numerical values without any extraneous text around them; use parse_number instead if this is not the case (re-exported from the readr package).
parse_integer parses text that holds integer (whole number) numerical values without any extraneous text around them; use parse_number instead if this is not the case (re-exported from the readr package).
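The difference between the three numeric parsers can be sketched as follows, calling the readr functions directly. The input strings are made up for illustration.

```r
library(readr)

# parse_number tolerates surrounding text
num <- parse_number("weight: 4.2 mg")

# parse_double and parse_integer expect clean numeric text
dbl <- parse_double(c("4.2", "1e-3"))
int <- parse_integer(c("1", "20"))

# With extraneous text, parse_double reports a parsing failure and yields NA
# (the warning is suppressed here only to keep the example quiet)
failed <- suppressWarnings(parse_double("4.2 mg"))
```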
parse_logical parses text that holds logical (boolean, i.e. TRUE/FALSE) values (re-exported from the readr package).
parse_datetime parses text that holds date and time information (re-exported from the readr package).
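A short sketch of the logical and date-time parsers, again calling readr directly. The input values and the date format string are assumptions chosen for illustration.

```r
library(readr)

# Several common spellings of TRUE/FALSE are recognized
flags <- parse_logical(c("TRUE", "false", "T", "F"))

# ISO 8601 style text parses without a format specification
# (the result is a POSIXct value in UTC by default)
run_start <- parse_datetime("2019-03-15 14:30:00")

# Other layouts need an explicit format
run_start_custom <- parse_datetime("15/03/2019 14:30", format = "%d/%m/%Y %H:%M")
```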
Other data extraction functions: extract_substring(), extract_word()