Isoreader provides a number of general-purpose operations that work on all supported IRMS data formats, such as file caching, parallel processing, and catching read errors. This vignette demonstrates some of these general operations.
# list all supported file types
iso_get_supported_file_types() |>
dplyr::select(extension, software, description, type) |>
knitr::kable()
extension | software | description | type |
---|---|---|---|
.cf | Isodat | Continuous Flow file format (older) | continuous flow |
.cf.rds | isoreader | R Data Storage | continuous flow |
.dxf | Isodat | Continuous Flow file format (newer) | continuous flow |
.iarc | ionOS | Continuous Flow data archive | continuous flow |
.caf | Isodat | Dual Inlet file format (older) | dual inlet |
.di.rds | isoreader | R Data Storage | dual inlet |
.did | Isodat | Dual Inlet file format (newer) | dual inlet |
.txt | Nu | Dual Inlet file format | dual inlet |
.scan.rds | isoreader | R Data Storage | scan |
.scn | Isodat | Scan file format | scan |
By default, isoreader is quite verbose to let the user know what is
happening. However, most functions can be silenced by adding the
parameter `quiet = TRUE` to the function call. This can also be done
globally using `iso_turn_info_messages_off()`.
# read a file in the default verbose mode
iso_get_reader_example("dual_inlet_example.did") |>
iso_read_dual_inlet() |>
iso_select_file_info(file_datetime, `Identifier 1`) |>
iso_get_file_info() |>
knitr::kable()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'dual_inlet_example.did' from cache...
#> Info: finished reading 1 files in 0.26 secs
#> Info: selecting/renaming the following file info across 1 data file(s): 'file_datetime', 'Identifier 1'
#> Info: aggregating file info from 1 data file(s)
file_id | file_datetime | Identifier 1 |
---|---|---|
dual_inlet_example.did | 2014-10-27 11:23:54 | CIT Carrara |
# read the same file but make the read process quiet
iso_get_reader_example("dual_inlet_example.did") |>
iso_read_dual_inlet(quiet = TRUE) |>
iso_select_file_info(file_datetime, `Identifier 1`) |>
iso_get_file_info() |>
knitr::kable()
#> Info: selecting/renaming the following file info across 1 data file(s): 'file_datetime', 'Identifier 1'
#> Info: aggregating file info from 1 data file(s)
file_id | file_datetime | Identifier 1 |
---|---|---|
dual_inlet_example.did | 2014-10-27 11:23:54 | CIT Carrara |
# read the same file but turn all isoreader messages off
iso_turn_info_messages_off()
iso_get_reader_example("dual_inlet_example.did") |>
iso_read_dual_inlet(quiet = TRUE) |>
iso_select_file_info(file_datetime, `Identifier 1`) |>
iso_get_file_info() |>
knitr::kable()
file_id | file_datetime | Identifier 1 |
---|---|---|
dual_inlet_example.did | 2014-10-27 11:23:54 | CIT Carrara |
# turn messages back on
iso_turn_info_messages_on()
#> Info: information messages turned on
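As a minimal sketch of the per-call option, `quiet = TRUE` can be passed to every step of a pipeline so that only this pipeline runs silently while the global message setting stays untouched. This assumes that each of the three functions below accepts a `quiet` argument, which the text above suggests for most functions but which is worth confirming in each function's help page.

```r
library(isoreader)

# silence only this pipeline, leaving the global message setting as is
# (assumption: all three functions accept `quiet`)
di_info <-
  iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet(quiet = TRUE) |>
  iso_select_file_info(file_datetime, `Identifier 1`, quiet = TRUE) |>
  iso_get_file_info(quiet = TRUE)

di_info
```

The per-call form is usually preferable inside functions or scripts that other people run, because it does not change their session-wide settings.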
By default, isoreader caches files as R objects to make access faster in the future. This feature can be turned off if you want to force a fresh read from the source file. Alternatively, you can clear the entire isoreader cache in your working directory to clean up previous file reads.
# cleanup reader cache
iso_cleanup_reader_cache()
#> Info: removed all (0) cached isoreader files.
# read a new file (notice the time elapsed)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
iso_read_continuous_flow()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'continuous_flow_example.dxf' from cache...
#> Info: finished reading 1 files in 0.13 secs
# re-read the same file much faster (it will be read from cache)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
iso_read_continuous_flow()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'continuous_flow_example.dxf' from cache...
#> Info: finished reading 1 files in 0.13 secs
# turn reader caching off
iso_turn_reader_caching_off()
#> Info: caching turned off
# re-read the same file (it will NOT be read from cache)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
iso_read_continuous_flow()
#> Info: preparing to read 1 data files...
#> Info: reading file 'continuous_flow_example.dxf' with '.dxf' reader...
#> Info: finished reading 1 files in 1.84 secs
# turn reader caching back on
iso_turn_reader_caching_on()
#> Info: caching turned on
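If only a single read should bypass the cache, switching caching off globally may be more than is needed. The sketch below assumes that the read functions accept a `read_cache` argument that controls whether an existing cached copy is used (see `?iso_read_continuous_flow` to confirm the exact parameter name and behavior in your installed version).

```r
library(isoreader)

# force a fresh read of one file without changing the global caching setting
# (assumption: `read_cache = FALSE` skips an existing cached copy)
cf_fresh <-
  iso_get_reader_example("continuous_flow_example.dxf") |>
  iso_read_continuous_flow(read_cache = FALSE)
```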
Isoreader supports parallel processing of data files based on the
number of processors available in a computer simply by setting the
`parallel = TRUE` flag in any file read operation. This makes it
possible to read large quantities of data files much more quickly on
a multi-core system (i.e. most modern laptops).
However, whether parallel processing yields significant improvements in read speed depends on the number of available processors, the file types, and the operating system. In theory, parallel processing always reduces computation time, but in practice the gains are offset by various factors, including the size of the data that needs to be sent back and forth between the processors, file system read/write speed, and the spin-up time for new processes. Generally speaking, parallel processing can provide significant speed improvements with a larger number of files (~10+) and more complex read operations (e.g. continuous flow > dual inlet > scan file). Reading from cache is so efficient that there are rarely gains from parallel processing, and it is usually faster NOT to read in parallel once a set of files is already cached.
# read 3 files in parallel (note that this is usually not a large enough file number to be worth it)
di_files <-
iso_read_dual_inlet(
iso_get_reader_example("dual_inlet_example.did"),
iso_get_reader_example("dual_inlet_example.caf"),
iso_get_reader_example("dual_inlet_nu_example.txt"),
nu_masses = 49:44,
parallel = TRUE
)
#> Info: preparing to read 3 data files (all will be cached), setting up 2 par...
#> Info (process 1): reading file 'dual_inlet_example.did' from cache...
#> Info (process 1): reading file 'dual_inlet_nu_example.txt' from cache...
#> Info (process 2): reading file 'dual_inlet_example.caf' from cache...
#> Info: finished reading 3 files in 2.31 secs
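Because the benefit of parallel reading depends on the machine and the files, it can help to measure it directly. The following is a rough sketch that times a sequential and a parallel read of the same three example files with base R's `system.time()`. The helper `read_di()` is a hypothetical name introduced only for this illustration, and the timings will differ between systems and depend on whether the files are already cached.

```r
library(isoreader)

# hypothetical helper: read the three bundled dual inlet examples
read_di <- function(parallel) {
  iso_read_dual_inlet(
    iso_get_reader_example("dual_inlet_example.did"),
    iso_get_reader_example("dual_inlet_example.caf"),
    iso_get_reader_example("dual_inlet_nu_example.txt"),
    nu_masses = 49:44,
    parallel = parallel,
    quiet = TRUE
  )
}

# time both variants; cached files tend to favor the sequential read
time_seq <- system.time(di_seq <- read_di(parallel = FALSE))
time_par <- system.time(di_par <- read_di(parallel = TRUE))

# compare elapsed wall-clock time in seconds
c(sequential = time_seq[["elapsed"]], parallel = time_par[["elapsed"]])
```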
All isoreader objects are lists that can be combined or subset to work with only specific files or create a larger collection.
# all 3 di_files read above
di_files
#> Data from 3 dual inlet iso files:
#> # A tibble: 3 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inle… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inle… NA 8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA 82 cycl… 9 entries no method …
# only one of the files (by index)
di_files[[2]]
#> Dual inlet iso file 'dual_inlet_example.caf': 8 cycles, 6 ions (44,45,46,47,48,49)
# only one of the files (by file_id)
di_files$dual_inlet_example.did
#> Dual inlet iso file 'dual_inlet_example.did': 7 cycles, 6 ions (44,45,46,47,48,49)
# a subset of the files (by index)
di_files[c(1,3)]
#> Data from 2 dual inlet iso files:
#> # A tibble: 2 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inle… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_nu_example.… dual_inle… NA 82 cycl… 9 entries no method …
# a subset of the files (by file_id)
di_files[c("dual_inlet_example.did", "dual_inlet_example.caf")]
#> Data from 2 dual inlet iso files:
#> # A tibble: 2 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inlet… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inlet… NA 8 cycle… 22 entri… standards,…
# same result using iso_filter_files (more flexible + verbose output)
di_files |> iso_filter_files(
file_id %in% c("dual_inlet_example.did", "dual_inlet_example.caf")
)
#> Info: applying file filter, keeping 2 of 3 files
#> Data from 2 dual inlet iso files:
#> # A tibble: 2 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inlet… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inlet… NA 8 cycle… 22 entri… standards,…
# recombining subset files
c(
di_files[3],
di_files[1]
)
#> Data from 2 dual inlet iso files:
#> # A tibble: 2 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_nu_example.… dual_inle… NA 82 cycl… 9 entries no method …
#> 2 dual_inlet_example.did dual_inle… NA 7 cycle… 16 entri… standards,…
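Since collections behave like lists, base R list tools should also apply to them. The sketch below splits the three dual inlet files into two groups and recombines them; it assumes that `length()` and `names()` behave as they do for ordinary named lists, with the file IDs serving as names.

```r
library(isoreader)

di_files <-
  iso_read_dual_inlet(
    iso_get_reader_example("dual_inlet_example.did"),
    iso_get_reader_example("dual_inlet_example.caf"),
    iso_get_reader_example("dual_inlet_nu_example.txt"),
    nu_masses = 49:44,
    quiet = TRUE
  )

# split the collection into two groups by file_id
isodat_files <- di_files[c("dual_inlet_example.did", "dual_inlet_example.caf")]
nu_files <- di_files["dual_inlet_nu_example.txt"]

# recombine the groups into a single collection
recombined <- c(isodat_files, nu_files)

# number of files and their identifiers (assumed to be the list names)
length(recombined)
names(recombined)
```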
Isoreader is designed to catch problems during file reading without crashing the read pipeline. It keeps track of all problems encountered along the way to make it easy to see what went wrong and remove erroneous files. In most cases, files with errors are files that were only partly saved because of an interrupted instrument analysis. If you encounter a file that should have intact data in it but produces an error in isoreader, please file a bug report and submit your file at https://github.com/isoverse/isoreader/issues
# read two files, one of which is erroneous
iso_files <-
iso_read_continuous_flow(
iso_get_reader_example("continuous_flow_example.dxf"),
system.file("errdata", "cf_without_data.dxf", package = "isoreader")
)
#> Info: preparing to read 2 data files (all will be cached)...
#> Info: reading file 'extdata/continuous_flow_example.dxf' from cache...
#> Info: reading file 'errdata/cf_without_data.dxf' with '.dxf' reader...
#> Warning: caught error - cannot identify measured masses - block 'CEvalDataI...
#> Info: finished reading 2 files in 0.61 secs
#> Warning: encountered 1 problem.
#> # | FILE | PROBLEM | OCCURRED IN | DETAILS
#> 1 | cf_without_data.dxf | error | extract_dxf_raw_voltage_data | cannot ide...
#> Use iso_get_problems(...) for more details.
# retrieve problem summary
iso_files |> iso_get_problems_summary() |> knitr::kable()
file_id | error | warning |
---|---|---|
cf_without_data.dxf | 1 | 0 |
# retrieve problem details
iso_files |> iso_get_problems() |> knitr::kable()
file_id | type | func | details |
---|---|---|---|
cf_without_data.dxf | error | extract_dxf_raw_voltage_data | cannot identify measured masses - block ‘CEvalDataIntTransferPart’ not found after position 1 (nav block#1 ‘CFileHeader’, pos 65327, max 119237) |
# filter out erroneous files
iso_files <- iso_files |> iso_filter_files_with_problems()
#> Info: removing 1/2 files that have any error (keeping 1)
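In a script, it can be useful to act on problems programmatically rather than by reading the console output. The following sketch counts the recorded errors from the problems table and only filters the collection if there are any. It relies on the `type` column shown in the problem details above; the message text is purely illustrative.

```r
library(isoreader)

# read one intact and one erroneous example file
iso_files <-
  iso_read_continuous_flow(
    iso_get_reader_example("continuous_flow_example.dxf"),
    system.file("errdata", "cf_without_data.dxf", package = "isoreader"),
    quiet = TRUE
  )

# collect all problems in a data frame (one row per problem)
problems <- iso_get_problems(iso_files)

# drop erroneous files only if any read errors were recorded
n_errors <- sum(problems$type == "error")
if (n_errors > 0) {
  message(n_errors, " read error(s) found, removing affected files")
  iso_files <- iso_filter_files_with_problems(iso_files, quiet = TRUE)
}
```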
If a file has changed (e.g. was edited through the vendor software)
and the changes should be loaded in isoreader, it is easy to re-read and
update just those files within a file collection by using the
`iso_reread_changed_files()` function. If some of the files are no
longer accessible at their original location, it will throw a
warning. If the location for all files has changed, it can be easily
adjusted by modifying the `file_root` file info parameter using
`iso_set_file_root()`.
Similar functions can be used to re-read outdated files from an older
isoreader version (`iso_reread_outdated_files()`), attempt to
re-read problematic files that had read errors/warnings
(`iso_reread_problem_files()`), or simply re-read all files
in a collection (`iso_reread_all_files()`).
# re-read the 3 dual inlet files from their original location if any have changed
di_files |>
iso_reread_changed_files()
#> Info: found 0 changed data file(s), re-reading 0/3.
#> Data from 3 dual inlet iso files:
#> # A tibble: 3 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inle… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inle… NA 8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA 82 cycl… 9 entries no method …
# update the file_root for the files before re-read (in this case to a location
# that does not hold these files and hence will lead to a warning)
di_files |>
iso_set_file_root(root = ".") |>
iso_reread_all_files()
#> Info: setting file root for 3 data file(s) to '.'
#> Warning: 3 file(s) do not exist at their referenced location and can not be re-read. Consider setting a new root directory with iso_set_file_root() first:
#> - 'dual_inlet_example.did' in root '.'
#> - 'dual_inlet_example.caf' in root '.'
#> - 'dual_inlet_nu_example.txt' in root '.'
#> Info: found 0 data file(s), re-reading 0/3.
#> Data from 3 dual inlet iso files:
#> # A tibble: 3 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inle… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inle… NA 8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA 82 cycl… 9 entries no method …
#>
#> Problem summary:
#> # A tibble: 3 × 3
#> file_id warning error
#> <chr> <int> <int>
#> 1 dual_inlet_example.caf 1 0
#> 2 dual_inlet_example.did 1 0
#> 3 dual_inlet_nu_example.txt 1 0
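As a brief sketch of one of the other re-read functions, `iso_reread_problem_files()` can be applied to a collection that contains an erroneous file. This assumes that calling it with default arguments targets the files that had read errors. With the bundled example, the erroneous file is expected to fail again, since nothing about the file has changed.

```r
library(isoreader)

# read one intact and one erroneous example file
iso_files <-
  iso_read_continuous_flow(
    iso_get_reader_example("continuous_flow_example.dxf"),
    system.file("errdata", "cf_without_data.dxf", package = "isoreader"),
    quiet = TRUE
  )

# attempt to re-read only the files that had problems
# (assumption: defaults select files with read errors)
reread <- iso_reread_problem_files(iso_files)

# the collection should still hold both files
length(reread)
```

This pattern is mostly useful after updating isoreader, when a newer reader version may be able to process files that previously failed.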
Isoreader provides a built-in data type with units (`iso_with_units`)
that can be used to easily keep track of units inside a data frame.
These units can be made explicit (i.e. included in the column header),
stripped altogether, or turned back to be implicit.
# strip all units
cf_file |>
iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`)) |>
iso_strip_units() |> head(3)
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 4
#> file_id `Ampl 28` `rIntensity 28` `d 15N/14N`
#> <chr> <dbl> <dbl> <dbl>
#> 1 continuous_flow_example.dxf 3024. 57524. 0.0160
#> 2 continuous_flow_example.dxf 3023. 57383. 0
#> 3 continuous_flow_example.dxf 2074. 52732. 1.05
# make units explicit
cf_file |>
iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`)) |>
iso_make_units_explicit() |> head(3)
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 4
#> file_id `Ampl 28 [mV]` `rIntensity 28 [mVs]` `d 15N/14N [permil]`
#> <chr> <dbl> <dbl> <dbl>
#> 1 continuous_flow_exa… 3024. 57524. 0.0160
#> 2 continuous_flow_exa… 3023. 57383. 0
#> 3 continuous_flow_exa… 2074. 52732. 1.05
# introduce new unit columns e.g. in the file info
cf_file |>
iso_mutate_file_info(weight = iso_with_units(0.42, "mg")) |>
iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`),
include_file_info = weight) |>
iso_make_units_explicit() |> head(3)
#> Info: mutating file info for 1 data file(s)
#> Info: aggregating vendor data table from 1 data file(s), including file info 'weight'
#> # A tibble: 3 × 5
#> file_id `weight [mg]` `Ampl 28 [mV]` `rIntensity 28 [mVs]`
#> <chr> <dbl> <dbl> <dbl>
#> 1 continuous_flow_example.dxf 0.42 3024. 57524.
#> 2 continuous_flow_example.dxf 0.42 3023. 57383.
#> 3 continuous_flow_example.dxf 0.42 2074. 52732.
#> # ℹ 1 more variable: `d 15N/14N [permil]` <dbl>
# or turn a column e.g. with custom format units in the header into implicit units
cf_file |>
iso_mutate_file_info(weight.mg = 0.42) |>
iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`),
include_file_info = weight.mg) |>
iso_make_units_implicit(prefix = ".", suffix = "") |> head(3)
#> Info: mutating file info for 1 data file(s)
#> Info: aggregating vendor data table from 1 data file(s), including file info 'weight.mg'
#> # A tibble: 3 × 5
#> file_id weight `Ampl 28` `rIntensity 28` `d 15N/14N`
#> <chr> <dbl[mg]> <dbl[mV]> <dbl[mVs]> <dbl[permil]>
#> 1 continuous_flow_example.dxf 0.42 3024.040 57524.32 0.01600287
#> 2 continuous_flow_example.dxf 0.42 3022.789 57383.21 0.00000000
#> 3 continuous_flow_example.dxf 0.42 2073.872 52731.67 1.04910065
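The same unit handling can be tried on a standalone vector, outside of any data file. The sketch below assumes that `iso_get_units()` is available to look up the units of a value and that `iso_strip_units()` accepts single vectors as well as data frames; the weights are made-up illustration values.

```r
library(isoreader)

# a numeric vector with implicit units (illustrative values)
weights <- iso_with_units(c(0.42, 1.5), "mg")

# look up the units (assumption: returned as text)
iso_get_units(weights)

# drop the units to get back plain numbers
plain <- iso_strip_units(weights)
plain
```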
Formatting data into text is easily achieved with the built-in R
function `sprintf`, but this package also provides a convenience
function that knows how to incorporate units information from
`iso_with_units` values. Use `iso_format` to format and concatenate
any single values or entire columns inside a data frame.
# concatenation example with single values
iso_format(
pi = 3.14159,
x = iso_with_units(42, "mg"),
ID = "ABC",
signif = 4,
sep = " | "
)
#> [1] "pi: 3.142 | x: 42mg | ID: ABC"
# example inside a data frame
cf_file |>
iso_get_vendor_data_table(select = c(`Nr.`, `Ampl 28`, `d 15N/14N`)) |>
dplyr::select(-file_id) |>
head(3) |>
# introduce new label columns using iso_format
dplyr::mutate(
# default concatenation of values
label_default = iso_format(
`Nr.`, `Ampl 28`, `d 15N/14N`,
sep = ", "
),
# concatenate with custom names for each value
label_named = iso_format(
`#` = `Nr.`, A = `Ampl 28`, d15 = `d 15N/14N`,
sep = ", "
),
# concatenate just the values and increase significant digits
label_value = iso_format(
`Nr.`, `Ampl 28`, `d 15N/14N`,
sep = ", ", format_names = NULL, signif = 6
)
)
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 6
#> Nr. `Ampl 28` `d 15N/14N` label_default label_named label_value
#> <int> <dbl[mV]> <dbl[permil]> <chr> <chr> <chr>
#> 1 1 3024.040 0.01600287 Nr.: 1, Ampl 28: 3020mV… #: 1, A: 3… 1, 3024.04…
#> 2 2 3022.789 0.00000000 Nr.: 2, Ampl 28: 3020mV… #: 2, A: 3… 2, 3022.79…
#> 3 3 2073.872 1.04910065 Nr.: 3, Ampl 28: 2070mV… #: 3, A: 2… 3, 2073.87…
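For comparison, the single-value label from the first example can be built with base R's `sprintf` alone. This sketch only illustrates the trade-off: the names, units, and number of significant digits all have to be written into the format string by hand, which is what `iso_format` takes care of.

```r
# base R only: build the same label with sprintf()
# %.4g keeps 4 significant digits, %g prints 42 without trailing zeros
label <- sprintf(
  "pi: %.4g | x: %g%s | ID: %s",
  3.14159, 42, "mg", "ABC"
)
label
#> [1] "pi: 3.142 | x: 42mg | ID: ABC"
```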