Operations

Introduction

Isoreader provides a number of general-purpose operations that work on all supported IRMS data formats, such as caching of files, parallel processing, and catching read errors. This vignette demonstrates some of these general operations.

# load isoreader package
library(isoreader)

Supported file types

# list all supported file types
iso_get_supported_file_types() |>
  dplyr::select(extension, software, description, type) |>
  knitr::kable()

extension	software	description	type
.cf	Isodat	Continuous Flow file format (older)	continuous flow
.cf.rds	isoreader	R Data Storage	continuous flow
.dxf	Isodat	Continuous Flow file format (newer)	continuous flow
.iarc	ionOS	Continuous Flow data archive	continuous flow
.caf	Isodat	Dual Inlet file format (older)	dual inlet
.di.rds	isoreader	R Data Storage	dual inlet
.did	Isodat	Dual Inlet file format (newer)	dual inlet
.txt	Nu	Dual Inlet file format	dual inlet
.scan.rds	isoreader	R Data Storage	scan
.scn	Isodat	Scan file format	scan

Messages

By default, isoreader is quite verbose to let the user know what is happening. However, most functions can be silenced by adding the parameter quiet = TRUE to the function call. Messages can also be turned off globally using iso_turn_info_messages_off().

# read a file in the default verbose mode
iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet() |>
  iso_select_file_info(file_datetime, `Identifier 1`) |>
  iso_get_file_info() |>
  knitr::kable()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'dual_inlet_example.did' from cache...
#> Info: finished reading 1 files in 0.26 secs
#> Info: selecting/renaming the following file info across 1 data file(s): 'file_datetime', 'Identifier 1'
#> Info: aggregating file info from 1 data file(s)

file_id	file_datetime	Identifier 1
dual_inlet_example.did	2014-10-27 11:23:54	CIT Carrara


# read the same file but make the read process quiet
iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet(quiet = TRUE) |>
  iso_select_file_info(file_datetime, `Identifier 1`) |>
  iso_get_file_info() |>
  knitr::kable()
#> Info: selecting/renaming the following file info across 1 data file(s): 'file_datetime', 'Identifier 1'
#> Info: aggregating file info from 1 data file(s)

file_id	file_datetime	Identifier 1
dual_inlet_example.did	2014-10-27 11:23:54	CIT Carrara


# read the same file but turn all isoreader messages off
iso_turn_info_messages_off()
iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet(quiet = TRUE) |>
  iso_select_file_info(file_datetime, `Identifier 1`) |>
  iso_get_file_info() |>
  knitr::kable()

file_id	file_datetime	Identifier 1
dual_inlet_example.did	2014-10-27 11:23:54	CIT Carrara


# turn messages back on
iso_turn_info_messages_on()
#> Info: information messages turned on
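
In the second example above, only the read step is silenced, so the downstream functions still print their messages. A minimal sketch of silencing each step individually, assuming iso_select_file_info() and iso_get_file_info() accept the same quiet argument as the read functions (the name `info` is only illustrative):

```r
library(isoreader)

# pass quiet = TRUE to every step instead of switching messages off globally
info <- iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet(quiet = TRUE) |>
  iso_select_file_info(file_datetime, `Identifier 1`, quiet = TRUE) |>
  iso_get_file_info(quiet = TRUE)

info
```

This keeps the global message setting untouched, which can be convenient inside scripts that should not change session-wide options.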

Caching

By default, isoreader caches files as R objects to make access faster in the future. This feature can be turned off if you want to force a fresh read from the source file. Alternatively, you can clear the entire isoreader cache in your working directory to clean up previous file reads.

# cleanup reader cache
iso_cleanup_reader_cache()
#> Info: removed all (0) cached isoreader files.

# read a new file (notice the time elapsed)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
  iso_read_continuous_flow()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'continuous_flow_example.dxf' from cache...
#> Info: finished reading 1 files in 0.13 secs

# re-read the same file much faster (it will be read from cache)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
  iso_read_continuous_flow()
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'continuous_flow_example.dxf' from cache...
#> Info: finished reading 1 files in 0.13 secs

# turn reader caching off
iso_turn_reader_caching_off()
#> Info: caching turned off

# re-read the same file (it will NOT be read from cache)
cf_file <- iso_get_reader_example("continuous_flow_example.dxf") |>
  iso_read_continuous_flow()
#> Info: preparing to read 1 data files...
#> Info: reading file 'continuous_flow_example.dxf' with '.dxf' reader...
#> Info: finished reading 1 files in 1.84 secs

# turn reader caching back on
iso_turn_reader_caching_on()
#> Info: caching turned on
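
The toggles above change the caching behavior for every subsequent read. As a minimal sketch, caching can also be bypassed for a single call. This assumes the `cache` and `read_cache` arguments of the read functions behave as described in the package reference; neither is shown elsewhere in this vignette, and the name `fresh` is only illustrative.

```r
library(isoreader)

# path to one of the bundled example files
path <- iso_get_reader_example("continuous_flow_example.dxf")

# assumption: cache = FALSE skips writing a cache file and
# read_cache = FALSE skips loading an existing one, for this call only
fresh <- iso_read_continuous_flow(
  path,
  cache = FALSE,
  read_cache = FALSE,
  quiet = TRUE
)

fresh
```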

Parallel processing

Isoreader supports parallel processing of data files based on the number of processors available in a computer, simply by setting the parallel = TRUE flag in any file read operation. This makes it possible to read large quantities of data files much more quickly on a multi-core system (e.g. most modern laptops).

However, whether parallel processing yields significant improvements in read speed depends on the number of available processors, the file types, and the operating system. In theory, parallel processing always reduces computation time, but in practice the gain is offset by various factors, including the size of the data that needs to be sent back and forth between the processors, file system read/write speed, and the spin-up time for new processes. Generally speaking, parallel processing can provide significant improvements in speed with a larger number of files (~10+) and more complex read operations (e.g. continuous flow > dual inlet > scan file). Reading from cache is so efficient that there are rarely gains from parallel processing, and it is usually faster NOT to read in parallel once a set of files is already cached.

# read 3 files in parallel (note that this is usually not a large enough file number to be worth it)
di_files <-
  iso_read_dual_inlet(
    iso_get_reader_example("dual_inlet_example.did"),
    iso_get_reader_example("dual_inlet_example.caf"),
    iso_get_reader_example("dual_inlet_nu_example.txt"),
    nu_masses = 49:44,
    parallel = TRUE
  )
#> Info: preparing to read 3 data files (all will be cached), setting up 2 par...
#> Info (process 1): reading file 'dual_inlet_example.did' from cache...
#> Info (process 1): reading file 'dual_inlet_nu_example.txt' from cache...
#> Info (process 2): reading file 'dual_inlet_example.caf' from cache...
#> Info: finished reading 3 files in 2.31 secs
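
The rule of thumb from the text (roughly 10 or more files on a multi-core machine) can be turned into a simple switch. This is only a sketch: the threshold of 10 and the names `paths`, `n_cores`, `use_parallel`, and `di_small` are illustrative assumptions, and parallel::detectCores() comes from R's bundled parallel package rather than from isoreader.

```r
library(isoreader)

# example files bundled with the package
paths <- c(
  iso_get_reader_example("dual_inlet_example.did"),
  iso_get_reader_example("dual_inlet_example.caf")
)

# hypothetical rule of thumb: only read in parallel for ~10+ files
# and when more than one core is detected (detectCores() may return NA)
n_cores <- parallel::detectCores()
use_parallel <- length(paths) >= 10 && isTRUE(n_cores > 1)

di_small <- iso_read_dual_inlet(paths, parallel = use_parallel, quiet = TRUE)
```

With only two files, `use_parallel` evaluates to FALSE and the files are read sequentially.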

Combining / subsetting isofiles

All isoreader objects are lists that can be combined or subset to work with only specific files or create a larger collection.

# all 3 di_files read above
di_files
#> Data from 3 dual inlet iso files: 
#> # A tibble: 3 × 6
#>   file_id                 file_path_ file_subpath raw_data file_info method_info
#>   <chr>                   <chr>      <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did  dual_inle… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf  dual_inle… NA           8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA           82 cycl… 9 entries no method …

# only one of the files (by index)
di_files[[2]]
#> Dual inlet iso file 'dual_inlet_example.caf': 8 cycles, 6 ions (44,45,46,47,48,49)

# only one of the files (by file_id)
di_files$dual_inlet_example.did
#> Dual inlet iso file 'dual_inlet_example.did': 7 cycles, 6 ions (44,45,46,47,48,49)

# a subset of the files (by index)
di_files[c(1,3)]
#> Data from 2 dual inlet iso files: 
#> # A tibble: 2 × 6
#>   file_id                 file_path_ file_subpath raw_data file_info method_info
#>   <chr>                   <chr>      <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did  dual_inle… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_nu_example.… dual_inle… NA           82 cycl… 9 entries no method …

# a subset of the files (by file_id)
di_files[c("dual_inlet_example.did", "dual_inlet_example.caf")]
#> Data from 2 dual inlet iso files: 
#> # A tibble: 2 × 6
#>   file_id                file_path_  file_subpath raw_data file_info method_info
#>   <chr>                  <chr>       <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did dual_inlet… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inlet… NA           8 cycle… 22 entri… standards,…

# same result using iso_filter_files (more flexible + verbose output)
di_files |> iso_filter_files(
  file_id %in% c("dual_inlet_example.did", "dual_inlet_example.caf")
)
#> Info: applying file filter, keeping 2 of 3 files
#> Data from 2 dual inlet iso files: 
#> # A tibble: 2 × 6
#>   file_id                file_path_  file_subpath raw_data file_info method_info
#>   <chr>                  <chr>       <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did dual_inlet… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inlet… NA           8 cycle… 22 entri… standards,…

# recombining subset files
c(
  di_files[3],
  di_files[1]
)
#> Data from 2 dual inlet iso files: 
#> # A tibble: 2 × 6
#>   file_id                 file_path_ file_subpath raw_data file_info method_info
#>   <chr>                   <chr>      <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_nu_example.… dual_inle… NA           82 cycl… 9 entries no method …
#> 2 dual_inlet_example.did  dual_inle… NA           7 cycle… 16 entri… standards,…
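
Because a file collection is a list, base R helpers such as length() and names() can be used to inspect it. A short sketch, assuming the list names are the file ids (as the `$` access above suggests); the names `di` and `reordered` are illustrative:

```r
library(isoreader)

di <- iso_read_dual_inlet(
  iso_get_reader_example("dual_inlet_example.did"),
  iso_get_reader_example("dual_inlet_example.caf"),
  quiet = TRUE
)

length(di)  # number of files in the collection
names(di)   # file ids, in read order

# recombine the same files in reverse order
reordered <- c(di[2], di[1])
names(reordered)
```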

Dealing with file read problems

Isoreader is designed to catch problems during file reading without crashing the read pipeline. It keeps track of all problems encountered along the way to make it easy to see what went wrong and to remove erroneous files. Most often, the files with errors are ones that were only partly saved because of an interrupted instrument analysis. If you encounter a file that should have intact data in it but produces an error in isoreader, please file a bug report and submit your file at https://github.com/isoverse/isoreader/issues

# read two files, one of which is erroneous
iso_files <-
  iso_read_continuous_flow(
    iso_get_reader_example("continuous_flow_example.dxf"),
    system.file("errdata", "cf_without_data.dxf", package = "isoreader")
  )
#> Info: preparing to read 2 data files (all will be cached)...
#> Info: reading file 'extdata/continuous_flow_example.dxf' from cache...
#> Info: reading file 'errdata/cf_without_data.dxf' with '.dxf' reader...
#> Warning: caught error - cannot identify measured masses - block 'CEvalDataI...
#> Info: finished reading 2 files in 0.61 secs
#> Warning: encountered 1 problem.
#> # | FILE                | PROBLEM | OCCURRED IN                  | DETAILS
#> 1 | cf_without_data.dxf | error   | extract_dxf_raw_voltage_data | cannot ide...
#> Use iso_get_problems(...) for more details.

# retrieve problem summary
iso_files |> iso_get_problems_summary() |> knitr::kable()

file_id	error	warning
cf_without_data.dxf	1	0


# retrieve problem details
iso_files |> iso_get_problems() |> knitr::kable()

file_id	type	func	details
cf_without_data.dxf	error	extract_dxf_raw_voltage_data	cannot identify measured masses - block ‘CEvalDataIntTransferPart’ not found after position 1 (nav block#1 ‘CFileHeader’, pos 65327, max 119237)


# filter out erroneous files
iso_files <- iso_files |> iso_filter_files_with_problems()
#> Info: removing 1/2 files that have any error (keeping 1)
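
In a script, the filtering step can be made conditional on whether any problems were recorded. A minimal sketch, assuming iso_has_problems() returns a single TRUE/FALSE for a file collection; the name `files` is illustrative:

```r
library(isoreader)

# read a good file together with the deliberately broken example file
files <- iso_read_continuous_flow(
  iso_get_reader_example("continuous_flow_example.dxf"),
  system.file("errdata", "cf_without_data.dxf", package = "isoreader"),
  quiet = TRUE
)

# only filter if something actually went wrong during the read
if (iso_has_problems(files)) {
  files <- iso_filter_files_with_problems(files, quiet = TRUE)
}
```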

Re-reading files

If a file has changed (e.g. was edited through the vendor software) and the changes should be loaded in isoreader, it is easy to re-read and update just those files within a file collection by using the iso_reread_changed_files() function. If some of the files are no longer accessible at their original location, the function issues a warning. If the location of all files has changed, it can easily be adjusted by modifying the file_root file info parameter using iso_set_file_root().

Similar functions can be used to re-read outdated files from an older isoreader version (iso_reread_outdated_files()), attempt to re-read problematic files that had read errors/warnings (iso_reread_problem_files()), or simply re-read all files in a collection (iso_reread_all_files()).

# re-read the 3 dual inlet files from their original location if any have changed
di_files |>
  iso_reread_changed_files()
#> Info: found 0 changed data file(s), re-reading 0/3.
#> Data from 3 dual inlet iso files: 
#> # A tibble: 3 × 6
#>   file_id                 file_path_ file_subpath raw_data file_info method_info
#>   <chr>                   <chr>      <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did  dual_inle… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf  dual_inle… NA           8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA           82 cycl… 9 entries no method …

# update the file_root for the files before re-read (in this case to a location
# that does not hold these files and hence will lead to a warning)
di_files |>
  iso_set_file_root(root = ".") |>
  iso_reread_all_files()
#> Info: setting file root for 3 data file(s) to '.'
#> Warning: 3 file(s) do not exist at their referenced location and can not be re-read. Consider setting a new root directory with iso_set_file_root() first:
#>  - 'dual_inlet_example.did' in root '.'
#>  - 'dual_inlet_example.caf' in root '.'
#>  - 'dual_inlet_nu_example.txt' in root '.'
#> Info: found 0 data file(s), re-reading 0/3.
#> Data from 3 dual inlet iso files: 
#> # A tibble: 3 × 6
#>   file_id                 file_path_ file_subpath raw_data file_info method_info
#>   <chr>                   <chr>      <chr>        <glue>   <chr>     <chr>      
#> 1 dual_inlet_example.did  dual_inle… NA           7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf  dual_inle… NA           8 cycle… 22 entri… standards,…
#> 3 dual_inlet_nu_example.… dual_inle… NA           82 cycl… 9 entries no method …
#> 
#> Problem summary:
#> # A tibble: 3 × 3
#>   file_id                   warning error
#>   <chr>                       <int> <int>
#> 1 dual_inlet_example.caf          1     0
#> 2 dual_inlet_example.did          1     0
#> 3 dual_inlet_nu_example.txt       1     0
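
The text also names iso_reread_problem_files(), which is not demonstrated above. A minimal sketch, assuming it takes the file collection as its first argument in the same way as iso_reread_changed_files(). The broken example file is unchanged on disk, so the re-read is expected to run into the same problem again; the name `files` is illustrative.

```r
library(isoreader)

# collection with one intact and one deliberately broken file
files <- iso_read_continuous_flow(
  iso_get_reader_example("continuous_flow_example.dxf"),
  system.file("errdata", "cf_without_data.dxf", package = "isoreader"),
  quiet = TRUE
)

# attempt to re-read only the files that had read errors
files <- iso_reread_problem_files(files)

# inspect what is still flagged afterwards
iso_get_problems_summary(files)
```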

Units

Isoreader provides a built-in data type with units (iso_with_units) that makes it easy to keep track of units inside a data frame. These units can be made explicit (i.e. included in the column header), stripped altogether, or turned back into implicit units.

# strip all units
cf_file |>
  iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`)) |>
  iso_strip_units() |> head(3)
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 4
#>   file_id                     `Ampl 28` `rIntensity 28` `d 15N/14N`
#>   <chr>                           <dbl>           <dbl>       <dbl>
#> 1 continuous_flow_example.dxf     3024.          57524.      0.0160
#> 2 continuous_flow_example.dxf     3023.          57383.      0     
#> 3 continuous_flow_example.dxf     2074.          52732.      1.05

# make units explicit
cf_file |>
  iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`)) |>
  iso_make_units_explicit() |> head(3)
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 4
#>   file_id              `Ampl 28 [mV]` `rIntensity 28 [mVs]` `d 15N/14N [permil]`
#>   <chr>                         <dbl>                 <dbl>                <dbl>
#> 1 continuous_flow_exa…          3024.                57524.               0.0160
#> 2 continuous_flow_exa…          3023.                57383.               0     
#> 3 continuous_flow_exa…          2074.                52732.               1.05

# introduce new unit columns e.g. in the file info
cf_file |>
  iso_mutate_file_info(weight = iso_with_units(0.42, "mg")) |>
  iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`),
                            include_file_info = weight) |>
  iso_make_units_explicit() |> head(3)
#> Info: mutating file info for 1 data file(s)
#> Info: aggregating vendor data table from 1 data file(s), including file info 'weight'
#> # A tibble: 3 × 5
#>   file_id                     `weight [mg]` `Ampl 28 [mV]` `rIntensity 28 [mVs]`
#>   <chr>                               <dbl>          <dbl>                 <dbl>
#> 1 continuous_flow_example.dxf          0.42          3024.                57524.
#> 2 continuous_flow_example.dxf          0.42          3023.                57383.
#> 3 continuous_flow_example.dxf          0.42          2074.                52732.
#> # ℹ 1 more variable: `d 15N/14N [permil]` <dbl>

# or turn a column e.g. with custom format units in the header into implicit units
cf_file |>
  iso_mutate_file_info(weight.mg = 0.42) |>
  iso_get_vendor_data_table(select = c(`Ampl 28`, `rIntensity 28`, `d 15N/14N`),
                            include_file_info = weight.mg) |>
  iso_make_units_implicit(prefix = ".", suffix = "") |> head(3)
#> Info: mutating file info for 1 data file(s)
#> Info: aggregating vendor data table from 1 data file(s), including file info 'weight.mg'
#> # A tibble: 3 × 5
#>   file_id                        weight `Ampl 28` `rIntensity 28`   `d 15N/14N`
#>   <chr>                       <dbl[mg]> <dbl[mV]>      <dbl[mVs]> <dbl[permil]>
#> 1 continuous_flow_example.dxf      0.42  3024.040        57524.32    0.01600287
#> 2 continuous_flow_example.dxf      0.42  3022.789        57383.21    0.00000000
#> 3 continuous_flow_example.dxf      0.42  2073.872        52731.67    1.04910065
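
The same unit helpers also work on values that do not come from a data file. A small sketch on a hand-made vector and tibble, assuming iso_get_units() returns the unit string of a vector; the names `w` and `df` are illustrative:

```r
library(isoreader)

# a numeric vector with implicit units
w <- iso_with_units(c(0.42, 0.51), "mg")

iso_get_units(w)    # the unit attached to the vector
iso_strip_units(w)  # plain numbers without the unit

# in a data frame, the unit can be moved into the column header
df <- tibble::tibble(weight = w) |>
  iso_make_units_explicit()
names(df)
```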

Formatting

Formatting data into text is easily achieved with the built-in R function sprintf(), but this package also provides a convenience function that knows how to incorporate units information from iso_with_units values. Use iso_format() to format and concatenate any single values or entire columns inside a data frame.

# concatenation example with single values
iso_format(
   pi = 3.14159,
   x = iso_with_units(42, "mg"),
   ID = "ABC",
   signif = 4,
   sep = " | "
)
#> [1] "pi: 3.142 | x: 42mg | ID: ABC"
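
For comparison, here is a sketch of the same label built with base R sprintf(). The unit has to be carried along by hand as a separate string, which is the bookkeeping iso_format() takes over; the names `x_value`, `x_unit`, and `label` are illustrative.

```r
# value and unit tracked separately, since base R numbers carry no units
x_value <- 42
x_unit <- "mg"

# %.4g gives 4 significant digits, matching signif = 4 above
label <- sprintf(
  "pi: %.4g | x: %g%s | ID: %s",
  3.14159, x_value, x_unit, "ABC"
)
label
#> [1] "pi: 3.142 | x: 42mg | ID: ABC"
```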

# example inside a data frame
cf_file |>
  iso_get_vendor_data_table(select = c(`Nr.`, `Ampl 28`, `d 15N/14N`)) |>
  dplyr::select(-file_id) |>
  head(3) |>
  # introduce new label columns using iso_format
  dplyr::mutate(
    # default concatenation of values
    label_default = iso_format(
      `Nr.`, `Ampl 28`, `d 15N/14N`,
      sep = ", "
    ),
    # concatenate with custom names for each value
    label_named = iso_format(
      `#` = `Nr.`, A = `Ampl 28`, d15 = `d 15N/14N`,
      sep = ", "
    ),
    # concatenate just the values and increase significant digits
    label_value = iso_format(
      `Nr.`, `Ampl 28`, `d 15N/14N`,
      sep = ", ", format_names = NULL, signif = 6
    )
  )
#> Info: aggregating vendor data table from 1 data file(s)
#> # A tibble: 3 × 6
#>     Nr. `Ampl 28`   `d 15N/14N` label_default            label_named label_value
#>   <int> <dbl[mV]> <dbl[permil]> <chr>                    <chr>       <chr>      
#> 1     1  3024.040    0.01600287 Nr.: 1, Ampl 28: 3020mV… #: 1, A: 3… 1, 3024.04…
#> 2     2  3022.789    0.00000000 Nr.: 2, Ampl 28: 3020mV… #: 2, A: 3… 2, 3022.79…
#> 3     3  2073.872    1.04910065 Nr.: 3, Ampl 28: 2070mV… #: 3, A: 2… 3, 2073.87…

2023-07-31