This vignette introduces some of the development features of the isoreader package and is aimed primarily at code contributors interested in expanding its functionality or helping with bug fixes.
Testing out new file format readers is easiest by registering a new reader function for a specific file extension using iso_register_dual_inlet_file_reader and iso_register_continuous_flow_file_reader for dual inlet and continuous flow files, respectively. Both require an extension (e.g. ".ext"), the name of the new reader function ("new_reader"), and optionally a description. Both functions automatically return a data frame with a list of all registered readers. Overwriting an existing reader with a different function requires an explicit overwrite = TRUE flag. All reader functions must accept an isoreader data structure object (ds) as the first argument and a list of reader-specific options as the second argument (options), and should return the structure with data filled in for downstream isoreader operations to work smoothly. The following minimal example illustrates how to do this, with the new_reader function simply printing out the layout of the provided data structure skeleton ds.
new_reader <- function(ds, options = list()) {
  isoreader:::log_message("this is the new reader!")
  str(ds)
  return(ds)
}
# register new reader
readers <- iso_register_dual_inlet_file_reader(".new.did", "new_reader")
knitr::kable(readers)
type | call | extension | func | cacheable | post_read_check | description | software | env |
---|---|---|---|---|---|---|---|---|
dual inlet | iso_read_dual_inlet | .caf | iso_read_caf | TRUE | TRUE | Dual Inlet file format (older) | Isodat | isoreader |
dual inlet | iso_read_dual_inlet | .did | iso_read_did | TRUE | TRUE | Dual Inlet file format (newer) | Isodat | isoreader |
dual inlet | iso_read_dual_inlet | .txt | iso_read_nu | TRUE | TRUE | Dual Inlet file format | Nu | isoreader |
continuous flow | iso_read_continuous_flow | .cf | iso_read_cf | TRUE | TRUE | Continuous Flow file format (older) | Isodat | isoreader |
continuous flow | iso_read_continuous_flow | .dxf | iso_read_dxf | TRUE | TRUE | Continuous Flow file format (newer) | Isodat | isoreader |
continuous flow | iso_read_continuous_flow | .iarc | iso_read_flow_iarc | TRUE | TRUE | Continuous Flow data archive | ionOS | isoreader |
scan | iso_read_scan | .scn | iso_read_scn | TRUE | TRUE | Scan file format | Isodat | isoreader |
continuous flow | iso_read_continuous_flow | .cf.rds | iso_read_rds | FALSE | FALSE | R Data Storage | isoreader | isoreader |
dual inlet | iso_read_dual_inlet | .di.rds | iso_read_rds | FALSE | FALSE | R Data Storage | isoreader | isoreader |
scan | iso_read_scan | .scan.rds | iso_read_rds | FALSE | FALSE | R Data Storage | isoreader | isoreader |
dual inlet | iso_read_dual_inlet | .new.did | new_reader | TRUE | TRUE | NA | NA | R_GlobalEnv |
# copy an example file from the package with the new extension
iso_get_reader_example("dual_inlet_example.did") |> file.copy(to = "example.new.did")
#> [1] TRUE
# read the file
iso_read_dual_inlet("example.new.did", read_cache = FALSE)
#> Info: preparing to read 1 data files (all will be cached)...
#> Info: reading file 'example.new.did' with '.new.did' reader...
#> Info: this is the new reader!
#> List of 7
#> $ version :Classes 'package_version', 'numeric_version' hidden list of 1
#> ..$ : int [1:3] 1 4 1
#> $ read_options :List of 4
#> ..$ file_info : logi TRUE
#> ..$ method_info : logi TRUE
#> ..$ raw_data : logi TRUE
#> ..$ vendor_data_table: logi TRUE
#> $ file_info : tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
#> ..$ file_id : chr "example.new.did"
#> ..$ file_root : chr "."
#> ..$ file_path : chr "example.new.did"
#> ..$ file_subpath : chr NA
#> ..$ file_datetime: POSIXct[1:1], format: NA
#> ..$ file_size : int 134446
#> $ method_info : list()
#> $ raw_data : tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
#> Named list()
#> $ vendor_data_table: tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
#> Named list()
#> $ bgrd_data : tibble [0 × 0] (S3: tbl_df/tbl/data.frame)
#> Named list()
#> - attr(*, "class")= chr [1:2] "dual_inlet" "iso_file"
#> - attr(*, "problems")= tibble [0 × 3] (S3: tbl_df/tbl/data.frame)
#> ..$ type : chr(0)
#> ..$ func : chr(0)
#> ..$ details: chr(0)
#> Info: finished reading 1 files in 0.21 secs
#> Dual inlet iso file 'example.new.did': 0 cycles, 0 ions ()
file.remove("example.new.did")
#> [1] TRUE
Note that for parallel processing to work during the read process (parallel = TRUE), isoreader needs to know where to find the new reader function. It will figure this out automatically as long as the function name is unique, but if this fails (or to be on the safe side), specify e.g. env = "R_GlobalEnv" or env = "newpackage" during the reader registration. Also note that isoreader will not automatically know where to find functions called from within the new reader function if they are not part of base R, so it is recommended to make all outside calls explicit (e.g. dplyr::filter(...)) to preempt this potential problem. For info messages and warnings to work with the progress bar and in parallel reads, make sure to use isoreader:::log_message(...) and isoreader:::log_warning(...) instead of base R’s message(...) and warning(...).
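The recommendations above can be combined into a single registration call. The following is a minimal sketch, not a definitive implementation: the reader name new_reader and the extension ".new.did" are reused from the earlier example purely for illustration, the description text is made up, and the registration is only attempted if isoreader is installed.

```r
# Hypothetical reader: `new_reader` and the ".new.did" extension are reused
# from the example above purely for illustration.
new_reader <- function(ds, options = list()) {
  # isoreader's logging helpers cooperate with the progress bar and parallel reads
  isoreader:::log_message("reading with the new reader")
  if (nrow(ds$file_info) == 0) {
    # log_warning() is shown even if the read call uses quiet = TRUE
    isoreader:::log_warning("no file info available")
  }
  # any non-base call inside a reader should carry an explicit namespace,
  # e.g. dplyr::filter(...), so that parallel workers can resolve it
  return(ds)
}

# only attempt the registration if isoreader is installed
if (requireNamespace("isoreader", quietly = TRUE)) {
  readers <- isoreader::iso_register_dual_inlet_file_reader(
    ".new.did", "new_reader",
    description = "illustrative reader", # optional (made-up text)
    overwrite = TRUE,                    # allow replacing an earlier registration
    env = "R_GlobalEnv"                  # where new_reader is defined
  )
}
```

Stating env explicitly is redundant when the function name is unique, but it removes any ambiguity about where the reader is defined once reads run in parallel.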
If you have designed and tested a new reader, please consider contributing it to the isoreader GitHub repository via pull request.
Isoreader defines two processing hooks, one at the beginning and one at the end of reading an individual file. This is useful for integration into pipelines that require additional output (such as GUIs) but is also sometimes useful for debugging purposes. The expressions are evaluated in the context of the isoreader:::read_iso_file function and have access to all parameters passed to this function, such as file_n and path. As with new readers, for info messages and warnings to work with the progress bar and in parallel reads, make sure to use isoreader:::log_message(...) and isoreader:::log_warning(...) instead of base R’s message(...) and warning(...). The main difference between the two is that log_message() will honor the quiet = TRUE flag passed to the main iso_read...() call, whereas log_warning() will always show its message no matter the quiet setting.
isoreader:::set_read_file_event_expr({
  isoreader:::log_message(sprintf("starting file #%.d, named '%s'", file_n, basename(path)))
})
isoreader:::set_finish_file_event_expr({
  isoreader:::log_message(sprintf("finished file #%.d", file_n))
})
c(
  iso_get_reader_example("dual_inlet_example.did"),
  iso_get_reader_example("dual_inlet_example.caf")
) |> iso_read_dual_inlet(read_cache = FALSE)
#> Info: preparing to read 2 data files (all will be cached)...
#> Info: reading file 'dual_inlet_example.did' with '.did' reader...
#> Info: starting file #1, named 'dual_inlet_example.did'
#> Info: finished file #1
#> Info: reading file 'dual_inlet_example.caf' with '.caf' reader...
#> Info: starting file #2, named 'dual_inlet_example.caf'
#> Info: finished file #2
#> Info: finished reading 2 files in 7.54 secs
#> Data from 2 dual inlet iso files:
#> # A tibble: 2 × 6
#> file_id file_path_ file_subpath raw_data file_info method_info
#> <chr> <chr> <chr> <glue> <chr> <chr>
#> 1 dual_inlet_example.did dual_inlet… NA 7 cycle… 16 entri… standards,…
#> 2 dual_inlet_example.caf dual_inlet… NA 8 cycle… 22 entri… standards,…
isoreader:::initialize_options() # reset all isoreader options
The best way to start debugging an isoreader call is to switch the package into debug mode. This is done using the internal iso_turn_debug_on() function. This enables debug messages, turns caching off by default so files are always read anew, and makes the package keep more information in the isofile objects. It continues to catch errors inside file readers (keeping track of them in the problems) unless you set iso_turn_debug_on(catch_errors = FALSE), in which case errors are not caught and instead stop the processing, so you get the full traceback and debugging options of your IDE.
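To see the difference between the two error modes, one option is to register a reader that fails on purpose. The following is a rough sketch under stated assumptions: failing_reader and the extension ".fail.did" are hypothetical names introduced here, iso_get_problems() is assumed to be the exported accessor for the recorded problems, and the isoreader calls only run if the package is installed.

```r
# Hypothetical reader that always fails, used only to trigger an error
failing_reader <- function(ds, options = list()) {
  stop("intentional failure for demonstration")
}

if (requireNamespace("isoreader", quietly = TRUE)) {
  # register the failing reader for a made-up extension
  isoreader::iso_register_dual_inlet_file_reader(
    ".fail.did", "failing_reader", overwrite = TRUE
  )
  file.copy(
    isoreader::iso_get_reader_example("dual_inlet_example.did"),
    "example.fail.did"
  )

  # debug mode with error catching: the error should end up in the problems
  isoreader:::iso_turn_debug_on()
  caught <- isoreader::iso_read_dual_inlet("example.fail.did")
  print(isoreader::iso_get_problems(caught))

  # debug mode without error catching: the error should propagate to the caller
  isoreader:::iso_turn_debug_on(catch_errors = FALSE)
  uncaught <- tryCatch(
    isoreader::iso_read_dual_inlet("example.fail.did"),
    error = function(e) e
  )
  print(inherits(uncaught, "error"))

  # clean up and reset all isoreader options
  file.remove("example.fail.did")
  isoreader:::initialize_options()
}
```

In an interactive session the tryCatch() wrapper would normally be omitted so that the IDE can stop at the error and show its traceback.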
Errors during the binary file reads usually indicate the approximate position in the file where the error was encountered. The easiest way to get started on figuring out what the file looks like at that position is to use a binary file editor and jump to the position. For a sense of the interpreted structure around that position, one can use iso_print_source_file_structure(), which shows what binary patterns isoreader recognized. This binary representation of the source file is only available if the file is read while in debug mode, otherwise file objects would get unnecessarily large:
# turn on debug mode
isoreader:::iso_turn_debug_on()
#> Info: debug mode turned on, error catching turned on, caching turned off
# read example file
ex <- iso_get_reader_example("dual_inlet_example.did") |>
  iso_read_dual_inlet(quiet = TRUE)
# retrieve source structure and print a part of it
bin <- ex |> iso_get_source_file_structure()
bin |> iso_print_source_file_structure(length = 500)
#> # Textual representation of the partial structure (bytes 1 - 504) of the isodat file.
#> # Print more/less by specifying the 'start', 'length' or 'end' parameters.
#> 0000001: <CFileHeader>{unknown-4: 'fe f7 31 01'}
#> 0000022: <06-000>{text-10: 'CBlockData'}{text-18: 'CDualInletDocument'}<4x00>
#> 0000094: <03-000>{unknown-2: '2f 00'}{text-20: 'Acquisition-1568.did'}{text-11: 'File Header'}<4x00>
#> 0000174: <02-000>
#> 0000178: <02-000>
#> 0000182: <CTimeObject>
#> 0000199: <03-000>{unknown-2: '2f 00'}{text-4: 'Date'}{text-4: 'Date'}<4x00>
#> 0000233: <01-000>{unknown-4: '4a 2b 4e 54'}
#> 0000241: <CStr>
#> 0000251: <02-000>{text-18: 'RW2000TemplateName'}
#> 0000295: <02-000>{text-84: 'C:\Thermo\Isodat NT\Global\User\Dual Inlet System\Result Workshop\Default Result.IRW'}
#> 0000471: <CDataIndex>
#> 0000487: <03-000>{unknown-2: '2f 00'}{text-0: 'NA'}{text-0: 'NA'}<4x00>
This structure representation shows recognized control elements in <...> and data elements in {...}, which are converted to a text or numeric representation if the interpretation is unambiguous, or shown as plain hexadecimal characters if the nature of the data cannot be determined with certainty. You can adjust start and length to look at different parts of the binary file or save the structure to a text file with save_to_file.
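As a brief illustration of these parameters, the sketch below prints a different window of the file and writes it to disk. It assumes that save_to_file accepts a file path; the output location out_file is a hypothetical choice, and the start position 588 is simply taken from the CData control block of the example file. The isoreader calls only run if the package is installed.

```r
# hypothetical output location for the saved structure
out_file <- file.path(tempdir(), "did_structure.txt")

if (requireNamespace("isoreader", quietly = TRUE)) {
  # the source structure is only retained when reading in debug mode
  isoreader:::iso_turn_debug_on()
  ex <- isoreader::iso_read_dual_inlet(
    isoreader::iso_get_reader_example("dual_inlet_example.did"),
    quiet = TRUE
  )
  bin <- isoreader::iso_get_source_file_structure(ex)

  # print 200 bytes starting at byte 588 and save them to a text file
  # (assumption: save_to_file takes a file path)
  isoreader::iso_print_source_file_structure(
    bin, start = 588, length = 200, save_to_file = out_file
  )

  # reset all isoreader options
  isoreader:::initialize_options()
}
```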
For an overview of all the elements (blocks) identified in the binary file as a tibble, use:
bin$blocks |> head(20)
#> # A tibble: 20 × 8
#> block_idx start end len data_len type priority block
#> <int> <int> <int> <int> <dbl> <chr> <int> <chr>
#> 1 1 1 17 17 11 C block 1 CFileHeader
#> 2 2 18 21 4 4 unknown 5 fe f7 31 01
#> 3 3 22 25 4 0 x-000 3 06-000
#> 4 4 26 49 24 10 text 2 CBlockData
#> 5 5 50 89 40 18 text 2 CDualInletDocument
#> 6 6 90 93 4 0 0000+ 4 4x00
#> 7 7 94 97 4 0 x-000 3 03-000
#> 8 8 98 99 2 2 unknown 5 2f 00
#> 9 9 100 143 44 20 text 2 Acquisition-1568.did
#> 10 10 144 169 26 11 text 2 File Header
#> 11 11 170 173 4 0 0000+ 4 4x00
#> 12 12 174 177 4 0 x-000 3 02-000
#> 13 13 178 181 4 0 x-000 3 02-000
#> 14 14 182 198 17 11 C block 1 CTimeObject
#> 15 15 199 202 4 0 x-000 3 03-000
#> 16 16 203 204 2 2 unknown 5 2f 00
#> 17 17 205 216 12 4 text 2 Date
#> 18 18 217 228 12 4 text 2 Date
#> 19 19 229 232 4 0 0000+ 4 4x00
#> 20 20 233 236 4 0 x-000 3 01-000
While this provides all elements, the top level structure is provided by the so-called control blocks:
bin$blocks |> dplyr::filter(type == "C block") |> head(20)
#> # A tibble: 20 × 8
#> block_idx start end len data_len type priority block
#> <int> <int> <int> <int> <dbl> <chr> <int> <chr>
#> 1 1 1 17 17 11 C block 1 CFileHeader
#> 2 14 182 198 17 11 C block 1 CTimeObject
#> 3 22 241 250 10 4 C block 1 CStr
#> 4 27 471 486 16 10 C block 1 CDataIndex
#> 5 35 513 535 23 17 C block 1 CSeqLineIndexData
#> 6 43 588 598 11 5 C block 1 CData
#> 7 113 1133 1157 25 19 C block 1 CDualInletBlockData
#> 8 121 1240 1261 22 16 C block 1 CMeasurmentInfos
#> 9 129 1324 1350 27 21 C block 1 CISLScriptMessageData
#> 10 163 1945 1967 23 17 C block 1 CMeasurmentErrors
#> 11 172 2038 2060 23 17 C block 1 CDualInletRawData
#> 12 180 2099 2114 16 10 C block 1 CBlockData
#> 13 188 2269 2302 34 28 C block 1 CIntegrationUnitTransf…
#> 14 199 2361 2380 20 14 C block 1 CIntensityData
#> 15 639 5299 5319 21 15 C block 1 CDualInletShout
#> 16 655 5460 5485 26 20 C block 1 CTwoDoublesArrayData
#> 17 827 6412 6433 22 16 C block 1 CStatusArrayData
#> 18 885 6747 6764 18 12 C block 1 COutlierData
#> 19 6461 40876 40902 27 21 C block 1 CResultDataSimpleList
#> 20 6469 40951 40973 23 17 C block 1 CResultDataSimple
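Because read errors report an approximate byte position, the control blocks can also be used to locate the part of the file in which an error occurred. The following minimal sketch uses base R on a hand-copied excerpt of the control block table above, so it runs without reading a file; error_pos is a hypothetical position, and with a real file the same lookup would be applied to bin$blocks.

```r
# excerpt of the control blocks listed above (first six rows)
control_blocks <- data.frame(
  block_idx = c(1L, 14L, 22L, 27L, 35L, 43L),
  start = c(1L, 182L, 241L, 471L, 513L, 588L),
  end = c(17L, 198L, 250L, 486L, 535L, 598L),
  type = "C block",
  block = c("CFileHeader", "CTimeObject", "CStr", "CDataIndex",
            "CSeqLineIndexData", "CData"),
  stringsAsFactors = FALSE
)

# hypothetical byte position reported by a read error
error_pos <- 650L

# keep the control blocks that start at or before the position ...
preceding <- control_blocks[control_blocks$start <= error_pos, ]
# ... and pick the last one, i.e. the enclosing top level section
nearest <- preceding[which.max(preceding$start), ]
nearest$block
#> [1] "CData"
```

The start value of the block found this way is a convenient place from which to begin printing the structure.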
To look at specific control blocks, simply provide the relevant start position to iso_print_source_file_structure():
cdata <- bin$blocks |> dplyr::filter(block == "CData")
cdata
#> # A tibble: 1 × 8
#> block_idx start end len data_len type priority block
#> <int> <int> <int> <int> <dbl> <chr> <int> <chr>
#> 1 43 588 598 11 5 C block 1 CData
bin |> iso_print_source_file_structure(start = cdata$start, length = 500)
#> # Textual representation of the partial structure (bytes 588 - 1098) of the isodat file.
#> # Print more/less by specifying the 'start', 'length' or 'end' parameters.
#> 0000588: <CData>
#> 0000599: <03-000>{unknown-2: '2f 00'}{text-3: '158'}{text-4: 'Line'}<4x00>{unknown-2: '0b 80'}
#> 0000633: <03-000>{unknown-2: '2f 00'}{text-1: '1'}{text-11: 'Peak Center'}<4x00>{unknown-2: '0b 80'}
#> 0000677: <03-000>{unknown-2: '2f 00'}{text-1: '1'}{text-11: 'Pressadjust'}<4x00>{unknown-2: '0b 80'}
#> 0000721: <03-000>{unknown-2: '2f 00'}{text-1: '1'}{text-10: 'Background'}<4x00>{unknown-2: '0b 80'}
#> 0000763: <03-000>{unknown-2: '2f 00'}{text-11: 'CIT Carrara'}{text-12: 'Identifier 1'}<4x00>{unknown-2: '0b 80'}
#> 0000829: <03-000>{unknown-2: '2f 00'}{text-2: '13'}{text-12: 'Identifier 2'}<4x00>{unknown-2: '0b 80'}
#> 0000877: <03-000>{unknown-2: '2f 00'}{text-5: '49077'}{text-8: 'Analysis'}<4x00>{unknown-2: '0b 80'}
#> 0000923: <03-000>{unknown-2: '2f 00'}{text-0: 'NA'}{text-7: 'Comment'}<4x00>{unknown-2: '0b 80'}
#> 0000957: <03-000>{unknown-2: '2f 00'}{text-0: 'NA'}{text-11: 'Preparation'}<4x00>{unknown-2: '0b 80'}
#> 0000999: <03-000>{unknown-2: '2f 00'}{text-0: 'NA'}{text-11: 'Post Script'}<4x00>{unknown-2: '0b 80'}
#> 0001041: <03-000>{unknown-2: '2f 00'}{text-16: 'CO2_multiply_16V'}{text-6: 'Method'}