inspectEHR contains a number of helper functions for direct data extraction. The focus of data extraction in inspectEHR is to facilitate data quality evaluation. As a result, extraction is optimised for a single data item or a few closely related data items. This approach is unlikely to be of direct use to analysts, who will likely need to extract a large number of different concepts for a specific patient cohort. For that use case, please refer to the wranglEHR package, which is optimised for this purpose.

In order to use inspectEHR you first need to install it with:

# If you need inspectEHR
remotes::install_github("DocEd/inspectEHR")

Now load inspectEHR, establish a database connection and make the core table. The core table is a remote query that includes all correct joins, so that you know it is safe to work with.

library(inspectEHR)

## Establish a database connection using DBI
## Details omitted here for security
ctn <- DBI::dbConnect()
core <- make_core(ctn)

Now you can extract whatever data you wish with the extract function. This is the S3 generic that will apply the correct method to extract a specified CC-HIC event.

# Extract Heart Rates
hr <- extract(core, code_name = "NIHR_HIC_ICU_0108")
knitr::kable(head(hr, 10))
| episode_id | event_id | site | code_name         | datetime            | value |
|-----------:|---------:|:-----|:------------------|:--------------------|------:|
|          1 |   407676 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 19:46:27 |    27 |
|          1 |    39713 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 19:46:27 |    27 |
|          1 |    39714 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 20:46:27 |   -36 |
|          1 |    39715 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 21:46:27 |   198 |
|          1 |    39716 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 22:46:27 |   127 |
|          1 |    39717 | A    | NIHR_HIC_ICU_0108 | 2015-12-31 23:46:27 |    27 |
|          1 |    39718 | A    | NIHR_HIC_ICU_0108 | 2016-01-01 00:46:27 |   -31 |
|          1 |    39719 | A    | NIHR_HIC_ICU_0108 | 2016-01-01 01:46:27 |   -74 |
|          1 |    39720 | A    | NIHR_HIC_ICU_0108 | 2016-01-01 02:46:27 |   -49 |
|          1 |    39721 | A    | NIHR_HIC_ICU_0108 | 2016-01-01 03:46:27 |  -159 |

The event has been extracted into a standardised format, including other features that are typically useful when evaluating the context of data quality (e.g. where and when the data originated).
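To illustrate why this standardised layout is convenient for data quality work, here is a minimal sketch in base R. It does not call inspectEHR: `hr_demo` is a hypothetical data frame rebuilt by hand from the ten rows shown above, the UTC time zone is an assumption, and the plausibility bounds of 0 and 300 are illustrative values rather than anything defined by the package.

```r
# Rebuild the ten heart-rate rows shown above as a plain data frame.
# hr_demo is a hypothetical stand-in for the result of extract().
start <- as.POSIXct("2015-12-31 19:46:27", tz = "UTC")  # time zone is assumed
hr_demo <- data.frame(
  episode_id = rep(1L, 10),
  event_id   = c(407676L, 39713:39721),
  site       = "A",
  code_name  = "NIHR_HIC_ICU_0108",
  datetime   = start + 3600 * c(0, 0:8),  # first two rows share a timestamp
  value      = c(27L, 27L, -36L, 198L, 127L, 27L, -31L, -74L, -49L, -159L),
  stringsAsFactors = FALSE
)

# Illustrative plausibility bounds (assumed, not taken from inspectEHR).
lower <- 0L
upper <- 300L
out_of_range <- hr_demo$value < lower | hr_demo$value > upper

# A row counts as a duplicate if everything except event_id repeats.
key_cols <- c("episode_id", "site", "code_name", "datetime", "value")
is_duplicate <- duplicated(hr_demo[, key_cols])

hr_demo[out_of_range | is_duplicate, c("event_id", "datetime", "value")]
```

Because every extracted event shares the same leading columns, checks like these can be written once and reused across data items.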

More complex data item extraction is also possible, and requires no further effort than seen above. We can demonstrate with central venous pressure (CVP), which contains metadata.

# Extract CVP
cvp <- extract(core, code_name = "NIHR_HIC_ICU_0116")
knitr::kable(head(cvp, 10))
| episode_id | event_id | site | code_name         | datetime            | value | meta_1 |
|-----------:|---------:|:-----|:------------------|:--------------------|------:|-------:|
|          1 |   407677 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 19:46:27 |  3.51 |      2 |
|          1 |   315907 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 19:46:27 |  3.51 |      2 |
|          1 |   315908 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 20:46:27 |  0.28 |      3 |
|          1 |   315909 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 21:46:27 |  5.38 |      2 |
|          1 |   315910 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 22:46:27 |  8.01 |      2 |
|          1 |   315911 | A    | NIHR_HIC_ICU_0116 | 2015-12-31 23:46:27 | 10.76 |      1 |
|          1 |   315912 | A    | NIHR_HIC_ICU_0116 | 2016-01-01 00:46:27 |  5.73 |      5 |
|          1 |   315913 | A    | NIHR_HIC_ICU_0116 | 2016-01-01 01:46:27 | -2.75 |     NA |
|          1 |   315914 | A    | NIHR_HIC_ICU_0116 | 2016-01-01 02:46:27 |  2.06 |      5 |
|          1 |   315915 | A    | NIHR_HIC_ICU_0116 | 2016-01-01 03:46:27 |  8.96 |      5 |

Now you can see an additional column containing the correct metadata. All column classes are handled automatically, including all metadata and timestamps for more complex data items. In this way the end user can be sure that the whole event has been extracted, without referring to the CC-HIC data model (which is not intuitive to interpret).
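As a small illustration of working with the metadata column, the sketch below rebuilds the `value` and `meta_1` columns from the ten CVP rows shown above and summarises them in base R. `cvp_demo` is a hypothetical stand-in for the extracted table, and storing `meta_1` as an integer is an assumption made for this example.

```r
# Hypothetical stand-in for the CVP extract, using the values printed above.
cvp_demo <- data.frame(
  value  = c(3.51, 3.51, 0.28, 5.38, 8.01, 10.76, 5.73, -2.75, 2.06, 8.96),
  meta_1 = c(2L, 2L, 3L, 2L, 2L, 1L, 5L, NA, 5L, 5L)  # integer type is assumed
)

# Inspect the column classes of the table.
col_classes <- vapply(cvp_demo, function(col) class(col)[1], character(1))

# Count rows where the metadata is missing, and tabulate the rest.
n_missing_meta <- sum(is.na(cvp_demo$meta_1))
meta_counts <- table(cvp_demo$meta_1, useNA = "ifany")

col_classes
n_missing_meta
meta_counts
```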

The extracted event is tagged with some useful attributes: code_name and class.

attr(hr, "code_name")
#> [1] "NIHR_HIC_ICU_0108"
class(hr)
#> [1] "integer_2d" "tbl_df"     "tbl"        "data.frame"

The class encodes the temporality of the event (whether it is time variant or not) and the data type (integer, string, real etc.). This is useful for method dispatch when writing data quality evaluation functions. The methods currently available for a class can be viewed with methods():

methods(class = "integer_1d")
#> [1] evaluate_distribution      evaluate_duplicate        
#> [3] evaluate_local_missingness evaluate_range            
#> see '?methods' for accessing help and source code
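The way such a class drives method dispatch can be sketched with plain S3 in base R. Everything here is hypothetical and independent of inspectEHR: `describe_event` is an invented generic, `hr_like` is a hand-built data frame, and the reading of the `2d` suffix as "time-varying" is an assumption based on the heart rate example above.

```r
# Hypothetical generic standing in for a data quality evaluation function.
describe_event <- function(x, ...) UseMethod("describe_event")

# Method chosen when the object carries the "integer_2d" class
# (assumed here to mean a time-varying integer event).
describe_event.integer_2d <- function(x, ...) {
  sprintf("%s: time-varying integer event with %d rows",
          attr(x, "code_name"), nrow(x))
}

# Fallback for objects without a more specific method.
describe_event.default <- function(x, ...) {
  "no specific method for this class"
}

# Tag a small data frame in the same way as the extracted event.
hr_like <- data.frame(
  datetime = as.POSIXct("2015-12-31 19:46:27", tz = "UTC") + 3600 * 0:2,
  value    = c(27L, -36L, 198L)
)
attr(hr_like, "code_name") <- "NIHR_HIC_ICU_0108"
class(hr_like) <- c("integer_2d", class(hr_like))

describe_event(hr_like)                  # dispatches on "integer_2d"
describe_event(data.frame(value = 1:3))  # falls back to the default method
```

Putting the event class first in the class vector means the event-specific method is tried before any data frame method, while the object still behaves as an ordinary data frame everywhere else.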