vignettes/extract_data.Rmd
inspectEHR contains a number of helper functions for direct data extraction. The focus of data extraction in inspectEHR is to facilitate data quality evaluation. As a result, extraction is optimised for a single data item or a few closely related data items. This approach is unlikely to be of direct use to analysts, who will typically need to extract a large number of different concepts for a specific patient cohort. For that use case, please refer to the wranglEHR package, which is optimised for this purpose.
To use inspectEHR, first install it with:
# If you need inspectEHR
remotes::install_github("DocEd/inspectEHR")
Now load inspectEHR, establish a database connection and make the core table. The core table is a remote query that already includes all the correct joins, so you know it is safe to work with.
library(inspectEHR)
## Establish a database connection using DBI
## Details omitted here for security; supply your own driver and credentials
ctn <- DBI::dbConnect()
core <- make_core(ctn)
Now you can extract whatever data you wish with the extract function. This is an S3 generic that applies the correct method to extract a specified CC-HIC event.
# Extract Heart Rates
hr <- extract(core, code_name = "NIHR_HIC_ICU_0108")
knitr::kable(head(hr, 10))
episode_id | event_id | site | code_name | datetime | value |
---|---|---|---|---|---|
1 | 407676 | A | NIHR_HIC_ICU_0108 | 2015-12-31 19:46:27 | 27 |
1 | 39713 | A | NIHR_HIC_ICU_0108 | 2015-12-31 19:46:27 | 27 |
1 | 39714 | A | NIHR_HIC_ICU_0108 | 2015-12-31 20:46:27 | -36 |
1 | 39715 | A | NIHR_HIC_ICU_0108 | 2015-12-31 21:46:27 | 198 |
1 | 39716 | A | NIHR_HIC_ICU_0108 | 2015-12-31 22:46:27 | 127 |
1 | 39717 | A | NIHR_HIC_ICU_0108 | 2015-12-31 23:46:27 | 27 |
1 | 39718 | A | NIHR_HIC_ICU_0108 | 2016-01-01 00:46:27 | -31 |
1 | 39719 | A | NIHR_HIC_ICU_0108 | 2016-01-01 01:46:27 | -74 |
1 | 39720 | A | NIHR_HIC_ICU_0108 | 2016-01-01 02:46:27 | -49 |
1 | 39721 | A | NIHR_HIC_ICU_0108 | 2016-01-01 03:46:27 | -159 |
The event has been extracted into a standardised format, including other features that are typically useful when evaluating the context of data quality (e.g. where and when the data originated).
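Because every extract shares this layout, simple quality checks can be written against the column names alone. The following is a minimal sketch in base R, not inspectEHR's own evaluation code. The data frame hr_toy is a hand-built stand-in for the first five rows of the table above, and the plausible range of 0 to 300 beats per minute is an assumption chosen purely for illustration.

```r
# Toy stand-in for the first rows of the heart rate extract shown above.
# Built by hand so the example runs without a database connection.
hr_toy <- data.frame(
  episode_id = 1L,
  event_id = c(407676L, 39713L, 39714L, 39715L, 39716L),
  site = "A",
  code_name = "NIHR_HIC_ICU_0108",
  datetime = as.POSIXct(
    c("2015-12-31 19:46:27", "2015-12-31 19:46:27", "2015-12-31 20:46:27",
      "2015-12-31 21:46:27", "2015-12-31 22:46:27"),
    tz = "UTC"
  ),
  value = c(27L, 27L, -36L, 198L, 127L),
  stringsAsFactors = FALSE
)

# Build a text key per row, then flag rows that repeat an earlier
# (episode, datetime, value) combination.
row_key <- paste(hr_toy$episode_id, format(hr_toy$datetime), hr_toy$value)
hr_toy$is_duplicate <- duplicated(row_key)

# Flag values outside the assumed plausible range (0 to 300 bpm).
hr_toy$out_of_range <- hr_toy$value < 0L | hr_toy$value > 300L

# Show only the rows that raised a flag.
flagged <- hr_toy[hr_toy$is_duplicate | hr_toy$out_of_range,
                  c("event_id", "datetime", "value")]
print(flagged)
```

In this toy data the second row repeats the timestamp and value of the first, and the third row holds a negative heart rate, so both are flagged.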
More complex data items can be extracted with no more effort than shown above. We can demonstrate with central venous pressure (CVP, code_name NIHR_HIC_ICU_0116), which carries metadata. The first rows of the extract are:
episode_id | event_id | site | code_name | datetime | value | meta_1 |
---|---|---|---|---|---|---|
1 | 407677 | A | NIHR_HIC_ICU_0116 | 2015-12-31 19:46:27 | 3.51 | 2 |
1 | 315907 | A | NIHR_HIC_ICU_0116 | 2015-12-31 19:46:27 | 3.51 | 2 |
1 | 315908 | A | NIHR_HIC_ICU_0116 | 2015-12-31 20:46:27 | 0.28 | 3 |
1 | 315909 | A | NIHR_HIC_ICU_0116 | 2015-12-31 21:46:27 | 5.38 | 2 |
1 | 315910 | A | NIHR_HIC_ICU_0116 | 2015-12-31 22:46:27 | 8.01 | 2 |
1 | 315911 | A | NIHR_HIC_ICU_0116 | 2015-12-31 23:46:27 | 10.76 | 1 |
1 | 315912 | A | NIHR_HIC_ICU_0116 | 2016-01-01 00:46:27 | 5.73 | 5 |
1 | 315913 | A | NIHR_HIC_ICU_0116 | 2016-01-01 01:46:27 | -2.75 | NA |
1 | 315914 | A | NIHR_HIC_ICU_0116 | 2016-01-01 02:46:27 | 2.06 | 5 |
1 | 315915 | A | NIHR_HIC_ICU_0116 | 2016-01-01 03:46:27 | 8.96 | 5 |
Now you can see an additional column containing the correct metadata. All column classes are handled automatically, including all metadata and timestamps for more complex data items. In this way the end user can be sure that the whole event has been extracted, without referring to the CC-HIC data model (which is not intuitive to interpret).
The extracted event is tagged with some useful attributes: code_name and class.
attr(hr, "code_name")
#> [1] "NIHR_HIC_ICU_0108"
class(hr)
#> [1] "integer_2d" "tbl_df" "tbl" "data.frame"
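To make the tagging concrete, here is a minimal base R sketch of how a data frame can carry a code_name attribute and an extra leading class. The helper tag_extract is hypothetical and is not how inspectEHR is known to do it internally; it only reproduces the shape of the output shown above.

```r
# Hypothetical helper (not part of inspectEHR): attach a code_name
# attribute and prepend an event class to an existing data frame.
tag_extract <- function(df, code_name, event_class) {
  attr(df, "code_name") <- code_name
  class(df) <- c(event_class, class(df))
  df
}

toy <- tag_extract(
  data.frame(value = c(27L, 198L)),
  code_name = "NIHR_HIC_ICU_0108",
  event_class = "integer_2d"
)

attr(toy, "code_name")       # read the attribute back
class(toy)                   # event class first, then the original classes
inherits(toy, "integer_2d")  # test for the event class
```

Keeping the original classes after the new one means the object still behaves as a data frame everywhere else.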
The class encodes the temporality of the event (whether it is time variant or not) and the data type (integer, string, real etc.). This is useful for method dispatch when writing data quality evaluation functions. The methods currently available for a class can be viewed with methods():
methods(class = "integer_1d")
#> [1] evaluate_distribution evaluate_duplicate
#> [3] evaluate_local_missingness evaluate_range
#> see '?methods' for accessing help and source code
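The dispatch mechanism itself can be illustrated without the package. The sketch below defines a hypothetical S3 generic, count_negative, with one method for the integer_2d class and a default that refuses anything else. None of these functions belong to inspectEHR; they only show how a class-specific evaluation function could be wired up.

```r
# Hypothetical generic, not part of inspectEHR.
count_negative <- function(x, ...) UseMethod("count_negative")

# Method chosen when the object's class vector includes "integer_2d".
count_negative.integer_2d <- function(x, ...) {
  sum(x$value < 0L, na.rm = TRUE)
}

# Fallback for objects with no matching class.
count_negative.default <- function(x, ...) {
  stop("no count_negative method for class: ", class(x)[1])
}

# Toy object tagged with the event class, built by hand.
hr_like <- data.frame(value = c(27L, -36L, 198L, -31L))
class(hr_like) <- c("integer_2d", class(hr_like))

n_negative <- count_negative(hr_like)
print(n_negative)
```

Calling count_negative on a plain data frame reaches the default method and raises an error, which is one way to make an unsupported event type fail loudly rather than silently.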