inspectEHR is a quality evaluation tool for CC-HIC. The goal of this package is to provide a core set of functions to detect erroneous or otherwise questionable data within CC-HIC, so that the end researcher can handle this information explicitly.
The design ethos of inspectEHR is to apply a comprehensive interpretation of the Kahn data quality evaluation framework.
The full default evaluation is largely automated and can be performed with the `perform_evaluation()` function. The output from this function is stored in the CC-HIC database as metadata, so that downstream research can apply a consistent set of rules when faced with erroneous data patterns.
Helper functions for extracting and wrangling data have also been developed. These live in the sister package wranglEHR, which must be installed for inspectEHR to function correctly.
```r
# install directly from github with
remotes::install_github("DocEd/inspectEHR")
```
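wranglEHR can be installed in the same way. The repository path below is an assumption; check the wranglEHR documentation for its canonical location.

```r
# install the companion data extraction package (repository path assumed)
remotes::install_github("CC-HIC/wranglEHR")
```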
```r
library(inspectEHR)

## Create a connection to the CC-HIC database.
## (I leave this blank here for security.)
ctn <- DBI::dbConnect()

## I recommend working in `verbose` mode so you can track the progress
## of the evaluation.
perform_evaluation(
  connection = ctn,
  output_folder = "/data/cchic/ed/phd/data_quality_evaluation/tmp/",
  verbose = TRUE
)

DBI::dbDisconnect(ctn)
```
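Because the evaluation output is stored back in the database as metadata, downstream code can read it directly. The sketch below uses only generic DBI calls; the table name is a hypothetical placeholder for illustration, not inspectEHR's documented schema.

```r
## Reconnect and inspect what the evaluation wrote back to the database.
ctn <- DBI::dbConnect()  # connection details omitted, as above

# List all tables; the evaluation metadata appears alongside the source tables.
DBI::dbListTables(ctn)

# Read a results table back for downstream filtering.
# NOTE: "quality_flags" is an invented table name used for illustration only.
flags <- DBI::dbReadTable(ctn, "quality_flags")

DBI::dbDisconnect(ctn)
```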
If you find a bug, please file a minimal reproducible example on GitHub: submit an issue and tag it with “data quality”. Data quality issues are often related to a specific site; if this is the case, please also tag the site.
The data quality rules are largely based upon standards set by OHDSI [1] and the consensus guidelines by Kahn et al. [2]. The CC-HIC currently uses an episode-centric model, so many of the data quality checks are built around that way of thinking. As we move to OMOP (a patient-centric model), many of these checks will change accordingly. The checks also follow a set of general conventions.
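As a hypothetical illustration of an episode-centric check (the table and column names are invented and do not reflect the CC-HIC schema or inspectEHR's internal code), a simple Kahn-style plausibility rule is that an episode's discharge should not precede its admission:

```r
library(dplyr)

# Invented episode table for illustration only.
episodes <- tibble::tibble(
  episode_id = 1:3,
  admission  = as.POSIXct(c("2019-01-01 10:00", "2019-01-02 09:00",
                            "2019-01-03 12:00"), tz = "UTC"),
  discharge  = as.POSIXct(c("2019-01-05 16:00", "2019-01-01 08:00",
                            "2019-01-04 18:00"), tz = "UTC")
)

# Flag episodes that fail a simple temporal plausibility rule:
# discharge must not precede admission.
episodes %>%
  mutate(implausible_dates = discharge < admission) %>%
  filter(implausible_dates)
```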
The same principles can also be retested under a validation framework; that is, seeking an external resource to validate the values in question. One aim is to check the data against the ICNARC reports, which would allow some external validation against a gold-standard resource. At present, the validation performed is to compare all sites against each other; in this sense, each site is used as a validation check against the others. Discrepancies should be due either to systematic errors or to differences in case mix.
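As a hypothetical sketch of what such a cross-site comparison can look like (made-up data, not CC-HIC values), one might summarise a single variable by site and look for sites that deviate markedly from the pooled distribution:

```r
library(dplyr)

# Made-up extract: one physiological variable recorded at three invented sites.
set.seed(2020)
hr <- tibble::tibble(
  site  = rep(c("A", "B", "C"), each = 100),
  value = c(rnorm(100, 80, 15), rnorm(100, 82, 14), rnorm(100, 120, 20))
)

# Summarise each site and compare it with the pooled data; a large deviation
# suggests either a systematic error at that site or a genuine case-mix difference.
hr %>%
  group_by(site) %>%
  summarise(median = median(value), iqr = IQR(value), .groups = "drop") %>%
  mutate(deviation_from_pooled_median = median - median(hr$value))
```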