Challenges in replicating secondary analysis of electronic health records data with multiple computable phenotypes: A case study on methicillin-resistant Staphylococcus aureus bacteremia infections

Int J Med Inform. 2021 Jul 16;153:104531. doi: 10.1016/j.ijmedinf.2021.104531. Online ahead of print.

ABSTRACT

BACKGROUND: Replicating prediction models with electronic health records (EHR) data is challenging because the study cohort, outcomes, and covariates must each be computed as phenotypes. Some phenotypes are not easily replicated across EHR data sources, for reasons such as the lack of gold-standard definitions and variation in documentation across systems, which may lead to measurement error and bias. Methicillin-resistant Staphylococcus aureus (MRSA) infections are responsible for high mortality worldwide. Because treatment options for the infection are limited, the ability to predict MRSA outcomes is of interest. However, replicating MRSA outcome prediction models with EHR data is problematic because many of the predictors, as well as the study inclusion and outcome criteria, lack well-defined computable phenotypes.

OBJECTIVE: We aimed to evaluate a prediction model for 30-day mortality after diagnosis of MRSA bacteremia with reduced vancomycin susceptibility (MRSA-RVS), considering multiple computable phenotypes derived from EHR data.

METHODS: We used EHR data from a large academic health center in the United States to replicate the original study conducted in Taiwan. We derived multiple computable phenotypes of risk factors and predictors used in the original study, reported stratified descriptive statistics, and assessed the performance of the prediction model.
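As an illustration of how the performance of a replicated prediction model might be assessed, the sketch below computes the concordance (c) statistic, a common measure of discrimination. The abstract does not name the metric used, so the choice of the c-statistic is an assumption, and the function name and example values are hypothetical rather than taken from the study.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance statistic: the fraction of (event, non-event) pairs in which
    the event received the higher predicted risk. Ties count as half."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes (1 = died within 30 days) and predicted risks.
example_outcomes = [1, 0, 1, 0]
example_risks = [0.9, 0.8, 0.4, 0.3]
example_c = c_statistic(example_outcomes, example_risks)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of events from non-events.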

RESULTS: In our replication study, most of the original variables could be (re)computed. For certain variables, however, the computable phenotypes could only be approximated by proxy from structured EHR data items, especially composite clinical indices such as the Pitt bacteremia score. Even the computable phenotype for the outcome variable varied depending on the admission/discharge windows used. The replicated prediction model exhibited only mild discriminatory ability.
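To make the point about outcome definitions concrete, the following minimal sketch shows how two plausible computable phenotypes for mortality can disagree for the same patient. Both definitions, the function names, the `grace_days` parameter, and the patient record are hypothetical illustrations, not the phenotypes used in the study.

```python
from datetime import date, timedelta
from typing import Optional


def died_within_30_days_of_diagnosis(diagnosis: date, death: Optional[date]) -> bool:
    """Variant A (assumed): death on or within 30 days after the diagnosis date."""
    return death is not None and diagnosis <= death <= diagnosis + timedelta(days=30)


def died_in_admission_window(
    admit: date, discharge: date, death: Optional[date], grace_days: int = 0
) -> bool:
    """Variant B (assumed): death between admission and discharge,
    optionally extended by a grace period after discharge."""
    return death is not None and admit <= death <= discharge + timedelta(days=grace_days)


# Hypothetical patient who died 15 days after discharge.
admit = date(2020, 1, 1)
diagnosis = date(2020, 1, 3)
discharge = date(2020, 1, 10)
death = date(2020, 1, 25)

outcome_a = died_within_30_days_of_diagnosis(diagnosis, death)
outcome_b = died_in_admission_window(admit, discharge, death)
outcome_b_grace = died_in_admission_window(admit, discharge, death, grace_days=30)
```

Here the diagnosis-anchored definition counts the death, the strict in-hospital definition does not, and widening the discharge window changes the label again, which is the kind of variation that shifts event counts between replications.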

CONCLUSION: Despite the rich information in EHR data, replication of prediction models involving complex predictors remains challenging, often because validated computable phenotypes are of limited availability. Nevertheless, it is often possible to derive proxy computable phenotypes that can be further validated and calibrated.

PMID:34332468 | DOI:10.1016/j.ijmedinf.2021.104531

By Nevin Manimala