Diagn Progn Res. 2025 Sep 8;9(1):19. doi: 10.1186/s41512-025-00205-8.
ABSTRACT
BACKGROUND: Hospital-acquired venous thromboembolism (HA-VTE) is a leading cause of morbidity and mortality among hospitalized adults. Numerous prognostic models have been developed to identify patients at elevated risk of HA-VTE. None, however, has met the criteria necessary to guide clinical decision-making. This study outlines a protocol for refining and validating a general-purpose prognostic model for HA-VTE, designed for real-time automation within the electronic health record (EHR) system.
METHODS: A retrospective cohort of 132,561 inpatient encounters (89,586 individual patients) at a large academic medical center will be assembled, along with clinical and demographic data available as part of routine care. Data for temporal, geographic, and domain external validation cohorts will also be collected. Logistic regression will be used to predict the occurrence of HA-VTE during an inpatient encounter. Variables considered for model inclusion will be chosen based on previously demonstrated association with HA-VTE and on their availability in both retrospective EHR data and routine clinical care. Least absolute shrinkage and selection operator (LASSO) with tenfold cross-validation will be used for initial variable selection. Variables selected by the LASSO procedure, along with those deemed necessary by clinicians, will be used in an unpenalized multivariable logistic regression model. Discrimination and calibration will be reported for the derivation and validation cohorts. Discrimination will be measured using Harrell’s C statistic. Calibration will be measured using the calibration intercept, calibration slope, Brier score, integrated calibration index, and visual examination of a nonlinear calibration curve. Model reporting will adhere to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis guidelines for clinical prediction models using machine learning methods (TRIPOD + AI).
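As an illustration of the performance measures named in METHODS, the following is a minimal sketch in plain Python, not the authors' code. It computes the C statistic (for a binary outcome, equal to the area under the ROC curve), the Brier score, and the calibration intercept and slope from observed outcomes and predicted risks. All function names and the toy data are hypothetical; the intercept is assumed here to be calibration-in-the-large (slope fixed at 1), a common convention that the abstract does not specify. The integrated calibration index is omitted because it requires a smoothed calibration curve.

```python
import math


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as 0.5."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(nonevents))


def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept(y, p, iters=25):
    """Intercept a in logit(P(y=1)) = a + logit(p), with the slope fixed at 1
    (assumed convention), fitted by Newton-Raphson."""
    x = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_expit(a + xi) for xi in x]
        grad = sum(yi - mi for yi, mi in zip(y, mu))
        hess = sum(mi * (1.0 - mi) for mi in mu)
        a += grad / hess
    return a


def calibration_slope(y, p, iters=25):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson and return (a, b).
    A slope b below 1 suggests predictions that are too extreme."""
    x = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [_expit(a + b * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data (hypothetical): three risk groups of 10 encounters each, with
# observed event rates equal to the predicted risks.
p = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

print("C statistic:", c_statistic(y, p))
print("Brier score:", brier_score(y, p))
print("Calibration intercept:", calibration_intercept(y, p))
print("Calibration (intercept, slope):", calibration_slope(y, p))
```

On a cohort the size described above, the pairwise loop in `c_statistic` would be slow; a rank-based computation or an established statistics package would be the practical choice.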
DISCUSSION: We describe methods for developing, evaluating, and validating a prognostic model for HA-VTE using routinely collected EHR data. By combining best practices in statistical development and validation, knowledge engineering, and clinical domain knowledge, the resulting model should be well suited for real-time clinical implementation. Although this protocol describes our development of a model for HA-VTE, the general approach can be applied to other clinical outcomes.
PMID:40916049 | DOI:10.1186/s41512-025-00205-8