
Algorithmic bias evaluation in 30-day hospital readmission models: A retrospective analysis of hospital discharges

J Med Internet Res. 2024 Feb 27. doi: 10.2196/47125. Online ahead of print.

ABSTRACT

BACKGROUND: The adoption of predictive algorithms in healthcare carries the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks remains limited.

OBJECTIVE: This study aims to evaluate algorithmic bias between racial and income groups associated with the application of common 30-day hospital readmission models and to assess the usefulness and interpretability of selected fairness metrics.

METHODS: This retrospective study used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019. Three models predicting 30-day hospital readmissions were evaluated: the LACE Index, a modified HOSPITAL score, and a modified CMS readmission measure, the last of which was applied both “as-is” (using existing coefficients) and “retrained” (recalibrated with 50% of the data). Predictive performance and bias measures were evaluated for the overall population, between Black and white populations, and between low- and other-income groups. Bias measures included parity of the false negative rate (FNR), false positive rate (FPR), zero-one loss, and the generalized entropy index. Racial bias, represented by FNR and FPR differences, was stratified by individual hospital and population composition to explore shifts in algorithmic bias across populations.
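For readers unfamiliar with these bias measures, below is a minimal sketch of how the group error rates and the generalized entropy index could be computed. The column names (readmit_30d, y_pred, the group column) and the per-individual benefit definition b_i = ŷ_i − y_i + 1 are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the group fairness measures named in METHODS, assuming binary
# labels y_true and binary predictions y_pred per discharge plus a group
# label (e.g., race or income category). The benefit definition
# b_i = y_pred_i - y_true_i + 1 is a common convention (an assumption here).
import numpy as np
import pandas as pd

def group_error_rates(y_true, y_pred):
    """False negative rate, false positive rate, and zero-one loss."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fnr = fn / max(np.sum(y_true == 1), 1)
    fpr = fp / max(np.sum(y_true == 0), 1)
    zero_one = np.mean(y_true != y_pred)
    return fnr, fpr, zero_one

def generalized_entropy_index(y_true, y_pred, alpha=2):
    """Generalized entropy index over per-individual benefits b_i = yhat_i - y_i + 1."""
    b = np.asarray(y_pred) - np.asarray(y_true) + 1.0
    mu = b.mean()
    if alpha == 1:  # Theil index limit
        return np.mean(b / mu * np.log(np.clip(b / mu, 1e-12, None)))
    return np.mean((b / mu) ** alpha - 1) / (alpha * (alpha - 1))

def parity_gaps(df, group_col, y_col="readmit_30d", pred_col="y_pred"):
    """Per-group FNR/FPR/zero-one loss and the between-group gaps (parity = small gap)."""
    rates = {
        g: group_error_rates(sub[y_col], sub[pred_col])
        for g, sub in df.groupby(group_col)
    }
    table = pd.DataFrame(rates, index=["FNR", "FPR", "zero_one_loss"]).T
    gaps = table.max() - table.min()
    return table, gaps
```

For example, parity_gaps(discharges, "race") would return one error-rate row per racial group plus the absolute gap for each measure, which is the kind of parity comparison reported in RESULTS.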

RESULTS: The retrained CMS model demonstrated the best predictive performance (AUC: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida) across subpopulations in both states. Calibration was better in the white (compared with Black) and other-income (compared with low-income) populations, and AUC was higher or similar in the Black population (compared with white). The retrained CMS model and the modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, the modified HOSPITAL score showed the lowest racial bias, and the modified HOSPITAL score and retrained CMS model overall had the lowest income bias. In both states, white and higher-income patient groups showed a higher FNR, whereas Black and low-income patient groups showed a higher FPR and higher zero-one loss. When stratified by hospital and population composition, the models demonstrated heterogeneous algorithmic bias across contexts and populations.
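As a companion sketch, the subgroup performance comparison summarized above (AUC for discrimination, Brier score for calibration) could be reproduced along the following lines; the data-frame layout and column names are again assumptions for illustration.

```python
# Sketch of per-subgroup discrimination and calibration, assuming a data frame
# with the observed outcome (readmit_30d), a predicted probability column
# (risk_score), and a group column such as race or income category.
import pandas as pd
from sklearn.metrics import roc_auc_score, brier_score_loss

def performance_by_group(df, group_col, y_col="readmit_30d", prob_col="risk_score"):
    rows = []
    for g, sub in df.groupby(group_col):
        rows.append({
            group_col: g,
            "AUC": roc_auc_score(sub[y_col], sub[prob_col]),      # discrimination
            "Brier": brier_score_loss(sub[y_col], sub[prob_col]), # calibration
        })
    return pd.DataFrame(rows)

# Example: performance_by_group(discharges, "race") yields one AUC and one
# Brier score per racial group, mirroring the Maryland/Florida comparisons.
```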

CONCLUSIONS: Caution must be taken when interpreting fairness measures at face value. A higher FNR or FPR could reflect missed opportunities or wasted resources, but these measures could also reflect healthcare utilization patterns and gaps in care. Relying solely on statistical notions of bias could obscure or underplay the causes of health disparity. The imperfections of health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment for detecting disparate model performance but are insufficient to inform mechanisms or policy changes. Such assessment, however, is an important first step toward data-driven improvement that addresses existing health disparities.

PMID:38422347 | DOI:10.2196/47125
