J Med Internet Res. 2025 Oct 8;27:e71034. doi: 10.2196/71034.
ABSTRACT
BACKGROUND: Between 28% and 35% of individuals aged 65 years and older experience falls, which are the second leading cause of unintentional injury-related deaths globally. Limited availability of clinical staff often impedes the timely detection and prevention of potential falls. Advances in artificial intelligence (AI) could complement existing fall risk assessment and help better allocate nursing care resources. Yet, many studies are based on small datasets from a single institution, which can restrict the generalizability of the model, and do not investigate important aspects of AI model development, such as fairness across demographic groups.
OBJECTIVE: This study aimed to provide a comprehensive empirical evaluation of the potential of AI in nursing care, focusing on the case of fall risk prediction. To account for demographic and contextual differences in fall incidences, we analyzed data from a university hospital and a geriatric hospital in Germany. To the best of our knowledge, these are the largest fall risk prediction datasets to date with heterogeneous data distributions. We focused on 3 key questions. First, does AI improve fall risk prediction? Second, how can AI models be trained safely across different hospitals? Third, are these models fair?
METHODS: This study used 2 datasets for fall risk prediction: one from a university hospital with 931,726 participants, 10,442 of whom experienced falls, and another from a geriatric hospital with 12,773 participants, 1728 of whom experienced falls. State-of-the-art AI models were trained with 3 approaches, including 2 decentralized learning paradigms. First, separate models were trained on data from each hospital; second, models were retrained on the respective other dataset; and third, federated learning (FL) was applied to both datasets. The performance of these models was compared with the rule-based systems as implemented in clinical practice for fall risk prediction. Additional analyses were conducted to test for model fairness.
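As a rough illustration of the aggregation step in FL, the sketch below averages model parameters from two sites, weighting each site by its sample count (the federated averaging scheme). The abstract does not state which FL algorithm or weighting was used, so the function `federated_average` and the parameter vectors are hypothetical; only the participant counts come from the text above.

```python
def federated_average(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]

# Hypothetical local parameters after one round of local training (assumption).
university_params = [0.20, -1.00]
geriatric_params = [0.60, 0.50]

# Participant counts reported in the abstract.
sizes = [931_726, 12_773]

global_params = federated_average([university_params, geriatric_params], sizes)

# Share of the aggregation weight held by the university hospital
# under sample-count weighting (about 98.6%).
university_share = sizes[0] / sum(sizes)
print(global_params, round(university_share, 4))
```

Under this assumed weighting, the larger site would contribute about 98.6% of the aggregate, so the global parameters stay close to the university hospital's local ones. Other weighting choices are possible and may behave differently.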
RESULTS: Our findings demonstrate that AI models consistently outperform rule-based systems across all experimental setups, with an area under the receiver operating characteristic curve (AUROC) of 0.735 (90% CI 0.727-0.744) for the geriatric hospital and 0.926 (90% CI 0.924-0.928) for the university hospital. FL did not improve the fall risk prediction in this setting. Our fairness analysis ruled out disparities in model performance between different sex groups, but we found fairness infringements across age groups.
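To make the reported metrics concrete, the following minimal sketch computes the area under the receiver operating characteristic curve, a 90% confidence interval, and a per-subgroup breakdown. It assumes a percentile bootstrap for the interval and a simple per-group comparison as the fairness check; the abstract does not specify either method, and the function names and toy data are hypothetical.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, level=0.90, n_boot=1000, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, AUROC undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    alpha = (1 - level) / 2
    lower = stats[int(alpha * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper

def auroc_by_group(labels, scores, groups):
    """AUROC computed separately within each subgroup (e.g., age band)."""
    result = {}
    for g in sorted(set(groups)):
        rows = [(y, s) for y, s, gg in zip(labels, scores, groups) if gg == g]
        result[g] = auroc([y for y, _ in rows], [s for _, s in rows])
    return result

# Toy data (hypothetical): 1 = fall, 0 = no fall.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
groups = ["<65", "<65", "<65", "<65", "65+", "65+", "65+", "65+"]

print(auroc(labels, scores))
print(bootstrap_ci(labels, scores))
print(auroc_by_group(labels, scores, groups))
```

A large gap between subgroup values would be one possible signal of the kind of age-related disparity described above, although a formal fairness analysis would typically also compare intervals and other metrics.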
CONCLUSIONS: This study demonstrates that AI models consistently outperform traditional rule-based systems across heterogeneous datasets in predicting fall risk. However, it also reveals challenges related to demographic shifts and label distribution imbalances, which limited the FL models’ ability to generalize. While the fairness analysis indicated fair results across sex subgroups, age-related disparities emerged. Addressing data imbalances and ensuring broader representation across demographic groups will be crucial for developing fairer and more generalizable models.
PMID:41061259 | DOI:10.2196/71034