Nevin Manimala Statistics

Large Language Models can Identify the Presence of MASH and Extract VCTE Measurements from Unstructured Documentation

Dig Dis Sci. 2025 Nov 8. doi: 10.1007/s10620-025-09539-1. Online ahead of print.

ABSTRACT

INTRODUCTION: Metabolic dysfunction-associated steatohepatitis (MASH) is a leading cause of cirrhosis. Vibration-Controlled Transient Elastography (VCTE) measurements are often captured in text-based reports and are not readily accessible for clinical research. Large language models (LLMs) show promise for curating information from unstructured documentation, but their efficiency for MASH and VCTE extraction is unclear.

METHODS: We used a cohort of 493 patients with compensated MASH cirrhosis. We compared the abilities of GPT-4o and Claude 3.5 Sonnet to identify the presence of MASH and to extract maximum VCTE stiffness and Controlled Attenuation Parameter (CAP) measurements from clinical documentation. We ran a cost analysis of the LLMs. As an exploratory analysis, we used LASSO-Cox to associate LLM-extracted features with death or decompensation.
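The abstract does not describe how the model outputs were scored. As a minimal sketch, assuming binary MASH labels are compared against reference labels and extracted numeric values are compared against reference values, the two reported metric types (F1-score and accuracy) could be computed as below. The function names and the toy data are hypothetical and are not taken from the study.

```python
def f1_score(reference, predicted):
    """F1 for binary labels, where True means MASH is documented."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)        # true positives
    fp = sum(1 for r, p in pairs if not r and p)    # false positives
    fn = sum(1 for r, p in pairs if r and not p)    # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(reference, predicted, tol=0.0):
    """Fraction of extracted values within `tol` of the reference value."""
    if not reference:
        return 0.0
    hits = sum(1 for r, p in zip(reference, predicted) if abs(r - p) <= tol)
    return hits / len(reference)


# Toy data (hypothetical): five notes for MASH, four stiffness values in kPa.
mash_ref = [True, True, True, False, False]
mash_llm = [True, True, False, True, False]
stiff_ref = [12.4, 21.0, 8.7, 35.3]
stiff_llm = [12.4, 21.0, 8.7, 33.5]

print(f"F1: {f1_score(mash_ref, mash_llm):.3f}")
print(f"Accuracy: {accuracy(stiff_ref, stiff_llm):.2f}")
```

F1 balances precision and recall, which matters when MASH mentions are uneven across notes, while plain accuracy suits the numeric extraction task where each note has one correct value.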

RESULTS: For identifying MASH in clinical notes, GPT-4o and Claude 3.5 achieved F1-scores of 90.5% and 80.0%, respectively. For identifying peak VCTE measurements, GPT-4o achieved accuracies of 99.3% and 99.1% for stiffness and CAP, while Claude 3.5 achieved accuracies of 93.3% and 94.1%. LLM extraction of one variable required ~2000 tokens per note, at a cost of ~$0.012/note for GPT-4o and ~$0.014/note for Claude 3.5. In LASSO-Cox regressions, VCTE stiffness (HR 1.03, 95% CI 1.01-1.05, p = 0.016) and CAP score (HR 0.99, 95% CI 0.99-1.00, p = 0.029) were statistically significant predictors of death or decompensation.
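The per-note cost figures can be turned into a rough budget estimate. The sketch below uses only the approximate numbers reported above (~2000 tokens and ~$0.012 or ~$0.014 per note per variable). The note count of 1000 is a hypothetical workload, not a figure from the study, and the calculation assumes each of the three variables (MASH presence, stiffness, CAP) is extracted in a separate pass over each note.

```python
TOKENS_PER_NOTE = 2000  # approximate, per variable, as reported in the abstract
COST_PER_NOTE_USD = {"GPT-4o": 0.012, "Claude 3.5": 0.014}  # approximate


def implied_price_per_million_tokens(cost_per_note, tokens_per_note=TOKENS_PER_NOTE):
    """Blended (input plus output) price implied by the per-note cost."""
    return cost_per_note / tokens_per_note * 1_000_000


def total_cost(n_notes, n_variables, cost_per_note):
    """Total cost, assuming one extraction pass per variable per note."""
    return n_notes * n_variables * cost_per_note


N_NOTES = 1000     # hypothetical workload, not from the study
N_VARIABLES = 3    # MASH presence, VCTE stiffness, CAP

for model, cost in COST_PER_NOTE_USD.items():
    per_million = implied_price_per_million_tokens(cost)
    total = total_cost(N_NOTES, N_VARIABLES, cost)
    print(f"{model}: ~${per_million:.2f} per million tokens, ~${total:.2f} total")
```

Under these assumptions the implied blended price is roughly $6 to $7 per million tokens, so even several thousand notes would cost on the order of tens of dollars.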

CONCLUSIONS: LLMs can extract MASH presence and VCTE parameters from documentation with high accuracy and low cost. When incorporated into survival analyses, LLM-extracted variables are associated with important clinical outcomes. Given the growing availability of LLMs, liver disease researchers should incorporate these methods to facilitate real-world studies.

PMID: 41206385 | DOI: 10.1007/s10620-025-09539-1

By Nevin Manimala
