JCO Clin Cancer Inform. 2025 Apr;9:e2400227. doi: 10.1200/CCI-24-00227. Epub 2025 Apr 18.
ABSTRACT
PURPOSE: Recurrences after curative resection in early-stage and locoregionally advanced non-small cell lung cancer (NSCLC) are common, necessitating a nuanced understanding of associated risk factors. This study aimed to establish a natural language processing (NLP) system to efficiently curate recurrence data in NSCLC and analyze risk factors longitudinally.
PATIENTS AND METHODS: Electronic health records of 6,351 patients with NSCLC, comprising >700,000 clinical notes, were obtained from Mount Sinai’s data sets. A customized deep learning-based NLP system was developed to identify patients who experienced recurrence. Recurrence types and rates over time were stratified by clinical features. Descriptive cohort analysis, Kaplan-Meier analysis of overall recurrence-free survival (RFS) and distant metastasis-free survival (DMFS), and Cox proportional hazards analysis were performed.
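To illustrate the Kaplan-Meier step named above, here is a minimal sketch of the product-limit estimator in plain Python. The abstract does not state which software the authors used. The function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only for illustration; they are not study data.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up duration for each patient (any consistent unit)
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    recurrences = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = recurrences.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying event-free at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # drop events and censored patients alike
    return curve

# Hypothetical toy cohort (years of follow-up), NOT data from the study.
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"year {t}: RFS = {s:.3f}")
```

Censored patients reduce the number at risk without lowering the curve, which is why an estimator of this kind suits follow-up of unequal length.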
RESULTS: Of 1,295 patients with stage I-IIIA NSCLC who underwent surgical resection, 336 (25.9%) experienced recurrence, as identified through NLP. The NLP system achieved a precision of 94.3%, a recall of 93%, and an F1 score of 93.5%. Among these 336 patients, 52.4% had local/regional recurrence, 44% had distant metastasis, and 3.6% had recurrence of unknown type. RFS rates at years 1-5 were 93%, 81%, 73%, 67%, and 61%, respectively; the corresponding DMFS rates were 96%, 89%, 84%, 80%, and 75%. Stage-specific RFS rates at year 5 were 73% (IA), 62% (IB), 47% (IIA), 46% (IIB), and 20% (IIIA). Patients with stage IB disease had a significantly higher likelihood of recurrence than those with stage IA disease (adjusted hazard ratio [aHR], 1.63; P = .02). Among patients with stage IA/IB disease, a clinically significant TP53 alteration (v TP53-negative or unknown significance) was associated with worse overall RFS (aHR, 1.89; P = .007) and worse DMFS (aHR, 2.47; P = .009).
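As a quick arithmetic check on the figures above, the sketch below recomputes the F1 score as the harmonic mean of the reported precision and recall, and the recurrence proportion from the reported counts. The variable names are illustrative. The rounded inputs give an F1 of about 93.6%, close to the reported 93.5; the small gap may reflect rounding of the inputs or a different averaging scheme, which the abstract does not specify.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values as reported in the abstract (already rounded).
reported_precision = 0.943
reported_recall = 0.93

f1 = f1_score(reported_precision, reported_recall)
recurrence_rate = 336 / 1295  # patients with recurrence / resected stage I-IIIA cohort

print(f"F1 from rounded inputs: {f1:.3f}")
print(f"Recurrence proportion: {recurrence_rate:.3f}")
```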
CONCLUSION: Our scalable NLP system enabled us to generate real-world insights into NSCLC recurrences, paving the way for predictive models for preventing, diagnosing, and treating NSCLC recurrence.
PMID:40249880 | DOI:10.1200/CCI-24-00227