Stat Med. 2026 Apr;45(8-9):e70433. doi: 10.1002/sim.70433.
ABSTRACT
Screening and surveillance programs for cancer, such as colorectal cancer (CRC), often yield electronic health records (EHR) of screening times, test results, and covariates. We consider EHR from CRC surveillance of individuals who have a high cancer risk due to their family history. These individuals therefore receive regular colonoscopies with the goal of finding and removing adenomas, the precursor lesions of CRC. Our objective is to estimate the time to adenoma incidence and explore its associations with covariates. Doing so, however, requires addressing several challenges of the CRC surveillance EHR. Importantly, adenoma events are interval-censored: the exact event times are unknown and are known only to fall within intervals defined by colonoscopy visits. Furthermore, colonoscopies can miss adenomas due to human or technical error, so individuals with adenomas may be misclassified as adenoma-free. Finally, the EHR data include individuals with adenomas at baseline, termed prevalent cases, and this prevalence status may be unobserved if the baseline colonoscopy is missing or fails to detect existing adenomas. To address these challenges in CRC EHR, and in screening data generally, we develop a new prevalence-incidence mixture model (PIM) with Bayesian estimation via data augmentation and regularization priors. We show how to fit the model, estimate cumulative incidence functions, and evaluate model fit using information criteria as well as a non-parametric estimator. In extensive simulations, the model performs well when informative priors on the test sensitivity are provided, which is usually possible. An implementation is provided in the R package BayesPIM.
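To make the structure of a prevalence-incidence mixture with imperfect tests more concrete, the following is a minimal Python sketch of the probability of one individual's screening history under a deliberately simplified setup. It is not the BayesPIM implementation and omits covariates, priors, and missing baselines. It assumes a Weibull incidence-time distribution, a single prevalence probability `p_prev`, a constant per-colonoscopy sensitivity `sens` with perfect specificity and independent misses, an observed baseline visit at time 0, and follow-up ending at the first positive colonoscopy. All function and parameter names are hypothetical. The sketch sums over the latent state (prevalent, incident within a given visit interval, or incident after the last visit); a data-augmentation sampler would typically impute that state instead of summing it out.

```python
import math
from typing import Optional, Sequence


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Weibull CDF, assumed here (for illustration) as the incidence-time distribution."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def history_likelihood(
    visit_times: Sequence[float],
    first_positive: Optional[int],
    p_prev: float,
    sens: float,
    shape: float,
    scale: float,
) -> float:
    """Probability of one screening history under a simplified (hypothetical) PIM.

    visit_times    -- increasing visit times; visit_times[0] == 0.0 is the baseline
    first_positive -- index of the first positive colonoscopy, or None if all negative
    """
    cdf = [weibull_cdf(t, shape, scale) for t in visit_times]
    miss = 1.0 - sens  # probability that one colonoscopy misses an existing adenoma
    if first_positive is not None:
        j = first_positive
        # Prevalent: adenoma present at visits 0..j, missed j times, then detected.
        prevalent = miss ** j * sens
        # Incident in (t_{m-1}, t_m]: present at visits m..j, missed j - m times, then detected.
        incident = sum(
            (cdf[m] - cdf[m - 1]) * miss ** (j - m) * sens for m in range(1, j + 1)
        )
    else:
        k = len(visit_times) - 1
        # Prevalent: missed at all k + 1 visits.
        prevalent = miss ** (k + 1)
        # Incident in (t_{m-1}, t_m]: missed at visits m..k; or incident after the last visit.
        incident = sum(
            (cdf[m] - cdf[m - 1]) * miss ** (k - m + 1) for m in range(1, k + 1)
        ) + (1.0 - cdf[k])
    return p_prev * prevalent + (1.0 - p_prev) * incident


def cumulative_incidence(t: float, p_prev: float, shape: float, scale: float) -> float:
    """P(adenoma present by time t) = p_prev + (1 - p_prev) * F(t) in this sketch."""
    return p_prev + (1.0 - p_prev) * weibull_cdf(t, shape, scale)


# Baseline plus three follow-up colonoscopies; parameter values are illustrative only.
times = [0.0, 1.0, 2.0, 3.0]
params = dict(p_prev=0.1, sens=0.8, shape=1.5, scale=4.0)
outcomes = list(range(len(times))) + [None]  # first positive at visit 0..3, or never
total = sum(history_likelihood(times, o, **params) for o in outcomes)
print(round(total, 10))  # 1.0
print(cumulative_incidence(3.0, 0.1, 1.5, 4.0))
```

Because each latent state leads to exactly one observable history, the probabilities of all possible histories for a fixed visit schedule sum to one, which is a useful sanity check on any such likelihood. Note how a lower `sens` shifts probability mass from early positives toward later positives and all-negative histories, which is why prior information on sensitivity matters for separating prevalence from incidence.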
PMID:41943977 | DOI:10.1002/sim.70433