BMC Med Res Methodol. 2026 May 14;26(1):116. doi: 10.1186/s12874-026-02875-4.
ABSTRACT
Joint models offer an unbiased statistical approach for analyzing the effects of longitudinal biomarkers on time-to-event outcomes, providing an alternative to time-varying Cox proportional-hazards (PH) regression and the two-stage approach. However, whether available implementations of these methods perform reliably across different practically relevant scenarios remains insufficiently studied. We conducted a simulation study based on the Berlin Initiative Study examining kidney function and survival in older adults. In a manner comparable to phase IV studies in clinical research, our evaluation aims to provide insights into the practical performance of commonly used R package implementations of these methods, mostly under their default settings. By varying data-generating scenarios, we assessed how different numbers of events and longitudinal measurements affect the performance of Bayesian (JMbayes2) and frequentist (JM and joineRML) joint model implementations, time-varying Cox PH regression (survival), and the two-stage approach (nlme and survival), focusing on bias in parameter estimates. Results revealed substantial variability across implementations. The JM package exhibited considerable bias and frequent convergence issues. In contrast, joineRML performed robustly with approximately unbiased estimates for association parameters and high convergence frequencies comparable to the implementations of the simpler methods across diverse scenarios. However, both frequentist packages systematically underestimated the effects of baseline covariates in the survival model. The Bayesian JMbayes2 was largely unbiased, but performance deteriorated under two conditions: with few events (< 70), convergence was low and bias persisted even in converged models; and with observation-to-event ratios below 2, convergence declined, although estimates from converged models remained approximately unbiased.
Time-varying Cox PH regression and the two-stage approach showed more bias than JMbayes2 in certain settings but tended to achieve more robust performance and convergence across most scenarios.
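The two performance measures reported above, bias in parameter estimates and convergence frequency, can be illustrated with a minimal sketch. This is not the authors' code (the study used R packages). The function name `summarize_replicates`, the use of `None` to mark a non-converged fit, and the example numbers are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_replicates(estimates, true_value):
    """Summarize one simulation scenario (hypothetical helper).

    estimates: one entry per simulated dataset; None marks a fit
               that did not converge (an assumption of this sketch).
    true_value: the parameter value used to generate the data.
    """
    converged = [e for e in estimates if e is not None]
    # Share of simulated datasets for which the model fit converged.
    convergence_rate = len(converged) / len(estimates)
    # Bias is computed over converged fits only: mean estimate minus truth.
    bias = mean(converged) - true_value if converged else float("nan")
    return {"convergence_rate": convergence_rate, "bias": bias}


# Made-up estimates of an association parameter whose true value is 0.5.
summary = summarize_replicates([0.5, None, 0.7, 0.6], true_value=0.5)
print(summary)
```

Because bias here is computed only over converged fits, a method can look approximately unbiased while converging rarely, which is why the two measures are read together.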
PMID:42135742 | DOI:10.1186/s12874-026-02875-4