Int J Biostat. 2026 May 15. doi: 10.1515/ijb-2024-0034. Online ahead of print.
ABSTRACT
The prognostic score (PGS) is a function of observed covariates that summarizes the covariates' association with the potential responses. In the current study, we propose the full prognostic score (FPGS), an extension of the PGS that integrates the individual prognostic scores for confounding adjustment in causal inference. Under effect modification, we show that the FPGS, and a version of the FPGS based on the conditional expectations of the outcomes, satisfy the sufficiency condition for confounding adjustment when estimating the average causal effect. We present a general algorithm for implementing the FPGS approach with linear regression, random forest regression, or XGBoost regression. To estimate the average causal effect, we incorporate the FPGS into semiparametric estimators, including regression imputation, simple stratification, and targeted maximum likelihood estimation (TMLE). The finite-sample properties of the estimators are compared in three simulation studies. Among the FPGS estimators, the linear regression imputation estimator and the TMLE estimator constructed from a PGS estimated by linear regression have a smaller mean squared error than the alternative estimators. In an empirical study, we analyze data from the National Health and Nutrition Examination Survey (NHANES, 2007-2008) to estimate the effect of smoking on blood lead levels.
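To make the pipeline concrete, here is a minimal Python sketch of one FPGS-based estimator: a regression imputation estimate of the average causal effect built on linear prognostic scores. It is not the authors' algorithm. It assumes that the FPGS is formed by stacking arm-specific prognostic scores m0(X) and m1(X), each fitted by ordinary least squares within one treatment arm. The helper names (`fit_ols`, `fpgs_linear`, `regression_imputation_ate`) and the simulated data, with confounding through X0 and effect modification by X1, are hypothetical and serve only as an illustration. Random forest or XGBoost regressors could replace the OLS step in `fpgs_linear`.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Linear predictions from an intercept-augmented design matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fpgs_linear(X, t, y):
    """Hypothetical FPGS: arm-specific linear prognostic scores m0(X), m1(X),
    each fitted within its own arm and evaluated for every unit."""
    scores = [predict_ols(fit_ols(X[t == a], y[t == a]), X) for a in (0, 1)]
    return np.column_stack(scores)

def regression_imputation_ate(score, t, y):
    """Regress Y on the score within each arm, impute both potential
    outcomes for every unit, and average their difference."""
    mu = [predict_ols(fit_ols(score[t == a], y[t == a]), score) for a in (0, 1)]
    return float(np.mean(mu[1] - mu[0]))

# Simulated (illustrative) data: treatment depends on X0, which also affects Y.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))      # propensity depends on X0 -> confounding
t = rng.binomial(1, p)
y0 = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
y1 = y0 + 2.0 + X[:, 1]                       # effect modified by X1; true ATE = 2
y = np.where(t == 1, y1, y0)

score = fpgs_linear(X, t, y)                  # n x 2 matrix: (m0(X), m1(X))
ate = regression_imputation_ate(score, t, y)
print(round(ate, 2))                          # close to the true ATE of 2
```

Because both outcome models are linear in X in this toy setup, the linear scores are correctly specified and the imputed contrast recovers the true effect up to sampling error. A naive difference in arm means would be biased here, because X0 drives both treatment and outcome.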
PMID:42137947 | DOI:10.1515/ijb-2024-0034