Does heterogeneity underlie differences in treatment effects estimated from SuperLearner versus logistic regression? An application in nutritional epidemiology

Ann Epidemiol. 2023 Apr 28:S1047-2797(23)00082-0. doi: 10.1016/j.annepidem.2023.04.017. Online ahead of print.

ABSTRACT

PURPOSE: A strength of SuperLearner is that it may accommodate key interactions between model variables without a priori specification. In prior research, protective associations between fruit intake and preeclampsia were stronger when estimated using SuperLearner with targeted maximum likelihood estimation (TMLE) compared with multivariable logistic regression without any interaction terms. We explored whether heterogeneity (i.e., differences in the effect estimate due to interactions between fruit intake and covariates) may partly explain differences in estimates from these two models.

METHODS: Using a US prospective pregnancy cohort (2010-2013, n=7781), we estimated preeclampsia risk differences (RDs) for higher versus lower fruit density using multivariable logistic regression models that included 2-way statistical interactions between fruit density and each of the 25 model covariates. We compared these RDs with those from SuperLearner with TMLE (gold standard) and from logistic regression with no interaction terms.
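The abstract does not say how the RDs were obtained from the logistic models, so the following is only a minimal sketch under stated assumptions. It assumes marginal standardization (g-computation): fit the model, predict every participant's risk with the exposure set to "higher" and then to "lower," and take the difference of the average predicted risks. It also assumes one model per covariate, each adding a single exposure-by-covariate product term. The data are simulated (two hypothetical binary covariates, not the study's 25), the exposure depends on covariate 0 to mimic confounding, and the fit uses a hand-written Newton-Raphson routine rather than a statistical package. SuperLearner and TMLE are not implemented here.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                      # Bernoulli variance weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def design(a, covs, interact=None):
    """Intercept, exposure, covariates, plus an optional exposure x covariate[interact] term."""
    cols = [np.ones_like(a), a] + [covs[:, j] for j in range(covs.shape[1])]
    if interact is not None:
        cols.append(a * covs[:, interact])
    return np.column_stack(cols)

def gcomp_rd(a, covs, y, interact=None):
    """Marginal RD: mean predicted risk with exposure set to 1 minus mean with exposure set to 0."""
    beta = fit_logistic(design(a, covs, interact), y)
    risk1 = 1.0 / (1.0 + np.exp(-design(np.ones_like(a), covs, interact) @ beta))
    risk0 = 1.0 / (1.0 + np.exp(-design(np.zeros_like(a), covs, interact) @ beta))
    return float(np.mean(risk1) - np.mean(risk0))

# Hypothetical simulated data, NOT the study cohort.
rng = np.random.default_rng(0)
n = 20000
covs = rng.binomial(1, 0.4, size=(n, 2)).astype(float)
# Exposure is more common when covariate 0 equals 1 (confounding).
a = rng.binomial(1, np.where(covs[:, 0] == 1, 0.7, 0.3)).astype(float)
# The exposure is protective only when covariate 0 equals 1 (effect heterogeneity).
logit = -3.0 + 0.5 * covs[:, 0] + 0.3 * covs[:, 1] - 1.0 * a * covs[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

rd_main = gcomp_rd(a, covs, y)                                   # no interaction term
rd_by_cov = [gcomp_rd(a, covs, y, interact=j) for j in range(covs.shape[1])]
print(f"no interaction: RD = {rd_main:.3f}")
for j, rd in enumerate(rd_by_cov):
    print(f"exposure x covariate {j}: RD = {rd:.3f}")
```

Only the model with the product term for covariate 0 matches the simulated data-generating process. Comparing its RD with the other two shows how an omitted interaction can shift a standardized RD. In this sketch, that is the mechanism the study proposes as a partial explanation for the gap between the two estimators.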

RESULTS: From the logistic regression models with 2-way statistical interactions, 48% of the preeclampsia RDs were ≤-0.02 (closer to the SuperLearner with TMLE estimate), 40% equaled -0.01 (the same as the estimate from logistic regression with no interaction), and the remaining minority were at or crossed the null.

CONCLUSIONS: Our exploratory analysis provided preliminary evidence that heterogeneity may partly explain differences in estimates from logistic regression versus SuperLearner with TMLE.

PMID:37121376 | DOI:10.1016/j.annepidem.2023.04.017

By Nevin Manimala