Concordance between nutritional cohorts and randomized trials: biological confirmation or statistical consequence? A re-analysis of Stadelmaier et al

BMC Med. 2026 Apr 9;24(1):217. doi: 10.1186/s12916-026-04819-7.

ABSTRACT

BACKGROUND: Stadelmaier et al. recently reported a pooled ratio of risk ratios (RRR) of 1.00 between randomized controlled trials (RCTs) and cohort studies matched on population, intervention/exposure, comparator, and outcome (PI/ECO), interpreting this as evidence that randomized trials “confirm” observational findings in nutrition, with “tremendous public health implications.” We hypothesized that this apparent agreement does not validate concordance but is an expected statistical consequence of pooling null or small effect sizes with large variances, independent of how well individual pairs agree.

METHODS: We re-analyzed the authors’ binary dataset (n = 54 pairs) using a permutation framework (B = 10,000) in which cohort studies were randomly reassigned to RCTs, breaking the original pairing (both globally and restricted within exposure-type strata) while preserving marginal distributions.
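The random-pairing idea can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, the function names are hypothetical, and fixed-effect inverse-variance pooling is assumed for brevity (the original analysis may have used a different pooling model). Cohort studies are shuffled either globally or within strata, so each study's estimate and standard error are preserved while the pairing is broken.

```python
import math
import random


def pooled_log_rrr(rct, cohort):
    """Inverse-variance pooled log RRR; pair i is (rct[i], cohort[i]).

    Each study is a (log_rr, se) tuple. Fixed-effect pooling is an
    assumption made here for brevity.
    """
    num = den = 0.0
    for (y_r, se_r), (y_c, se_c) in zip(rct, cohort):
        w = 1.0 / (se_r ** 2 + se_c ** 2)  # variance of a difference of logs
        num += w * (y_r - y_c)
        den += w
    return num / den


def shuffle_within_strata(cohort, strata, rng):
    """Reassign cohort studies to new positions, only within the same stratum."""
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    out = list(cohort)
    for idx in groups.values():
        perm = idx[:]
        rng.shuffle(perm)
        for src, dst in zip(idx, perm):
            out[dst] = cohort[src]
    return out


def permuted_rrrs(rct, cohort, strata=None, n_perm=10_000, seed=0):
    """Pooled RRRs under random pairing; strata=None permutes globally."""
    rng = random.Random(seed)
    if strata is None:
        strata = [0] * len(cohort)
    return [
        math.exp(pooled_log_rrr(rct, shuffle_within_strata(cohort, strata, rng)))
        for _ in range(n_perm)
    ]


def permutation_p(observed, null_values):
    """Two-sided p-value: share of null values at least as far from the null mean."""
    centre = sum(null_values) / len(null_values)
    hits = sum(abs(v - centre) >= abs(observed - centre) for v in null_values)
    return (hits + 1) / (len(null_values) + 1)


# Hypothetical data: 54 pairs of small log risk ratios with sizeable standard errors.
rng = random.Random(1)
rct = [(rng.gauss(0.0, 0.1), rng.uniform(0.1, 0.4)) for _ in range(54)]
cohort = [(rng.gauss(0.0, 0.1), rng.uniform(0.05, 0.2)) for _ in range(54)]

observed = math.exp(pooled_log_rrr(rct, cohort))
null = permuted_rrrs(rct, cohort, n_perm=2_000)
inside = sum(0.91 <= v <= 1.10 for v in null) / len(null)
print(f"observed RRR={observed:.3f}, "
      f"share of permuted RRRs in [0.91, 1.10]={inside:.3f}, "
      f"p={permutation_p(observed, null):.3f}")
```

Passing a list of exposure-type labels as `strata` restricts reassignment to studies sharing a label, which mirrors the stratified variant described above.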

RESULTS: Under random pairing, 99.96% of permuted pooled RRRs fell within the authors’ reported confidence interval [0.91, 1.10], and the observed RRR was statistically indistinguishable from the random-pairing permutation distribution (p = 0.12). A pairwise discrepancy statistic showed no improvement with matching over random pairing (p = 0.21). A precision-weighted statistic was significant (p = 0.008), but weight was highly concentrated: 4 pairs (7% of n) accounted for 54.8% of total weight, and the effective number of pairs was 9.9 (18% of n). Qualitative concordance using the authors’ prior criteria was only 13% (7/54 pairs).
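Weight concentration of this kind can be quantified with two simple summaries, sketched below. The abstract does not state how the effective number of pairs was computed; Kish's formula, (Σw)² / Σw², is assumed here as one common choice. The weights are hypothetical and are not the study's data.

```python
def effective_number(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total ** 2 / sum(w * w for w in weights)


def top_k_share(weights, k):
    """Fraction of total weight carried by the k largest weights."""
    return sum(sorted(weights, reverse=True)[:k]) / sum(weights)


# Hypothetical precision weights: 4 high-precision pairs and 50 low-precision pairs.
weights = [15.0] * 4 + [1.0] * 50

print(f"top-4 share of weight: {top_k_share(weights, 4):.3f}")
print(f"effective number of pairs: {effective_number(weights):.1f} of {len(weights)}")
```

With equal weights the effective number equals the number of pairs; the more the weight piles onto a few pairs, the further it falls below n.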

CONCLUSIONS: A pooled RRR near 1.00 indicates only that cohort effect sizes do not differ systematically from RCT estimates. In this dataset, an RRR near 1 is the result expected under random pairing. PI/ECO matching does not improve pairwise agreement for typical comparisons; the significant weighted statistic reflects alignment among a small minority of high-precision pairs, not “most” findings. The pooled RRR lacks resolution as a confirmation metric in fields with small effects and should not be interpreted as evidence of replication; treating it as such may inappropriately influence evidence synthesis practices.

PMID:41957646 | DOI:10.1186/s12916-026-04819-7

By Nevin Manimala