Nevin Manimala Statistics

Gene set analysis for time-to-event outcome: comparison of a new approach based on the generalized Berk-Jones statistic with existing methods in presence of intra gene-set correlation

Brief Bioinform. 2026 May 4;27(3):bbag262. doi: 10.1093/bib/bbag262.

ABSTRACT

Gene set analysis evaluates the collective impact of groups of genes on an outcome of interest, such as disease occurrence. By incorporating biological knowledge through predefined gene sets, this approach makes results easier to interpret and improves statistical power compared with gene-wise analyses. For time-to-event outcomes, however, existing methods are limited and fail to account for potentially strong correlations among genes within a set. Given the strong performance of the Generalized Berk-Jones (GBJ) statistic, which incorporates correlation directly into the test statistic, we adapted this method to the time-to-event setting using a Cox model. We then compared its performance with established methods: the Cauchy combination test, the harmonic mean p-value, the Wald test, the global test, and the global boost test. We further benchmarked these methods on two real-world datasets, one on gliomas and one on breast cancer. In numerical studies, our proposed method, sGBJ, was overly conservative in controlling Type I error, which reduced its statistical power relative to the other methods, particularly when the number of genes was greater than or equal to the number of observations. The Wald test and the global boost test generally had the highest power, except for the global boost test in very high-correlation settings; however, current implementations of the Wald test cannot adjust for confounders.
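To make the p-value aggregation ideas in the abstract more concrete, here is a minimal sketch (not the authors' sGBJ implementation) of three ways to combine per-gene p-values within a gene set: the Cauchy combination test with equal weights, the raw harmonic mean of the p-values, and the classical Berk-Jones statistic under independence. It assumes per-gene two-sided z statistics are already available (for example, one Cox model per gene); no model fitting is done here. The harmonic mean shown is the uncalibrated value, not an asymptotically exact p-value. The Berk-Jones function returns only the statistic, not its p-value. GBJ modifies this statistic to account for correlation among the per-gene statistics, and that adjustment is not implemented. The function names and the example z-scores are hypothetical.

```python
import math
from typing import Sequence


def two_sided_p(z: float) -> float:
    """Two-sided standard-normal p-value for a per-gene z statistic (assumed input)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def cauchy_combination(pvals: Sequence[float]) -> float:
    """Cauchy combination test with equal weights 1/d.

    T = (1/d) * sum tan((0.5 - p_i) * pi) is standard Cauchy under the null
    (approximately so in the upper tail under dependence); the combined
    p-value is its upper-tail probability.
    """
    d = len(pvals)
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / d
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvals: Sequence[float]) -> float:
    """Raw (uncalibrated) harmonic mean of the p-values with equal weights."""
    return len(pvals) / sum(1.0 / p for p in pvals)


def _bernoulli_kl(a: float, b: float) -> float:
    """KL divergence D(a || b) between Bernoulli(a) and Bernoulli(b), with 0*log(0) = 0."""
    out = a * math.log(a / b)
    if a < 1.0:
        out += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return out


def berk_jones(pvals: Sequence[float]) -> float:
    """Berk-Jones statistic assuming independent p-values.

    BJ = max over i <= d/2 of d * D(i/d || p_(i)), counting only indices where
    the i-th smallest p-value falls below its uniform expectation i/d.
    """
    p = sorted(pvals)
    d = len(p)
    stat = 0.0
    for i in range(1, max(1, d // 2) + 1):
        x = i / d
        if p[i - 1] < x:
            stat = max(stat, d * _bernoulli_kl(x, p[i - 1]))
    return stat


if __name__ == "__main__":
    # Hypothetical per-gene z statistics for one gene set of six genes.
    z_scores = [2.5, 0.3, -1.1, 1.9, 0.0, -0.4]
    pvals = [two_sided_p(z) for z in z_scores]
    print("Cauchy combination p:", cauchy_combination(pvals))
    print("Harmonic mean of p:  ", harmonic_mean_p(pvals))
    print("Berk-Jones statistic:", berk_jones(pvals))
```

All three functions treat the gene set only through its per-gene p-values. This is why correlation between genes matters. The Cauchy combination is often used because its tail is robust to dependence, whereas the plain Berk-Jones statistic assumes independence, which is the gap that GBJ is designed to close.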

PMID:42218714 | DOI:10.1093/bib/bbag262

By Nevin Manimala

Portfolio Website for Nevin Manimala