
Leveraging genome-wide effects on gene expression to identify disease-critical genes with trans-genetic components

medRxiv [Preprint]. 2026 Feb 25:2026.02.23.26346922. doi: 10.64898/2026.02.23.26346922.

ABSTRACT

Genome-wide association studies (GWAS) have implicated tens of thousands of genetic variants in complex traits and polygenic diseases. Colocalizing GWAS variants with variants that may regulate gene expression, via expression quantitative trait loci (eQTL) mapping, has successfully led to the identification of disease-critical genes and their cell types of action. Recent studies predominantly colocalize proximal cis-eQTLs, which are estimated to explain ∼10% of the variance in gene expression levels. However, trans-eQTLs have been hypothesized to account for an additional ∼20% of the variance in expression levels, although few studies have attempted to quantify the variance explained by empirically associated trans-eQTLs. Here, we introduce EGRET (Estimating Genome-wide Regulatory Effects on the Transcriptome), an ensemble framework that jointly models cis-eQTLs with three distinct trans-eQTL mapping approaches: standard pairwise association testing via Matrix eQTL, and two functionally informed methods, trans-PCO and GBAT. In real data, EGRET produced 353,408 predictive gene expression models (cross-validation R² > 0, p < 0.01) across 49 GTEx tissues, including 12,317 gene-tissue pairs with a significantly nonzero trans-heritable component. For this set of genes, EGRET models explain 33% more gene expression variance than cis-eQTL models (EGRET average R² = 0.104, FUSION average R² = 0.078). We found that putative trans-regulating variants of EGRET models are enriched for regulatory elements such as enhancers, histone marks, and cis-eQTLs of other genes. We then hypothesized that EGRET models could nominate new disease-critical genes via a transcriptome-wide association study (TWAS) framework that models genome-wide regulatory effects on gene expression.
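
To make the cross-validation R² comparison concrete, here is a minimal sketch on simulated data. It is not the EGRET pipeline: plain ridge regression stands in for the ensemble, the genotypes and effect sizes are randomly generated, and the function names (`ridge_fit`, `cv_r2`, `scaled_component`) are hypothetical. The simulated cis and trans heritabilities (0.10 and 0.20) simply echo the rough estimates quoted above.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights for centered X and y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_r2(X, y, lam=1.0, k=5, seed=0):
    """Squared correlation between held-out predictions and observed values."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Center using training data only, so the test fold stays unseen.
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        w = ridge_fit(X[train] - mu_x, y[train] - mu_y, lam)
        pred[test] = (X[test] - mu_x) @ w + mu_y
    return np.corrcoef(pred, y)[0, 1] ** 2

def scaled_component(G, h2, rng):
    """Genetic component of expression, rescaled to have variance h2."""
    g = G @ rng.normal(size=G.shape[1])
    return np.sqrt(h2) * (g - g.mean()) / g.std()

# Simulated genotypes (assumption: independent SNPs, allele frequency 0.3).
rng = np.random.default_rng(1)
n, p_cis, p_trans = 1000, 20, 20
G_cis = rng.binomial(2, 0.3, size=(n, p_cis)).astype(float)
G_trans = rng.binomial(2, 0.3, size=(n, p_trans)).astype(float)

# Expression = cis part (h2 = 0.10) + trans part (h2 = 0.20) + noise.
expr = (scaled_component(G_cis, 0.10, rng)
        + scaled_component(G_trans, 0.20, rng)
        + np.sqrt(0.70) * rng.normal(size=n))

r2_cis = cv_r2(G_cis, expr)                          # cis-only model
r2_joint = cv_r2(np.hstack([G_cis, G_trans]), expr)  # cis + trans model
print(f"cis-only CV R2: {r2_cis:.3f}, cis+trans CV R2: {r2_joint:.3f}")
```

The same kind of comparison underlies the reported averages: 0.104 / 0.078 ≈ 1.33, which is the 33% relative gain in explained variance.
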
In simulations of theoretically representative gene expression architectures (∼30% heritability, of which more than 70% is distal), EGRET increased the power to detect disease-critical genes by 1.2x to 3.1x compared to cis-eQTL models. In real-data analysis, we identified disease-associated genes via TWAS across GWAS summary statistics for 78 complex traits and polygenic diseases using gene expression prediction models from EGRET, cis-eQTL FUSION, and two state-of-the-art trans-eQTL TWAS methods, MOSTWAS and BGW-TWAS. EGRET identified 450,825 gene-disease associations that were not identified by FUSION models, 2,900 associations not identified by MOSTWAS, and 5,498 associations not identified by BGW-TWAS. Finally, we used EGRET models to construct gene regulatory networks, some of which harbored genes that were jointly associated with complex traits. For example, the gene members of the network defined by ARHGEF3, whose cis-regulatory variants help predict expression of 10 genes in trans, were concordantly associated with platelet count using EGRET but not FUSION models. Overall, we find that modeling the genome-wide genetic component of gene expression greatly boosts the detection of disease-critical genes and helps define gene regulatory networks while improving the characterization of GWAS variants.
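
For readers unfamiliar with how a TWAS combines an expression model with GWAS summary statistics, the sketch below computes the weighted z-score commonly used in summary-statistic TWAS: z = wᵀz_GWAS / sqrt(wᵀ Σ w), where w are the SNP weights of the expression model and Σ is the LD (correlation) matrix among those SNPs. The abstract does not state that EGRET uses exactly this form, so treat it as an assumption; the function names `twas_z` and `two_sided_p` and all input values are hypothetical.

```python
import math
import numpy as np

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' LD w)."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    var = w @ ld @ w  # variance of the predicted expression
    if var <= 0:
        raise ValueError("predicted expression has zero variance")
    return float(w @ z / math.sqrt(var))

def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

# Toy example (made-up numbers): two cis SNPs in LD, one unlinked trans SNP.
weights = [0.5, 0.1, -0.2]
gwas_z = [3.0, 2.0, -2.5]
ld = [[1.0, 0.4, 0.0],
      [0.4, 1.0, 0.0],
      [0.0, 0.0, 1.0]]

z = twas_z(weights, gwas_z, ld)
print(f"TWAS z = {z:.2f}, p = {two_sided_p(z):.2e}")
```

A model that includes trans SNPs enters this statistic the same way as a cis-only model; the trans SNPs simply add rows to the weight vector and blocks to the LD matrix, with variants on different chromosomes treated here as uncorrelated.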

PMID:41810371 | PMC:PMC12970377 | DOI:10.64898/2026.02.23.26346922

By Nevin Manimala
