
KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data

Bioinform Adv. 2023 Aug 1;3(1):vbad100. doi: 10.1093/bioadv/vbad100. eCollection 2023.

ABSTRACT

MOTIVATION: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90 000 gene-phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ∼75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16).

RESULTS: To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using the conditional distribution properties of the multivariate normal distribution, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as their conditional expectation given the Z-scores of the measured phenotypes. Our evaluation of the method on simulated and real-world datasets demonstrates its superiority over the singular value decomposition (SVD) matrix completion method in various scenarios.
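To make the conditional-expectation idea concrete, here is a minimal Python sketch. It is not the KOMPUTE R package or its API. It assumes the Z-scores of one gene across phenotypes follow a zero-mean multivariate normal with a known phenotype correlation matrix Sigma. Under that assumption the standard result is E[z_u | z_m] = Sigma_um Sigma_mm^{-1} z_m, where u indexes unmeasured phenotypes and m indexes measured ones. The function name `impute_z` is hypothetical, and the sketch does not cover how the correlation matrix would be estimated in practice.

```python
import numpy as np


def impute_z(z, corr):
    """Hypothetical sketch: impute missing Z-scores for one gene.

    z    : 1-D sequence of Z-scores; np.nan marks unmeasured phenotypes.
    corr : phenotype-by-phenotype correlation matrix of the Z-scores
           (assumed known here; estimating it is outside this sketch).
    Returns a copy of z with each missing entry replaced by its conditional
    expectation E[z_u | z_m] = Sigma_um @ inv(Sigma_mm) @ z_m under a
    zero-mean multivariate normal model.
    """
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)
    miss = np.isnan(z)   # unmeasured phenotypes (u)
    obs = ~miss          # measured phenotypes (m)
    out = z.copy()
    if not miss.any() or not obs.any():
        # Nothing to impute, or nothing to condition on: return unchanged.
        return out
    sigma_mm = corr[np.ix_(obs, obs)]   # correlations among measured phenotypes
    sigma_um = corr[np.ix_(miss, obs)]  # unmeasured-vs-measured correlations
    # Solve Sigma_mm w = z_m instead of forming the inverse explicitly.
    weights = np.linalg.solve(sigma_mm, z[obs])
    out[miss] = sigma_um @ weights
    return out


corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
print(impute_z([np.nan, 1.5, -0.5], corr))  # first entry becomes 0.75
```

Using `np.linalg.solve` rather than `np.linalg.inv` is the usual choice for numerical stability. In the example, the imputed value 0.75 comes from weighting the two measured Z-scores by their correlation with the missing phenotype after accounting for the correlation between the measured phenotypes themselves.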

AVAILABILITY AND IMPLEMENTATION: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples.

PMID:37565237 | PMC:PMC10409646 | DOI:10.1093/bioadv/vbad100

By Nevin Manimala
