Bioinformatics. 2025 Sep 24:btaf540. doi: 10.1093/bioinformatics/btaf540. Online ahead of print.
ABSTRACT
SUMMARY: Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk but are often less accurate in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries, but it requires privacy-sensitive individual-level data and is computationally inefficient. We therefore introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that uses pseudovalidation to eliminate the need for individual-level data and adds further software optimizations. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R² metric. To improve computational efficiency, the PTL-PRS software was optimized with C++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS’s predictive performance.
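ILLUSTRATIVE SKETCH: The abstract does not give formulas for the pseudovalidation step, so the following is only a minimal sketch of one common way to build pseudo-summary statistics and a pseudo-R², not the PTL.PRS implementation. It assumes standardized genotypes and phenotype, marginal SNP-trait correlations `r` estimated from `n_total` samples, and an LD (SNP correlation) matrix `ld` from a reference panel. All function and variable names here are hypothetical and do not come from the PTL.PRS package.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def split_summary_stats(r, ld, n_total, n_train, rng):
    """Split correlations r into pseudo-training and pseudo-validation sets.

    Assumption: noise e ~ N(0, ld) is added so that r_train behaves like an
    estimate from n_train samples and r_val like an independent estimate
    from the remaining n_total - n_train samples.
    """
    if not 0 < n_train < n_total:
        raise ValueError("n_train must lie strictly between 0 and n_total")
    n_val = n_total - n_train
    low = cholesky(ld)
    z = [rng.gauss(0.0, 1.0) for _ in r]
    # Correlated noise e = L z, so that Cov(e) = ld.
    e = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(r))]
    scale = math.sqrt(n_val / (n_total * n_train))
    r_train = [ri + scale * ei for ri, ei in zip(r, e)]
    # The sample-size-weighted average of the two halves recovers r.
    r_val = [(n_total * ri - n_train * ti) / n_val for ri, ti in zip(r, r_train)]
    return r_train, r_val


def pseudo_r2(beta, r_val, ld):
    """Squared correlation between the score and the trait, from summary data only.

    Uses (beta' r_val)^2 / (beta' ld beta); returns 0.0 for a zero-variance score.
    """
    p = len(beta)
    num = sum(b * rv for b, rv in zip(beta, r_val)) ** 2
    den = sum(beta[i] * ld[i][j] * beta[j] for i in range(p) for j in range(p))
    return num / den if den > 0 else 0.0


# Toy example with two SNPs in moderate LD (all values are made up).
rng = random.Random(0)
ld = [[1.0, 0.5], [0.5, 1.0]]
r = [0.05, 0.03]
r_train, r_val = split_summary_stats(r, ld, n_total=10000, n_train=8000, rng=rng)
beta = [0.04, 0.01]  # hypothetical PRS weights fitted on r_train
print(pseudo_r2(beta, r_val, ld))
```

In this sketch, candidate models would be fitted on `r_train` and compared by `pseudo_r2` on `r_val`, so no individual-level genotypes or phenotypes are needed at any point. Because squaring discards the sign, a score that is negatively correlated with the trait would also receive a positive value; a sign-preserving variant may be preferable in practice.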
AVAILABILITY AND IMPLEMENTATION: The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID:40991324 | DOI:10.1093/bioinformatics/btaf540