Nat Comput Sci. 2026 Jun 30. doi: 10.1038/s43588-026-00999-7. Online ahead of print.
ABSTRACT
Dimensionality-reduction-based visualization is essential for interpreting complex biological data. Yet, unsupervised methods such as t-distributed stochastic neighbor embedding, Uniform Manifold Approximation and Projection, and Isomap reflect only the dominant data structure, which may not align with the goals of downstream analysis or expert-provided annotations. Existing supervised variants only partially address this mismatch and introduce new limitations. Here we present RF-PHATE, a supervised visualization approach that incorporates expert knowledge to reveal label-relevant structure while suppressing extraneous variation. RF-PHATE uses random forests to learn relationships between features and labels and translates this information into low-dimensional embeddings. RF-PHATE handles large datasets and is suitable for both classification and regression tasks. We demonstrate its use across four case studies, including longitudinal multiple sclerosis data, Raman spectral measurements of antioxidant effects, outcomes of patients with COVID-19, and RNA sequencing data with simulated dropout. These applications highlight RF-PHATE’s ability to enhance interpretability, manage noise and expose meaningful biological structure, suggesting broad potential for improving data exploration and discovery.
PMID:42380342 | DOI:10.1038/s43588-026-00999-7
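
The abstract describes a two-step idea: a random forest learns how features relate to labels, and that learned information is then turned into a low-dimensional embedding. The sketch below illustrates only that general idea and should not be read as the RF-PHATE algorithm. The abstract does not specify the proximity measure or the embedding step, so two stand-ins are assumed here: the classic shared-leaf random forest proximity and classical multidimensional scaling. The function names (`forest_proximity`, `classical_mds`, `supervised_embedding`), the synthetic dataset and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def forest_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf.

    Assumption: this is the classic shared-leaf proximity, used as a stand-in;
    the abstract does not say which proximity RF-PHATE uses.
    """
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def classical_mds(dist, n_components=2):
    """Embed a distance matrix with classical MDS (a stand-in embedding step)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise when distances are not Euclidean.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def supervised_embedding(X, y, n_trees=100, seed=0):
    """Fit a forest on (X, y), then embed samples by forest-derived distances."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    proximity = forest_proximity(forest, X)
    return classical_mds(1.0 - proximity), proximity


# Synthetic data: 4 informative features among 20, so most variation is label-irrelevant.
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=4, random_state=0
)
embedding, proximity = supervised_embedding(X, y)
```

Because the distances come from a forest trained on the labels, samples that the forest treats alike end up close together, while variation in features the forest ignores contributes little. For a regression target, the same sketch should work with `RandomForestRegressor` in place of the classifier. The pairwise comparison in `forest_proximity` uses memory quadratic in the number of samples, so this version is suited only to small datasets.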