Nevin Manimala Statistics

ABEILLE: a novel method for ABerrant Expression Identification empLoying machine Learning from RNA-sequencing data

Bioinformatics. 2022 Sep 5:btac603. doi: 10.1093/bioinformatics/btac603. Online ahead of print.

ABSTRACT

MOTIVATION: Current advances in omics technologies are paving the way for the diagnosis of rare diseases, serving as a complementary assay to identify the responsible gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potentially pathogenic events. However, popular approaches for AGE identification are limited by their reliance on statistical tests, which require choosing an arbitrary cut-off for significance assessment and the availability of several replicates, something that is not always possible in clinical contexts.

RESULTS: Hence, we developed ABEILLE (ABerrant Expression Identification empLoying machine LEarning from sequencing data), a variational autoencoder (VAE)-based method for identifying AGEs from RNA-seq data without the need for replicates or a control group. ABEILLE combines a VAE, which can model any data without specific assumptions about their distribution, with a decision tree that classifies genes as AGE or non-AGE. An anomaly score is assigned to each gene in order to stratify AGEs by severity of aberration. We tested ABEILLE on semi-synthetic data and an experimental dataset, demonstrating the importance of the flexibility of the VAE configuration for identifying potentially pathogenic candidates.
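
The general idea of scoring genes by how far their observed expression departs from a model reconstruction can be illustrated with a minimal sketch. This is not the ABEILLE implementation: the abstract does not specify how the anomaly score is computed, so the example below makes several assumptions. A simple additive log-scale baseline (per-gene median plus per-sample offset) stands in for the VAE reconstruction, the score is a robust z-score of the residual (median/MAD), and the decision-tree classification step is omitted entirely. All function names, gene names, sample names and counts are hypothetical.

```python
import math
from statistics import median

def log_transform(counts):
    """log2(count + 1) for every entry of a genes x samples count matrix."""
    return [[math.log2(c + 1) for c in row] for row in counts]

def reconstruct(log_expr):
    """Stand-in for a learned reconstruction (ASSUMPTION: not a VAE).

    Expected value = per-gene median + per-sample median offset.
    """
    n_samples = len(log_expr[0])
    gene_level = [median(row) for row in log_expr]
    sample_offset = [
        median([row[s] - g for row, g in zip(log_expr, gene_level)])
        for s in range(n_samples)
    ]
    return [[g + off for off in sample_offset] for g in gene_level]

def anomaly_scores(log_expr, recon):
    """Robust z-score of the reconstruction residual, computed per gene.

    ASSUMPTION: this scoring rule is illustrative, not the published one.
    """
    scores = []
    for obs_row, rec_row in zip(log_expr, recon):
        resid = [o - r for o, r in zip(obs_row, rec_row)]
        centre = median(resid)
        # Fall back to a tiny value if the MAD is exactly zero.
        mad = median([abs(x - centre) for x in resid]) or 1e-9
        scores.append([abs(x - centre) / (1.4826 * mad) for x in resid])
    return scores

def rank_candidates(scores, genes, samples):
    """Return (score, gene, sample) tuples sorted from most to least aberrant."""
    flat = [
        (scores[i][j], genes[i], samples[j])
        for i in range(len(genes))
        for j in range(len(samples))
    ]
    return sorted(flat, reverse=True)

# Hypothetical toy data: four genes, five unreplicated samples.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
samples = ["S1", "S2", "S3", "S4", "S5"]
counts = [
    [100, 110, 95, 105, 98],
    [50, 52, 49, 400, 51],   # GENE_B is strongly over-expressed in S4
    [200, 210, 190, 205, 198],
    [10, 11, 9, 10, 12],
]

log_expr = log_transform(counts)
ranked = rank_candidates(anomaly_scores(log_expr, reconstruct(log_expr)),
                         genes, samples)
for score, gene, sample in ranked[:3]:
    print(f"{gene}\t{sample}\t{score:.2f}")
```

Ranking by score, rather than applying a fixed threshold, mirrors the abstract's stated aim of stratifying AGEs by severity without an arbitrary significance cut-off. In the method described above, a decision tree would make the AGE versus non-AGE call.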

AVAILABILITY: ABEILLE source code is freely available at https://github.com/UCA-MSI/ABEILLE.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:36063052 | DOI:10.1093/bioinformatics/btac603

By Nevin Manimala
