Categories
Nevin Manimala Statistics

Modeling drug combination effects via latent tensor reconstruction

Bioinformatics. 2021 Jul 12;37(Supplement_1):i93-i101. doi: 10.1093/bioinformatics/btab308.

ABSTRACT

MOTIVATION: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes the comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects.

RESULTS: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions that describe the responses of therapeutic agent combinations at various doses and in various cancer cell contexts. The method is based on polynomial regression via latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose-response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line.
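As a rough illustration of polynomial regression with a low-rank tensor parameterization, the sketch below evaluates a polynomial whose coefficient tensor is written as a sum of rank-one terms. The functional form, the names `predict`, `factors` and `weights`, and the toy numbers are assumptions made for illustration; the abstract does not state comboLTR's exact formulation, so the linked repository remains the reference for the authors' implementation.

```python
from math import prod


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict(x, factors, weights):
    """Evaluate f(x) = sum_t weights[t] * prod_d <factors[t][d], x>.

    factors[t] holds one vector per polynomial degree d, so a model with
    D vectors per term is a degree-D polynomial in x whose coefficient
    tensor has rank at most len(factors).
    """
    return sum(
        w * prod(dot(p, x) for p in term)
        for w, term in zip(weights, factors)
    )


# Toy example (hypothetical numbers): 3 input features, rank 2, degree 2.
x = [1.0, 2.0, 0.5]
factors = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # term 1: x[0] * x[1]
    [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]],  # term 2: (2 * x[2]) ** 2
]
weights = [1.0, 0.5]
print(predict(x, factors, weights))
```

The appeal of this kind of parameterization is that the number of parameters grows with rank times degree times input dimension, rather than with the full size of the coefficient tensor.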

AVAILABILITY AND IMPLEMENTATION: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252952 | DOI:10.1093/bioinformatics/btab308

Disease gene prediction with privileged information and heteroscedastic dropout

Bioinformatics. 2021 Jul 12;37(Supplement_1):i410-i417. doi: 10.1093/bioinformatics/btab310.

ABSTRACT

MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models.

RESULTS: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease-gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model can improve prediction accuracy over other methods when all the features are available. More importantly, our model can make very accurate predictions when >90% of the features are missing at the test stage.

AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python 3.7 and PyTorch 1.5.0; the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.

PMID:34252957 | DOI:10.1093/bioinformatics/btab310

Metaball skinning of synthetic astroglial morphologies into realistic mesh models for visual analytics and in silico simulations

Bioinformatics. 2021 Jul 12;37(Supplement_1):i426-i433. doi: 10.1093/bioinformatics/btab280.

ABSTRACT

MOTIVATION: Astrocytes, the most abundant glial cells in the mammalian brain, have an instrumental role in developing neuronal circuits. They contribute to the physical structuring of the brain, modulating synaptic activity and maintaining the blood-brain barrier in addition to other significant aspects that impact brain function. Biophysically detailed astrocytic models are key to unraveling their functional mechanisms via molecular simulations at microscopic scales. Detailed and complete biological reconstructions of astrocytic cells are scarce. Nonetheless, data-driven digital reconstructions of astroglial morphologies that are statistically identical to biological counterparts are becoming available. We use those synthetic morphologies to generate astrocytic meshes with realistic geometries, making it possible to perform these simulations.

RESULTS: We present an unconditionally robust method capable of reconstructing high-fidelity polygonal meshes of astroglial cells from algorithmically synthesized morphologies. Our method uses implicit surfaces, or metaballs, to skin the different structural components of astrocytes and then blend them in a seamless fashion. We also provide an end-to-end pipeline to produce optimized two- and three-dimensional meshes for visual analytics and simulations, respectively. The performance of our pipeline has been assessed with a group of 5000 astroglial morphologies and the geometric metrics of the resulting meshes are evaluated. The usability of the meshes is then demonstrated with different use cases.
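To make the idea of blending with implicit surfaces concrete, the sketch below sums a simple falloff field from several balls and tests whether a point lies inside the resulting surface. The inverse-square falloff, the threshold of 1.0, and the names `metaball_field` and `is_inside` are assumptions chosen for illustration; Blender's own metaball falloff may differ, and this is not the authors' skinning algorithm.

```python
def metaball_field(point, balls):
    """Sum of inverse-square falloffs from each ball at a 3D point.

    balls is a list of (center, radius) pairs. The falloff r^2 / d^2 is
    one classic choice, assumed here for illustration only.
    """
    total = 0.0
    for center, radius in balls:
        d2 = sum((p - c) ** 2 for p, c in zip(point, center))
        if d2 == 0.0:
            return float("inf")  # exactly at a ball center
        total += radius ** 2 / d2
    return total


def is_inside(point, balls, threshold=1.0):
    """A point is inside the implicit surface where the field >= threshold."""
    return metaball_field(point, balls) >= threshold


# Two hypothetical unit balls placed 1.5 apart along the x axis.
balls = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0)]
probe = (0.75, 0.9, 0.0)

print(is_inside(probe, balls[:1]))  # tested against the first ball alone
print(is_inside(probe, balls))      # tested against the blended pair
```

The probe point lies outside each ball taken alone but inside the combined surface, which is the smooth bridging behavior that makes implicit surfaces attractive for joining branches to a cell body.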

AVAILABILITY AND IMPLEMENTATION: Our metaball skinning algorithm is implemented in Blender 2.82, relying on its Python API (Application Programming Interface). To make it accessible to computational biologists and neuroscientists, the implementation has been integrated into NeuroMorphoVis, an open-source, domain-specific package that is primarily designed for neuronal morphology visualization and meshing.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252950 | DOI:10.1093/bioinformatics/btab280

CALLR: a semi-supervised cell-type annotation method for single-cell RNA sequencing data

Bioinformatics. 2021 Jul 12;37(Supplement_1):i51-i58. doi: 10.1093/bioinformatics/btab286.

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology has been widely applied to capture the heterogeneity of different cell types within complex tissues. An essential step in scRNA-seq data analysis is the annotation of cell types. Traditional cell-type annotation mainly involves clustering the cells first and then using the aggregated cluster-level expression profiles and the marker genes to label each cluster. Such methods depend heavily on the clustering results, which are insufficient for accurate annotation.

RESULTS: In this article, we propose a semi-supervised learning method for cell-type annotation called CALLR. It combines unsupervised learning represented by the graph Laplacian matrix constructed from all the cells and supervised learning using sparse logistic regression. By alternately updating the cell clusters and annotation labels, high annotation accuracy can be achieved. The model is formulated as an optimization problem, and a computationally efficient algorithm is developed to solve it. Experiments on 10 real datasets show that CALLR outperforms the compared (semi-)supervised learning methods, and the popular clustering methods.
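The role a graph Laplacian can play in such a model is illustrated below: it penalizes label assignments that differ across strongly connected cells. The 4-cell similarity matrix and the function names are hypothetical, and the abstract does not specify how CALLR builds its graph or combines this term with the logistic regression loss, so this is only a sketch of the general technique.

```python
def graph_laplacian(weights):
    """Return L = D - W for a symmetric weight matrix W (list of lists)."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [
        [(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(labels, laplacian):
    """Quadratic form f^T L f, equal to 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(labels)
    return sum(
        labels[i] * laplacian[i][j] * labels[j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical similarity graph: cells 0-1 and 2-3 are strongly linked,
# with weak links between the two pairs.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
L = graph_laplacian(W)

print(smoothness([1, 1, 0, 0], L))  # labels follow the graph: small penalty
print(smoothness([1, 0, 1, 0], L))  # labels cut strong edges: large penalty
```

Because the penalty is computed from all cells, labeled or not, a term of this kind lets unlabeled cells influence the classifier, which is the usual motivation for Laplacian regularization in semi-supervised learning.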

AVAILABILITY AND IMPLEMENTATION: The implementation of CALLR is available at https://github.com/MathSZhang/CALLR.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252936 | DOI:10.1093/bioinformatics/btab286

Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited

Bioinformatics. 2021 Jul 12;37(Supplement_1):i111-i119. doi: 10.1093/bioinformatics/btab263.

ABSTRACT

MOTIVATION: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate ‘phylogenetic support’). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted.
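For reference, the standard phylogenetic bootstrap resamples alignment columns (sites) with replacement, which is where the i.i.d. assumption enters: each site is drawn independently of its neighbors. The sketch below shows that resampling step only; the toy alignment and the function name `bootstrap_alignment` are illustrative assumptions, and it does not implement RAWR or any tree estimation.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    alignment is a list of equal-length aligned sequences. Every sequence
    is rebuilt from the same sampled column indices, so rows stay aligned.
    """
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


# Hypothetical toy alignment of three taxa.
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_alignment(alignment, rng)
print(replicate)
```

In a full analysis, a tree would be re-estimated from each of many such replicates, and the support for a branch would be the fraction of replicate trees that contain it.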

RESULTS: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (‘RAndom Walk Resampling’). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the ‘mirrored inputs’ idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state-of-the-art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer comparable or typically superior type I and type II error compared to phylogenetic bootstrap support. We also conduct a re-analysis of large-scale genomic sequence data from a recent study of Darwin’s finches. Our findings clarify phylogenetic uncertainty in a charismatic clade that serves as an important model for complex adaptive evolution.

AVAILABILITY AND IMPLEMENTATION: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/RAWR-study-datasets-and-scripts.

PMID:34252944 | DOI:10.1093/bioinformatics/btab263

Predicting MHC-peptide binding affinity by differential boundary tree

Bioinformatics. 2021 Jul 12;37(Supplement_1):i254-i261. doi: 10.1093/bioinformatics/btab312.

ABSTRACT

MOTIVATION: The prediction of binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false-positive rates in practical applications: in most cases, a single-residue mutation may substantially alter the binding affinity of a peptide to MHC, and conventional deep learning methods cannot identify such changes.

RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can predict MHC class I binding affinity accurately in comparison with state-of-the-art deep learning methods. We also presented a parallel training algorithm to accelerate the training and inference process, which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that can strongly influence binding affinity.

AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252932 | DOI:10.1093/bioinformatics/btab312

‘Single-subject studies’-derived analyses unveil altered biomechanisms between very small cohorts: implications for rare diseases

Bioinformatics. 2021 Jul 12;37(Supplement_1):i67-i75. doi: 10.1093/bioinformatics/btab290.

ABSTRACT

MOTIVATION: Identifying altered transcripts between very small human cohorts is particularly challenging and is compounded by the low accrual rate of human subjects in rare diseases or sub-stratified common disorders. Yet, single-subject studies (S3) can compare paired transcriptome samples drawn from the same patient under two conditions (e.g. treated versus pre-treatment) and suggest patient-specific responsive biomechanisms based on the overrepresentation of functionally defined gene sets. These improve statistical power by: (i) reducing the total features tested and (ii) relaxing the requirement of within-cohort uniformity at the transcript level. We propose Inter-N-of-1, a novel method, to identify meaningful differences between very small cohorts by using the effect size of ‘single-subject-study’-derived responsive biological mechanisms.

RESULTS: In each subject, Inter-N-of-1 requires applying the previously published S3-type N-of-1-pathways MixEnrich to two paired samples (e.g. diseased versus unaffected tissues) for determining patient-specific enriched gene sets: Odds Ratios (S3-OR) and S3-variance using Gene Ontology Biological Processes. To evaluate small cohorts, we calculated the precision and recall of Inter-N-of-1 and that of a control method (GLM+EGS) when comparing two cohorts of decreasing sizes (from 20 versus 20 to 2 versus 2) in a comprehensive six-parameter simulation and in a proof-of-concept clinical dataset. In simulations, the Inter-N-of-1 median precision and recall are >90% and >75%, respectively, in cohorts of 3 versus 3 distinct subjects (regardless of the parameter values), whereas conventional methods outperform Inter-N-of-1 at sample sizes 9 versus 9 and larger. Similar results were obtained in the clinical proof-of-concept dataset.

AVAILABILITY AND IMPLEMENTATION: R software is available at Lussierlab.net/BSSD.

PMID:34252934 | DOI:10.1093/bioinformatics/btab290

An intersectional identity approach to chronic pain disparities using latent class analysis

Pain. 2021 Jul 9. doi: 10.1097/j.pain.0000000000002407. Online ahead of print.

ABSTRACT

Research on intersectionality and chronic pain disparities is very limited. Intersectionality explores the interconnections between multiple aspects of identity and provides a more accurate picture of disparities. This study applied a relatively novel statistical approach (i.e., Latent Class Analysis; LCA) to examine chronic pain disparities with an intersectional identity approach. Cross-sectional data were analyzed using pre-treatment data from the Learning About My Pain (LAMP) trial, a randomized comparative effectiveness study of group-based psychosocial interventions (PCORI Contract #941, Beverly Thorn, PI; clinicaltrials.gov identifier NCT01967342) for patients receiving care for chronic pain at low-income clinics in rural and suburban Alabama. LCA results suggested a 5-class model. In order to easily identify each class, the following labels were created: Older Adults (OA), Younger Adults (YA), Severe Disparity (SD), Older/Black/African-American (OB), and Working Women (WW). The latent disparity classes varied by pre-treatment chronic pain functioning. Overall, the SD group had the lowest levels of functioning, and the WW group had the highest levels of functioning. Although younger and with higher literacy levels, the YA group had similar levels of pain interference and depressive symptoms to the SD group (p’s < .05). The YA group also had higher pain catastrophizing than the OA group (p < .005). Results highlighted the importance of the interactions between the multiple factors of socioeconomic status, age, and race in the experience of chronic pain. The intersectional identity theory approach through LCA provided an integrated picture of chronic pain disparities in a highly understudied and underserved population.

PMID:34252906 | DOI:10.1097/j.pain.0000000000002407

scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling

Bioinformatics. 2021 Jul 12;37(Supplement_1):i358-i366. doi: 10.1093/bioinformatics/btab273.

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. One question, then, is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Another challenging question is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data.

RESULTS: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data.

AVAILABILITY AND IMPLEMENTATION: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252925 | DOI:10.1093/bioinformatics/btab273

Environmental regulation and the growth of the total-factor carbon productivity of China’s industries: Evidence from the implementation of action plan of air pollution prevention and control

J Environ Manage. 2021 Jul 9;296:113078. doi: 10.1016/j.jenvman.2021.113078. Online ahead of print.

ABSTRACT

To abate severe air pollution, the Chinese government released the Action Plan of Air Pollution Prevention and Control (APAPPC) in 2013. This paper regards the APAPPC as a quasi-experiment and uses the difference-in-differences (DID) method to investigate the impact of environmental regulation on the growth of green total-factor productivity of China’s industries. This article employs the non-radial directional distance function (NDDF) and the global Malmquist index to measure the total-factor carbon productivity of China’s industries. Further regressions suggest that the implementation of the APAPPC has significantly promoted the growth of the total-factor carbon productivity in the air pollution-intensive industries, and its marginal effect has steadily increased with time. This result remains valid after a series of counterfactual tests and robustness tests. A further mechanism analysis shows that the APAPPC has significantly promoted R&D investment, especially in instruments and equipment, which has effectively promoted technical efficiency and technological advancement. It indicates that stringent and well-designed environmental regulations should lead to a “win-win” situation of environmental improvement and economic development by encouraging enterprises to upgrade their technology and equipment.
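The core of a difference-in-differences comparison can be sketched with group means: the change in the treated group minus the change in the control group over the same period. The numbers below are hypothetical and not taken from the paper, and the function name `did_estimate` is illustrative; the study itself reports regression-based estimates, which this two-group, two-period sketch does not reproduce.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical productivity indices for pollution-intensive (treated)
# and other (control) industries, before and after a policy.
treated_pre = [1.00, 1.10, 0.90]
treated_post = [1.30, 1.40, 1.20]
control_pre = [1.00, 1.05, 0.95]
control_post = [1.10, 1.15, 1.05]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 3))
```

Subtracting the control group's change removes trends shared by both groups, so the estimate is credible only if the two groups would have moved in parallel without the policy.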

PMID:34252855 | DOI:10.1016/j.jenvman.2021.113078