Categories
Nevin Manimala Statistics

CALLR: a semi-supervised cell-type annotation method for single-cell RNA sequencing data

Bioinformatics. 2021 Jul 12;37(Supplement_1):i51-i58. doi: 10.1093/bioinformatics/btab286.

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology has been widely applied to capture the heterogeneity of different cell types within complex tissues. An essential step in scRNA-seq data analysis is the annotation of cell types. Traditional cell-type annotation mainly clusters the cells first and then labels each cluster using the aggregated cluster-level expression profiles and marker genes. Such methods depend heavily on the clustering results, which alone are insufficient for accurate annotation.

RESULTS: In this article, we propose a semi-supervised learning method for cell-type annotation called CALLR. It combines unsupervised learning, represented by the graph Laplacian matrix constructed from all the cells, with supervised learning using sparse logistic regression. By alternately updating the cell clusters and annotation labels, high annotation accuracy can be achieved. The model is formulated as an optimization problem, and a computationally efficient algorithm is developed to solve it. Experiments on 10 real datasets show that CALLR outperforms the compared (semi-)supervised learning methods and popular clustering methods.
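The abstract does not spell out CALLR's objective, so the following is only a sketch of its two named ingredients in isolation, not of CALLR itself: an unnormalized graph Laplacian L = D - W built from a Gaussian k-nearest-neighbour graph over all cells, and a binary L1-penalized (sparse) logistic regression fitted by proximal gradient descent. The kNN/Gaussian graph construction, the function names and every hyperparameter (`k`, `sigma`, `lam`, `lr`, `n_iter`) are assumptions chosen for illustration; CALLR's actual formulation, its alternating cluster/label updates and its solver should be taken from the paper or repository.

```python
import numpy as np


def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a symmetric Gaussian kNN graph.

    X: (n_cells, n_features) matrix, e.g. log-normalized expression or PCs.
    k and sigma are illustrative hyperparameters (assumptions, not CALLR's values).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        d = d2[i].copy()
        d[i] = np.inf  # exclude the cell itself from its own neighbour list
        nbrs = np.argsort(d)[:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)  # keep an edge if either cell lists the other
    D = np.diag(W.sum(axis=1))
    return D - W


def sparse_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Binary L1-penalized logistic regression fitted by proximal gradient (ISTA)."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (prob - y) / n)
        # Soft-thresholding is the proximal step for the L1 penalty;
        # it sets small weights to exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * np.mean(prob - y)  # intercept is not penalized
    return w, b
```

The Laplacian typically enters such models through the quadratic form f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2, which is small when strongly connected cells receive similar scores; the soft-thresholding step is what can drive the logistic weights of uninformative genes to exactly zero.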

AVAILABILITY AND IMPLEMENTATION: The implementation of CALLR is available at https://github.com/MathSZhang/CALLR.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252936 | DOI:10.1093/bioinformatics/btab286

Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited

Bioinformatics. 2021 Jul 12;37(Supplement_1):i111-i119. doi: 10.1093/bioinformatics/btab263.

ABSTRACT

MOTIVATION: The standard bootstrap method is used throughout science and engineering to perform general-purpose non-parametric resampling and re-estimation. Among the most widely cited and widely used such applications is the phylogenetic bootstrap method, which Felsenstein proposed in 1985 as a means to place statistical confidence intervals on an estimated phylogeny (or estimate ‘phylogenetic support’). A key simplifying assumption of the bootstrap method is that input data are independent and identically distributed (i.i.d.). However, the i.i.d. assumption is an over-simplification for biomolecular sequence analysis, as Felsenstein noted.
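To make the i.i.d. assumption concrete, here is a minimal sketch of one standard, Felsenstein-style bootstrap replicate of an alignment: columns (sites) are drawn independently and with replacement, so neighbouring sites are treated as unrelated. This is the baseline being criticized, not RAWR, whose random-walk resampling is described here only at a high level. Representing the alignment as a Python dict of taxon to aligned sequence, the function name and the toy data are assumptions for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One standard bootstrap replicate of a multiple sequence alignment.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    Each column index is drawn independently and uniformly with replacement,
    which is exactly where the i.i.d.-sites assumption enters.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (n_sites,) = lengths
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment of three taxa.
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTC"}
rng = random.Random(42)
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
```

In a full analysis a tree would be inferred from each replicate, and a clade's bootstrap support is the fraction of replicate trees that contain it; tree inference is omitted here.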

RESULTS: In this study, we introduce a new sequence-aware non-parametric resampling technique, which we refer to as RAWR (‘RAndom Walk Resampling’). RAWR consists of random walks that synthesize and extend the standard bootstrap method and the ‘mirrored inputs’ idea of Landan and Graur. We apply RAWR to the task of phylogenetic support estimation. RAWR’s performance is compared to the state-of-the-art using synthetic and empirical data that span a range of dataset sizes and evolutionary divergence. We show that RAWR support estimates offer type I and type II error rates comparable or, typically, superior to those of phylogenetic bootstrap support. We also conduct a re-analysis of large-scale genomic sequence data from a recent study of Darwin’s finches. Our findings clarify phylogenetic uncertainty in a charismatic clade that serves as an important model for complex adaptive evolution.

AVAILABILITY AND IMPLEMENTATION: Data and software are publicly available under open-source software and open data licenses at: https://gitlab.msu.edu/liulab/RAWR-study-datasets-and-scripts.

PMID:34252944 | DOI:10.1093/bioinformatics/btab263

Predicting MHC-peptide binding affinity by differential boundary tree

Bioinformatics. 2021 Jul 12;37(Supplement_1):i254-i261. doi: 10.1093/bioinformatics/btab312.

ABSTRACT

MOTIVATION: Predicting the binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although many computational methods have been developed to address this problem, they produce high false-positive rates in practical applications because, in most cases, a single-residue mutation may substantially alter a peptide’s binding affinity to MHC, a change that conventional deep learning methods cannot identify.

RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred predicts MHC class I binding affinity accurately compared with state-of-the-art deep learning methods. We also present a parallel training algorithm that accelerates training and inference, enabling DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths of test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that strongly influence binding affinity.

AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252932 | DOI:10.1093/bioinformatics/btab312

‘Single-subject studies’-derived analyses unveil altered biomechanisms between very small cohorts: implications for rare diseases

Bioinformatics. 2021 Jul 12;37(Supplement_1):i67-i75. doi: 10.1093/bioinformatics/btab290.

ABSTRACT

MOTIVATION: Identifying altered transcripts between very small human cohorts is particularly challenging and is compounded by the low accrual rate of human subjects in rare diseases or sub-stratified common disorders. Yet, single-subject studies (S3) can compare paired transcriptome samples drawn from the same patient under two conditions (e.g. treated versus pre-treatment) and suggest patient-specific responsive biomechanisms based on the overrepresentation of functionally defined gene sets. These improve statistical power by: (i) reducing the total features tested and (ii) relaxing the requirement of within-cohort uniformity at the transcript level. We propose Inter-N-of-1, a novel method, to identify meaningful differences between very small cohorts by using the effect size of ‘single-subject-study’-derived responsive biological mechanisms.

RESULTS: For each subject, Inter-N-of-1 applies the previously published S3-type method N-of-1-pathways MixEnrich to two paired samples (e.g. diseased versus unaffected tissues) to determine patient-specific enriched gene sets, quantified by odds ratios (S3-OR) and S3-variance, using Gene Ontology Biological Processes. To evaluate small cohorts, we calculated the precision and recall of Inter-N-of-1 and of a control method (GLM+EGS) when comparing two cohorts of decreasing size (from 20 versus 20 to 2 versus 2), both in a comprehensive six-parameter simulation and in a proof-of-concept clinical dataset. In simulations, Inter-N-of-1 achieved median precision >90% and median recall >75% in cohorts of 3 versus 3 distinct subjects (regardless of the parameter values), whereas conventional methods outperformed Inter-N-of-1 at sample sizes of 9 versus 9 and larger. Similar results were obtained in the clinical proof-of-concept dataset.
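Precision and recall here follow their usual definitions. A minimal sketch, assuming the calls being scored are gene-set identifiers and that the truly altered sets are known, as they are in a simulation: precision is the share of called sets that are truly altered, recall is the share of truly altered sets that were called, and the median is then taken over simulation replicates. The function name, the identifiers and the replicate data below are hypothetical.

```python
from statistics import median


def precision_recall(predicted, truth):
    """Precision and recall of called items against a ground-truth set.

    predicted: items (e.g. gene-set identifiers) called as differing between cohorts.
    truth: items that truly differ (known in a simulation).
    Returns NaN for a ratio whose denominator is empty.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else float("nan")
    recall = tp / len(truth) if truth else float("nan")
    return precision, recall


# Hypothetical simulation replicates: (called gene sets, truly altered gene sets).
runs = [
    ({"GO_a", "GO_b", "GO_c"}, {"GO_a", "GO_b", "GO_d"}),
    ({"GO_a", "GO_d"}, {"GO_a", "GO_b", "GO_d"}),
    ({"GO_b"}, {"GO_a", "GO_b", "GO_d"}),
]
scores = [precision_recall(p, t) for p, t in runs]
median_precision = median(s[0] for s in scores)
median_recall = median(s[1] for s in scores)
```

Reporting medians rather than means keeps a few degenerate replicates (for example, very small cohorts with almost no calls) from dominating the summary.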

AVAILABILITY AND IMPLEMENTATION: R software is available at Lussierlab.net/BSSD.

PMID:34252934 | DOI:10.1093/bioinformatics/btab290

An intersectional identity approach to chronic pain disparities using latent class analysis

Pain. 2021 Jul 9. doi: 10.1097/j.pain.0000000000002407. Online ahead of print.

ABSTRACT

Research on intersectionality and chronic pain disparities is very limited. Intersectionality explores the interconnections between multiple aspects of identity and provides a more accurate picture of disparities. This study applied a relatively novel statistical approach (i.e., Latent Class Analysis; LCA) to examine chronic pain disparities with an intersectional identity approach. Cross-sectional data were analyzed using pre-treatment data from the Learning About My Pain (LAMP) trial, a randomized comparative effectiveness study of group-based psychosocial interventions (PCORI Contract #941, Beverly Thorn, PI; clinicaltrials.gov identifier NCT01967342) for patients receiving care for chronic pain at low-income clinics in rural and suburban Alabama. LCA results suggested a 5-class model. For ease of identification, the classes were labeled Older Adults (OA), Younger Adults (YA), Severe Disparity (SD), Older/Black/African-American (OB), and Working Women (WW). The latent disparity classes varied in pre-treatment chronic pain functioning. Overall, the SD group had the lowest levels of functioning, and the WW group had the highest. Although younger and with higher literacy levels, the YA group had levels of pain interference and depressive symptoms similar to those of the SD group (p’s < .05). The YA group also had higher pain catastrophizing than the OA group (p < .005). Results highlighted the importance of interactions among socioeconomic status, age, and race in the experience of chronic pain. The intersectional identity theory approach through LCA provided an integrated picture of chronic pain disparities in a highly understudied and underserved population.

PMID:34252906 | DOI:10.1097/j.pain.0000000000002407

scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling

Bioinformatics. 2021 Jul 12;37(Supplement_1):i358-i366. doi: 10.1093/bioinformatics/btab273.

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. A natural question is therefore how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can measure only up to a few hundred genes. A second challenging question is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data.

RESULTS: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data.
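As background on the PNMF step that scPNMF modifies, here is a minimal sketch of plain projective NMF: it looks for a non-negative genes-by-k basis W such that projecting the data onto W and back, W W^T X, reconstructs X well in Frobenius norm. It uses a standard multiplicative update derived from the gradient of that loss, with a random initialization; scPNMF's modified initialization and its basis-selection step are not reproduced. `top_genes` is a hypothetical helper that simply lists the highest-loading genes per basis; it is not scPNMF's gene-selection rule.

```python
import numpy as np


def pnmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Plain projective NMF: non-negative W (genes x k) minimizing ||X - W W^T X||_F^2.

    X: non-negative genes x cells matrix (e.g. log-normalized counts).
    Random uniform initialization (an assumption; scPNMF changes this step).
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, size=(X.shape[0], k))
    A = X @ X.T  # genes x genes second-moment matrix, reused every iteration
    for _ in range(n_iter):
        AW = A @ W
        # Gradient = -4 A W + 2 (W W^T A W + A W W^T W); the multiplicative
        # update divides the negative part by the positive part.
        denom = W @ (W.T @ AW) + AW @ (W.T @ W)
        W *= 2.0 * AW / (denom + eps)
    return W


def top_genes(W, genes, n=3):
    """Hypothetical helper: the n genes with the largest loading on each basis."""
    return [[genes[i] for i in np.argsort(-W[:, j])[:n]] for j in range(W.shape[1])]
```

Because the updates are multiplicative, entries of W that start non-negative stay non-negative, and each column of W can be read as a gene-level encoding of one basis.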

AVAILABILITY AND IMPLEMENTATION: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34252925 | DOI:10.1093/bioinformatics/btab273

Environmental regulation and the growth of the total-factor carbon productivity of China’s industries: Evidence from the implementation of action plan of air pollution prevention and control

J Environ Manage. 2021 Jul 9;296:113078. doi: 10.1016/j.jenvman.2021.113078. Online ahead of print.

ABSTRACT

To abate severe air pollution, the Chinese government released the Action Plan of Air Pollution Prevention and Control (APAPPC) in 2013. This paper treats the APAPPC as a quasi-experiment and uses the difference-in-differences (DID) method to investigate the impact of environmental regulation on the growth of green total-factor productivity of China’s industries. It employs the non-radial directional distance function (NDDF) and the global Malmquist index to measure the total-factor carbon productivity of China’s industries. Further regressions suggest that implementation of the APAPPC significantly promoted the growth of total-factor carbon productivity in air pollution-intensive industries, and that its marginal effect increased steadily over time. This result remains valid under a series of counterfactual and robustness tests. A further mechanism analysis shows that the APAPPC significantly promoted R&D investment, especially in instruments and equipment, which effectively promoted technical efficiency and technological advancement. This indicates that stringent and well-designed environmental regulations should lead to a “win-win” situation of environmental improvement and economic development by encouraging enterprises to upgrade their technology and equipment.
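The DID logic can be made concrete with the canonical two-group, two-period case: the policy effect is the change in the treated group's mean outcome minus the change in the control group's mean outcome over the same period, which equals the interaction coefficient in an OLS regression on group, period and group-by-period indicators. A minimal sketch with made-up numbers; the paper's actual specification (which industries count as treated, fixed effects, controls, multiple years) is not given in the abstract and is likely richer.

```python
import numpy as np


def did_from_means(y, treated, post):
    """2x2 difference-in-differences estimate from the four group means.

    y: outcomes; treated, post: 0/1 indicators for treatment group and post-policy period.
    """
    y, treated, post = (np.asarray(a, dtype=float) for a in (y, treated, post))

    def cell_mean(t, p):
        return y[(treated == t) & (post == p)].mean()

    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


def did_from_regression(y, treated, post):
    """Interaction coefficient d in y = a + b*treated + c*post + d*treated*post (OLS)."""
    y, treated, post = (np.asarray(a, dtype=float) for a in (y, treated, post))
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]


# Hypothetical outcomes for 2 treated and 2 control units, observed before and after.
y = [1.0, 1.2, 3.0, 3.1, 1.0, 0.9, 2.0, 2.2]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
post = [0, 0, 1, 1, 0, 0, 1, 1]
print(did_from_means(y, treated, post))
print(did_from_regression(y, treated, post))
```

Both calls print approximately 0.8: the treated group's mean rose by 1.95 and the control group's by 1.15. The two estimates coincide because the four-indicator regression is saturated, so its fitted values are exactly the four cell means.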

PMID:34252855 | DOI:10.1016/j.jenvman.2021.113078

Correlation of CT findings with intra-operative outcome in closed-loop small bowel obstruction (CL-SBO)

Eur J Radiol. 2021 Jul 6;142:109844. doi: 10.1016/j.ejrad.2021.109844. Online ahead of print.

ABSTRACT

PURPOSE: To correlate CT findings in patients with closed-loop small bowel obstruction (CL-SBO) with perioperative findings, in order to identify patients who require immediate surgical intervention. The secondary purpose was to substantiate the role of radiologists in predicting perioperative outcome.

METHODS: Data were retrospectively obtained from patients with surgically confirmed CL-SBO, between September 2013 and September 2019. Three radiologists reviewed CTs to assess defined CT features and predict patient outcome for bowel wall ischemia and necrosis using a likelihood score. Univariate statistical analyses were performed and diagnostic performance parameters and interobserver agreement were assessed for each feature.
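The abstract does not name the agreement statistic. One common choice for three or more raters is Fleiss' kappa; the sketch below computes it from a subjects-by-categories count table and maps it onto the conventional Landis and Koch bands, under which "fair" is 0.21 to 0.40 and "moderate" is 0.41 to 0.60. Whether the study used Fleiss' kappa or these bands is an assumption, and the count table is hypothetical.

```python
import numpy as np


def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts: (N, k) array; counts[i, j] = number of raters assigning subject i to category j.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[0]
    n = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n):
        raise ValueError("every subject must be rated by the same number of raters")
    p_j = counts.sum(axis=0) / (N * n)  # overall share of each category
    P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-subject agreement
    P_bar, P_e = P_i.mean(), np.sum(p_j ** 2)
    return (P_bar - P_e) / (1.0 - P_e)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical: 4 patients, 3 radiologists, 3 likelihood categories (low/intermediate/high).
table = [[3, 0, 0], [2, 1, 0], [0, 2, 1], [0, 0, 3]]
k = fleiss_kappa(table)
print(round(k, 3), landis_koch(k))
```

Kappa corrects raw agreement for the agreement expected by chance given how often each category is used overall, which is why it can be low even when raters often agree.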

RESULTS: Of 148 included patients, 28 (19%) had viable bowel at surgery and 120 (81%) had bowel wall ischemia or necrosis. Most CT characteristics, as well as the likelihood scores for ischemia and necrosis, showed fair or moderate multirater agreement. Increased attenuation of the bowel wall and of the mesenteric vessels on non-contrast-enhanced CT each had a specificity of 100% for bowel ischemia or necrosis, with sensitivities of 48% (p < 0.001) and 21% (p = 0.09), respectively. Mesenteric edema had high sensitivity for ischemia or necrosis (90%) but a specificity of only 26% (p < 0.001). For mesenteric fluid, sensitivity was 60% and specificity 57% (p = 0.004). Decreased enhancement of the bowel wall in both the arterial and portal venous (PV) phase correlated significantly with outcome, with sensitivities of 58% and 42% and specificities of 88% and 79%, respectively (both p < 0.001). The likelihood scores for both ischemia and necrosis were significantly correlated with perioperative outcome (p < 0.001).
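Each sensitivity/specificity pair above comes from a 2x2 table of a CT feature (present or absent) against the surgical reference standard (ischemia or necrosis versus viable bowel). A minimal sketch with hypothetical counts, not the study's data; the function names are illustrative. It also shows what a specificity of 100% means: the false-positive cell is zero, i.e. no patient with viable bowel showed the feature.

```python
def _ratio(num, den):
    """Proportion num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_performance(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: feature present, ischemia/necrosis at surgery
    fn: feature absent,  ischemia/necrosis at surgery
    fp: feature present, viable bowel
    tn: feature absent,  viable bowel
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical counts: a sensitive but non-specific feature.
print(diagnostic_performance(tp=90, fn=10, fp=20, tn=8))
# Hypothetical counts: a perfectly specific feature (no false positives).
print(diagnostic_performance(tp=5, fn=5, fp=0, tn=10))
```

Sensitivity and specificity are computed within the diseased and non-diseased groups separately, so unlike PPV and NPV they do not depend on how common ischemia is in the cohort.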

CONCLUSION: CT findings concerning mesenteric and bowel wall changes, as well as radiologists’ judgement of the likelihood of ischemia and necrosis, are significantly correlated with the perioperative finding of bowel wall ischemia and necrosis in patients with CL-SBO.

PMID:34252868 | DOI:10.1016/j.ejrad.2021.109844

Uncertainty in tissue equivalent proportional counter assessments of microdosimetry and RBE estimates in carbon radiotherapy

Phys Med Biol. 2021 Jul 12. doi: 10.1088/1361-6560/ac1366. Online ahead of print.

ABSTRACT

Microdosimetry is an important tool for assessing energy deposition distributions from ionizing radiation at cellular and cell-nucleus scales. It serves as an input to multiple common mathematical models, including models for evaluating the relative biological effectiveness (RBE) of carbon ion therapy. The most common detector used for microdosimetry is the tissue-equivalent proportional counter (TEPC). Although widely applied, the TEPC has various inherent uncertainties. This work therefore quantified the magnitude of TEPC measurement uncertainties and their impact on RBE estimates for therapeutic carbon beams. Microdosimetric spectra and the frequency-mean, dose-mean and saturation-corrected dose-mean lineal energies (y_F, y_D, y*) were calculated using the Monte Carlo toolkit Geant4 for five monoenergetic and three SOBP carbon beams in water at every millimeter along the central beam axis. We simulated the influence on these spectra of eight sources of uncertainty: wall effects, pulse pile-up, electronics, gas pressure, W-value, gain instability, low-energy cut-off, and counting statistics. Statistical uncertainty was quantified as the standard deviation of perturbed values for each source. Bias was quantified as the difference between default lineal energy values and the mean of perturbed values for each systematic source. Uncertainties were propagated to RBE using the modified microdosimetric kinetic model (MKM). Variance introduced by statistical sources in y_F and y_D averaged 3.8% and 3.4%, respectively, and 1.5% in y* across beam depths and energies. Bias averaged 6.2% and 7.3% in y_F and y_D, respectively, and 4.8% in y*. These uncertainties corresponded to an average of 1.2 ± 0.9% in RBE_MKM. The largest contributors to variance and bias were pulse pile-up and wall effects. This study established an error budget for microdosimetric carbon measurements by quantifying the uncertainty inherent to TEPC measurements. Understanding how robust the measured RBE model input parameters are against this uncertainty is necessary to verify clinical model implementation.
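The three lineal-energy means can be computed directly from a list of recorded events, and the error-budget quantities follow the definitions in the abstract (bias = default value minus the mean of perturbed values; spread = standard deviation of perturbed values). A minimal sketch, assuming every event is counted once (no binning or calibration), using the standard definitions y_F = ∫ y f(y) dy, y_D = ∫ y² f(y) dy / y_F and the saturation correction y* = (y0² / y_F) ∫ (1 - exp(-y²/y0²)) f(y) dy. The saturation parameter y0 = 150 keV/µm, the sample (ddof=1) standard deviation and expressing both quantities as a percentage of the default value are assumptions; this is not the study's Geant4 pipeline.

```python
import numpy as np


def lineal_energy_means(y, y0=150.0):
    """Frequency-mean, dose-mean and saturation-corrected dose-mean lineal energy.

    y: lineal energies (keV/um) of individual recorded events, each counted once.
    y0: saturation parameter in keV/um (150 is an illustrative assumption).
    """
    y = np.asarray(y, dtype=float)
    y_f = y.mean()                    # y_F: mean of the frequency distribution f(y)
    y_d = np.sum(y ** 2) / np.sum(y)  # y_D = integral of y^2 f(y) dy / y_F
    # y* = (y0^2 / y_F) * integral of (1 - exp(-y^2/y0^2)) f(y) dy;
    # for y << y0 this tends to y_D, and it is always <= y_D.
    y_star = y0 ** 2 * np.sum(1.0 - np.exp(-(y / y0) ** 2)) / np.sum(y)
    return y_f, y_d, y_star


def error_budget(default_value, perturbed_values):
    """Bias (default minus mean of perturbed) and spread (sample SD), both in % of default."""
    perturbed = np.asarray(perturbed_values, dtype=float)
    bias = default_value - perturbed.mean()
    sd = perturbed.std(ddof=1)
    return 100.0 * bias / default_value, 100.0 * sd / default_value
```

In use, `lineal_energy_means` would be applied once to the default simulated spectrum and once per perturbed spectrum for a given uncertainty source, and `error_budget` then summarizes that source for each of y_F, y_D and y*.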

PMID:34252894 | DOI:10.1088/1361-6560/ac1366

Bee pollen powder as a functional ingredient in frankfurters

Meat Sci. 2021 Jul 7;182:108621. doi: 10.1016/j.meatsci.2021.108621. Online ahead of print.

ABSTRACT

The objective of this study was to determine whether adding bee pollen powder at different concentrations (0.0, 0.5, 1.0 and 1.5 g/100 g) to frankfurters influenced antioxidant potential and oxidative changes during storage without a detrimental effect on sausage quality. After cold storage, frankfurters with the higher pollen levels (1.0 and 1.5 g/100 g) showed significant (P < 0.05) reductions in psychrotrophic bacteria populations. Incorporating pollen into the frankfurters yielded good antioxidant properties and maintained TBARS values. Among quality parameters, color changed significantly, but the sensory characteristics of the products were not affected. Pollen incorporation also did not change the texture profile of the frankfurters. Bee pollen powder, a natural ingredient, can therefore be used as an antioxidant in frankfurter formulations, but further research is needed to determine whether it can adequately replace synthetic antioxidants.

PMID:34252842 | DOI:10.1016/j.meatsci.2021.108621