Categories
Nevin Manimala Statistics

scDoc: correcting drop-out events in single-cell RNA-seq data.

Bioinformatics. 2020 May 04;:

Authors: Ran D, Zhang S, Lytal N, An L

Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of “drop-out” events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this paper, we present a novel Single-Cell RNA-seq Drop-Out Correction (scDoc) method, which imputes drop-out events by borrowing information for the same gene from highly similar cells.
RESULTS: scDoc is the first method that directly incorporates drop-out information into cell-to-cell similarity estimation, a step that is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods with respect to data visualization, cell subpopulation identification, and differential expression detection in scRNA-seq data.
AVAILABILITY: R code is available at https://github.com/anlingUA/scDoc.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
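The general idea of borrowing information from highly similar cells can be sketched as follows. This is a toy illustration, not the scDoc algorithm: the cosine similarity, the `min_sim` threshold, the `k` neighbours, and the tiny expression matrix are all assumptions made for the example, and unlike scDoc the sketch does not use drop-out information when estimating similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def impute_zeros(cells, k=2, min_sim=0.8):
    """Replace zeros with the mean of non-zero values from similar cells.

    cells: list of per-cell expression vectors sharing one gene order.
    k and min_sim are hypothetical tuning parameters for this sketch.
    """
    imputed = [row[:] for row in cells]
    for i, row in enumerate(cells):
        # Keep only cells that are similar enough, most similar first.
        scored = [(cosine(row, cells[j]), j) for j in range(len(cells)) if j != i]
        neighbours = [j for s, j in sorted(scored, reverse=True) if s >= min_sim][:k]
        for g, value in enumerate(row):
            if value == 0:
                donors = [cells[j][g] for j in neighbours if cells[j][g] > 0]
                if donors:
                    imputed[i][g] = sum(donors) / len(donors)
    return imputed


cells = [
    [5.0, 0.0, 3.0],  # cell 0: the zero for gene 1 looks like a drop-out
    [5.0, 2.0, 3.0],  # cell 1: very similar to cell 0
    [0.0, 9.0, 0.0],  # cell 2: a different profile, so its zeros are kept
]
result = impute_zeros(cells, k=1)
```

In this toy matrix only cell 0 has a sufficiently similar neighbour, so only its zero is filled in; the zeros of cell 2 are left alone because no other cell passes the similarity threshold.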

PMID: 32365169 [PubMed – as supplied by publisher]

Prediction of attempted suicide in men and women with crack-cocaine use disorder in Brazil.

PLoS One. 2020;15(5):e0232242

Authors: Roglio VS, Borges EN, Rabelo-da-Ponte FD, Ornell F, Scherer JN, Schuch JB, Passos IC, Sanvicente-Vieira B, Grassi-Oliveira R, von Diemen L, Pechansky F, Kessler FHP

Abstract
BACKGROUND: Suicide is a severe health problem, with high rates in individuals with addiction. Considering the lack of studies exploring suicide predictors in this population, we aimed to investigate factors associated with attempted suicide in inpatients diagnosed with cocaine use disorder using two analytical approaches.
METHODS: This is a cross-sectional study using a secondary database with 247 men and 442 women hospitalized for cocaine use disorder. Clinical assessment included the Addiction Severity Index, the Childhood Trauma Questionnaire, and the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, totalling 58 variables. Descriptive Poisson regression and predictive Random Forest algorithm were used complementarily to estimate prevalence ratios and to build prediction models, respectively. All analyses were stratified by gender.
RESULTS: The prevalence of attempted suicide was 34% for men and 50% for women. In both genders, depression (prevalence ratio for men, PRM = 1.56; for women, PRW = 1.27) and hallucinations (PRM = 1.80, PRW = 1.39) were factors associated with attempted suicide. Other specific factors were found for men and women, such as childhood trauma, aggression, and drug use severity. The men’s predictive model had prediction statistics of AUC = 0.68, Acc. = 0.66, Sens. = 0.82, Spec. = 0.50, PPV = 0.47 and NPV = 0.84. This model identified several variables as important predictors, mainly related to drug use severity. The women’s model had higher predictive power (AUC = 0.73; all other statistics were equal to 0.71) and was parsimonious.
CONCLUSIONS: Our findings indicate that attempted suicide is associated with depression, hallucinations and childhood trauma in both genders. They also suggest that severity of drug use may be a moderator between predictors and suicide among men, while psychiatric issues appear to be more important for women.
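For readers less familiar with the prediction statistics quoted in the results (Acc., Sens., Spec., PPV, NPV), the sketch below shows how they are derived from a confusion matrix. The counts are hypothetical and are not taken from the study; the function name `prediction_stats` is likewise invented for this example.

```python
def prediction_stats(tp, fp, tn, fn):
    """Standard classification statistics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # attempters correctly flagged
        "specificity": tn / (tn + fp),  # non-attempters correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are true attempters
        "npv": tn / (tn + fn),          # cleared cases that are true non-attempters
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
stats = prediction_stats(tp=40, fp=30, tn=70, fn=10)
```

With these made-up counts the sensitivity is 0.8 and the specificity is 0.7, while the PPV is lower (40/70) because false positives outnumber the remaining margin; PPV and NPV depend on how common the outcome is in the evaluated sample.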

PMID: 32365094 [PubMed – in process]

A new linear regression-like residual for survival analysis, with application to genome wide association studies of time-to-event data.

PLoS One. 2020;15(5):e0232300

Authors: Vieland VJ, Seok SC, Stewart WCL

Abstract
In linear regression, a residual measures how far a subject’s observation is from expectation; in survival analysis, a subject’s Martingale or deviance residual is sometimes interpreted similarly. Here we consider ways in which a linear regression-like interpretation is not appropriate for Martingale and deviance residuals, and we develop a novel time-to-event residual which does have a linear regression-like interpretation. We illustrate the utility of this new residual via simulation of a time-to-event genome-wide association study, motivated by a real study seeking genetic modifiers of Duchenne Muscular Dystrophy. By virtue of its linear regression-like characteristics, our new residual may prove useful in other contexts as well.
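One reason a Martingale residual does not behave like a linear regression residual can be seen in a small sketch. It assumes the simplest setting, a null model with no covariates, where the residual is the event indicator minus the Nelson-Aalen cumulative hazard at the subject's observed time; the four subjects are invented. It does not implement the new residual proposed by the authors.

```python
def nelson_aalen(times, events):
    """Return H(t), the Nelson-Aalen cumulative hazard estimate."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        deaths = sum(1 for s, d in zip(times, events) if s == t and d == 1)
        increments.append((t, deaths / at_risk))

    def cumulative_hazard(t):
        return sum(inc for s, inc in increments if s <= t)

    return cumulative_hazard


def martingale_residuals(times, events):
    """Observed event count minus expected count under the null model."""
    H = nelson_aalen(times, events)
    return [d - H(t) for t, d in zip(times, events)]


# Hypothetical data: follow-up times and event indicators (1 = event, 0 = censored).
times = [2, 3, 5, 7]
events = [1, 0, 1, 1]
residuals = martingale_residuals(times, events)
```

The residuals sum to zero, as in linear regression, but each one is bounded above by 1 and unbounded below, so they are skewed rather than symmetric around expectation, and they are not expressed on the time scale of the outcome.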

PMID: 32365095 [PubMed – in process]

Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models.

PLoS Genet. 2020 May 04;16(5):e1008766

Authors: Bhatnagar SR, Yang Y, Lu T, Schurr E, Loredo-Osti JC, Forest M, Oualkacha K, Greenwood CMT

Abstract
Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect called ggmix for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models with better prediction accuracy than the two-stage approach or principal component adjustment. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
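As background for the "standard penalized regression" and coordinate descent mentioned above, here is a minimal sketch of coordinate descent for an ordinary lasso. It is not ggmix: there is no random effect, no kinship matrix, and no automatic tuning; the function names, the fixed penalty `lam`, and the toy data are assumptions for illustration. It also assumes every predictor column has at least one non-zero entry.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Fitted value using every predictor except j.
                fitted_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta


# Toy data: predictor 0 has a strong effect, predictor 1 a negligible one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_cd(X, y, lam=0.2)
```

The penalty shrinks the strong coefficient (from 2.0 to 1.6 here) and sets the weak one exactly to zero, which is the variable-selection behaviour that penalized models rely on.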

PMID: 32365090 [PubMed – as supplied by publisher]

Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis.

PLoS Comput Biol. 2020 May 04;16(5):e1007797

Authors: Brucker A, Lu W, Marceau West R, Yu QY, Hsiao CK, Hsiao TH, Lin CH, Magnusson PKE, Sullivan PF, Szatkiewicz JP, Lu TP, Tzeng JY

Abstract
Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals’ copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of “copy number profile curves” to describe the CNV profile of an individual, and the “common area under the curve (cAUC) kernel” to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.
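The notion of a "common area under the curve" between two copy number profiles can be pictured with a rough sketch. The definition used here is an assumption, not the kernel as specified by the authors: each profile is a list of hypothetical `(start, end, deviation)` segments, where deviation is the copy number minus the normal two copies, and the shared area counts only overlapping segments of the same type (both deletions or both duplications). Segments within one profile are assumed not to overlap.

```python
def common_area(profile_a, profile_b):
    """Area shared by two piecewise-constant copy number profiles.

    Each profile is a list of (start, end, deviation) tuples; this
    representation and the min-height rule are illustrative assumptions.
    """
    total = 0.0
    for start_a, end_a, dev_a in profile_a:
        for start_b, end_b, dev_b in profile_b:
            overlap = min(end_a, end_b) - max(start_a, start_b)
            same_type = dev_a * dev_b > 0  # both losses or both gains
            if overlap > 0 and same_type:
                total += overlap * min(abs(dev_a), abs(dev_b))
    return total


# Hypothetical individuals: positions are arbitrary genomic coordinates.
person_a = [(100, 200, -1), (500, 600, 2)]
person_b = [(150, 250, -1), (550, 650, 1), (700, 800, -1)]

shared = common_area(person_a, person_b)
self_similarity = common_area(person_a, person_a)
```

Because the comparison runs along the whole profile, no locus boundaries have to be chosen in advance, and both the length of the overlap and the dosage contribute to the similarity score.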

PMID: 32365089 [PubMed – as supplied by publisher]

Integration of cortical thickness data in a statistical shape model of the scapula.

Comput Methods Biomech Biomed Engin. 2020 May 04;:1-7

Authors: Pitocchi J, Wirix-Speetjens R, van Lenthe GH, Pérez MÁ

Abstract
Knowledge about bone morphology and bone quality of the scapula throughout the population is fundamental in the design of shoulder implants. In particular, regions with the best bone stock (cortical bone) are taken into account when planning the supporting screws, aiming for an optimal fixation. As an alternative to manual measurements, statistical shape models (SSMs) have been commonly used to describe shape variability within a population. However, explicitly including cortical thickness information in an SSM of the scapula still remains a challenge. Therefore, the goal of this study is to combine scapular bone shape and cortex morphology in an SSM. First, a method to estimate cortical thickness, based on HU (Hounsfield Unit) profile analysis, was developed and validated. Then, based on the manual segmentations of 32 healthy scapulae, a statistical shape model including cortical information was created and evaluated. Generalization, specificity and compactness were calculated in order to assess the quality of the SSM. The average cortical thickness of the SSM was 2.0 ± 0.63 mm. Generalization, specificity and compactness performances confirmed that the combined SSM was able to capture the bone quality changes in the population. In this work we integrated information on the cortical thickness in an SSM for the scapula. From the results we conclude that this methodology is a valuable tool for automatically generating a large population of scapulae and deducing statistics on the cortex. Hence, this SSM can be useful to automate implant design and screw placement in shoulder arthroplasty.
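Of the three quality measures named above, compactness is the simplest to illustrate. The sketch assumes the common definition, the cumulative fraction of total variance explained by the first modes of variation; the mode variances are invented and the function names are hypothetical, so this is not the authors' evaluation code.

```python
def compactness(mode_variances):
    """Cumulative fraction of total variance captured by the first m modes."""
    total = sum(mode_variances)
    running = 0.0
    curve = []
    for variance in sorted(mode_variances, reverse=True):
        running += variance
        curve.append(running / total)
    return curve


def modes_needed(mode_variances, target=0.95):
    """Smallest number of modes whose cumulative variance reaches target."""
    for m, fraction in enumerate(compactness(mode_variances), start=1):
        if fraction >= target:
            return m
    return len(mode_variances)


# Hypothetical variances (eigenvalues) of three shape modes.
variances = [6.0, 3.0, 1.0]
curve = compactness(variances)
n_modes = modes_needed(variances, target=0.95)
```

A more compact model reaches a given fraction of the variance with fewer modes, which is why the curve is usually reported alongside generalization and specificity.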

PMID: 32364819 [PubMed – as supplied by publisher]

Effect of acclimated temperature on thermal tolerance, immune response and expression of HSP genes in Labeo rohita, Catla catla and their intergeneric hybrids.

J Therm Biol. 2020 Apr;89:102570

Authors: Ahmad M, Zuberi A, Ali M, Syed A, Murtaza MUH, Khan A, Kamran M

Abstract
The ability of a species and population to respond to a decrease or an increase in temperature depends on their adaptive potential. Here, the critical thermal tolerance (CTmax and CTmin) was determined for four populations, Labeo rohita, Catla catla, and their reciprocal hybrids L. rohita♀ × C. catla♂ (RC) and C. catla♀ × L. rohita♂ (CR), each acclimated at four temperatures (22, 26, 30 and 34 °C). All populations showed substantial variation (P < 0.05) in CTmax and CTmin values. L. rohita displayed the highest CTmax, with the largest total and intrinsic polygon zones as well as the largest upper and lower acquired thermal tolerance zones, followed by the RC and CR hybrids, while C. catla showed the highest CTmin value, significantly so, and the smallest intrinsic and acquired thermal tolerance zones. Both hybrids showed low parent heterosis (≤11%). Additionally, the highest expression of Hsp70 and Hsp90 (heat shock protein) genes, serum lysozyme level and respiratory burst activity, together with the lowest lipid peroxidation level under lower and higher temperature shock, further indicated that L. rohita, in contrast to C. catla, has a strong physiological mechanism for dealing with acute temperature shock, while the hybrids, especially the F1 RC hybrid, appeared to be a good option to replace C. catla in relatively higher and lower temperature areas.

PMID: 32364999 [PubMed – in process]

Agent-based and continuous models of hopper bands for the Australian plague locust: How resource consumption mediates pulse formation and geometry.

PLoS Comput Biol. 2020 May 04;16(5):e1007820

Authors: Bernoff AJ, Culshaw-Maurer M, Everett RA, Hohn ME, Strickland WC, Weinburd J

Abstract
Locusts are significant agricultural pests. Under favorable environmental conditions flightless juveniles may aggregate into coherent, aligned swarms referred to as hopper bands. These bands are often observed as a propagating wave having a dense front with rapidly decreasing density in the wake. A tantalizing and common observation is that these fronts slow and steepen in the presence of green vegetation. This suggests the collective motion of the band is mediated by resource consumption. Our goal is to model and quantify this effect. We focus on the Australian plague locust, for which excellent field and experimental data are available. Exploiting the alignment of locusts in hopper bands, we concentrate solely on the density variation perpendicular to the front. We develop two models in tandem: an agent-based model that tracks the position of individuals and a partial differential equation model that describes locust density. In both these models, locusts are either stationary (and feeding) or moving. Resources decrease with feeding. The rate at which locusts transition from moving to stationary (and vice versa) is enhanced (diminished) by resource abundance. This effect proves essential to the formation, shape, and speed of locust hopper bands in our models. From the biological literature we estimate ranges for the ten input parameters of our models. Sobol sensitivity analysis yields insight into how the band’s collective characteristics vary with changes in the input parameters. By examining 4.4 million parameter combinations, we identify biologically consistent parameters that reproduce field observations. We thus demonstrate that resource-dependent behavior can explain the density distribution observed in locust hopper bands. This work suggests that feeding behaviors should be an intrinsic part of future modeling efforts.
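The qualitative rules described above (two states, feeding depletes resources, and resource-dependent switching) can be sketched as a very small agent-based simulation. This is an illustration only, not the authors' model: the one-dimensional lattice, the exponential switching probabilities, and every numeric constant are assumptions chosen for readability rather than biological realism.

```python
import math
import random


def simulate(n_locusts=50, n_sites=100, steps=200, seed=0):
    """Toy 1-D hopper band: locusts either move right or stop to feed."""
    rng = random.Random(seed)
    resources = [1.0] * n_sites   # food per site, hypothetical units
    position = [0] * n_locusts    # everyone starts at the left edge
    moving = [True] * n_locusts
    for _ in range(steps):
        for i in range(n_locusts):
            food = resources[position[i]]
            if moving[i]:
                # More food makes a moving locust more likely to stop.
                p_stop = 1.0 - math.exp(-2.0 * food)
                if rng.random() < p_stop:
                    moving[i] = False
                elif position[i] < n_sites - 1:
                    position[i] += 1
            else:
                # Feeding depletes the local resource.
                resources[position[i]] = max(0.0, food - 0.05)
                # Less food makes a feeding locust more likely to move on.
                p_go = math.exp(-2.0 * resources[position[i]])
                if rng.random() < p_go:
                    moving[i] = True
    return position, resources


position, resources = simulate()
```

Plotting a histogram of `position` next to `resources` after different numbers of steps is one way to explore how depletion behind the band and abundant food ahead of it shape the density profile in this toy setting.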

PMID: 32365072 [PubMed – as supplied by publisher]

Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain.

Hum Brain Mapp. 2020 May 04;:

Authors: Sripada C, Angstadt M, Rutherford S, Taxali A, Shedden K

Abstract
General cognitive ability (GCA) refers to a trait-like ability that contributes to performance across diverse cognitive tasks. Identifying brain-based markers of GCA has been a longstanding goal of cognitive and clinical neuroscience. Recently, predictive modeling methods have emerged that build whole-brain, distributed neural signatures for phenotypes of interest. In this study, we employ a predictive modeling approach to predict GCA based on fMRI task activation patterns during the N-back working memory task as well as six other tasks in the Human Connectome Project dataset (n = 967), encompassing 15 task contrasts in total. We found tasks are a highly effective basis for prediction of GCA: The 2-back versus 0-back contrast achieved a 0.50 correlation with GCA scores in 10-fold cross-validation, and 13 out of 15 task contrasts afforded statistically significant prediction of GCA. Additionally, we found that task contrasts that produce greater frontoparietal activation and default mode network deactivation (a brain activation pattern associated with executive processing and higher cognitive demand) are more effective in the prediction of GCA. These results suggest a picture analogous to treadmill testing for cardiac function: Placing the brain in a more cognitively demanding task state significantly improves brain-based prediction of GCA.
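The figure of merit quoted above, a correlation between predicted and observed scores under 10-fold cross-validation, can be illustrated with a short sketch. It is not the authors' pipeline: it uses a single synthetic feature, a one-variable least-squares fit, and a simple interleaved fold assignment, all of which are assumptions for the example.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def cross_validated_r(xs, ys, k=10):
    """Predict each held-out fold from a model fit on the rest, then correlate."""
    predictions = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predictions[i] = intercept + slope * xs[i]
    return pearson(predictions, ys)


# Synthetic data: a linear signal plus a small deterministic perturbation.
xs = [float(i) for i in range(40)]
ys = [2.0 * x + ((i % 3) - 1) for i, x in enumerate(xs)]
r = cross_validated_r(xs, ys, k=10)
```

Because every prediction comes from a model that never saw the corresponding subject, the resulting correlation estimates out-of-sample performance rather than goodness of fit.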

PMID: 32364670 [PubMed – as supplied by publisher]

Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus.

JCO Clin Cancer Inform. 2020 May;4:383-391

Authors: Li Y, Luo YH, Wampfler JA, Rubinstein SM, Tiryaki F, Ashok K, Warner JL, Xu H, Yang P

Abstract
PURPOSE: Electronic health records (EHRs) are created primarily for nonresearch purposes; thus, the amounts of data are enormous, and the data are crude, heterogeneous, incomplete, and largely unstructured, presenting challenges to effective analyses for timely, reliable results. In particular, research dealing with clinical notes relevant to patient care and outcome has seldom been conducted, because of the complexity of data extraction and accurate annotation. RECIST is a set of widely accepted research criteria to evaluate tumor response in patients undergoing antineoplastic therapy. The aim of this study was to identify textual sources for RECIST information in EHRs and to develop a corpus of pharmacotherapy and response entities for development of natural language processing tools.
METHODS: We focused on pharmacotherapies and patient responses, using 55,120 medical notes (n = 72 types) in Mayo Clinic’s EHRs from 622 randomly selected patients who signed authorization for research. Using the Multidocument Annotation Environment tool, we applied and evaluated predefined keywords, and time interval and note-type filters for identifying RECIST information and established a gold standard data set for patient outcome research.
RESULTS: Keywords reduced the clinical notes to 37,406, and using four note types within 12 months postdiagnosis further reduced the number of notes to 5,005, which were manually annotated and covered 97.9% of all cases (n = 609 of 622). The resulting data set of 609 cases (n = 503 for training and n = 106 for validation) contains 736 fully annotated, deidentified clinical notes, with pharmacotherapies and four response end points: complete response, partial response, stable disease, and progressive disease. This resource is readily expandable to specific drugs, regimens, and most solid tumors.
CONCLUSION: We have established a gold standard data set to accommodate development of biomedical informatics tools in accelerating research into antineoplastic therapeutic response.
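The three filters described in the methods (keywords, note type, and time since diagnosis) can be sketched as a simple selection function. The keyword list, the note-type names, the record layout, and the sample notes below are all hypothetical; the study's actual keyword set and note types are not given in the abstract.

```python
from datetime import date

# Hypothetical keywords, borrowed from the four response end points.
KEYWORDS = ("complete response", "partial response", "stable disease", "progressive disease")
# Hypothetical note types; the abstract does not name the four it used.
NOTE_TYPES = {"oncology consult", "progress note"}


def select_notes(notes, diagnosis_date, max_days=365):
    """Keep notes that match a keyword, an allowed type, and the time window."""
    selected = []
    for note in notes:
        text = note["text"].lower()
        days_since_diagnosis = (note["date"] - diagnosis_date).days
        if (
            any(keyword in text for keyword in KEYWORDS)
            and note["type"] in NOTE_TYPES
            and 0 <= days_since_diagnosis <= max_days
        ):
            selected.append(note)
    return selected


diagnosis = date(2020, 1, 15)
notes = [
    {"type": "progress note", "date": date(2020, 3, 1), "text": "CT shows partial response to therapy."},
    {"type": "nursing note", "date": date(2020, 3, 2), "text": "Stable disease per oncologist."},
    {"type": "progress note", "date": date(2021, 6, 1), "text": "Progressive disease noted."},
    {"type": "progress note", "date": date(2020, 4, 1), "text": "Patient reports mild nausea."},
]
selected = select_notes(notes, diagnosis)
```

Only the first sample note survives all three filters: the second has the wrong note type, the third falls outside the 12-month window, and the fourth contains no response keyword.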

PMID: 32364754 [PubMed – in process]