Categories
Nevin Manimala Statistics

Measuring Phylogenetic Information of Incomplete Sequence Data

Syst Biol. 2021 Sep 1:syab073. doi: 10.1093/sysbio/syab073. Online ahead of print.

ABSTRACT

Widely used approaches for extracting phylogenetic information from aligned sets of molecular sequences rely upon probabilistic models of nucleotide substitution or amino-acid replacement. The phylogenetic information that can be extracted depends on the number of columns in the sequence alignment and will be decreased when the alignment contains gaps due to insertion or deletion events. Motivated by the measurement of information loss, we suggest assessment of the Effective Sequence Length (ESL) of an aligned data set. The ESL can differ from the actual number of columns in a sequence alignment because of the presence of alignment gaps. Furthermore, the estimation of phylogenetic information is affected by model misspecification. Inevitably, the actual process of molecular evolution differs from the probabilistic models employed to describe this process. This disparity means the amount of phylogenetic information in an actual sequence alignment will differ from the amount in a simulated data set of equal size, which motivated us to develop a new test for model adequacy. Via theory and empirical data analysis, we show how to disentangle the effects of gaps and model misspecification. By comparing the Fisher information of actual and simulated sequences, we identify which alignment sites and tree branches are most affected by gaps and model misspecification.

PMID:34469581 | DOI:10.1093/sysbio/syab073

Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle

Bioinformatics. 2021 Sep 1:btab629. doi: 10.1093/bioinformatics/btab629. Online ahead of print.

ABSTRACT

MOTIVATION: Each year, the number of published bulk and single-cell RNA-seq data sets is growing exponentially. Studies analyzing such data are commonly looking at gene-level differences, while the collected RNA-seq data inherently represents reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis is testing for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell type identification.

RESULTS: We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks, offers various result aggregation and visualization options and a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications like the identification of cell type specific transcript isoforms as biomarkers.

AVAILABILITY: The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:34469510 | DOI:10.1093/bioinformatics/btab629
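
As a rough illustration of what "proportional differences in a gene's transcript composition" means in the DTUrtle abstract above, the following minimal Python sketch computes each transcript's share of its gene's total count in two groups and the shift between them. It is not DTUrtle's API (DTUrtle is an R package) and performs no statistical testing; the function name `transcript_proportions`, the transcript-to-gene mapping, and the counts are all hypothetical.

```python
from collections import defaultdict


def transcript_proportions(counts, tx2gene):
    """Return each transcript's share of its gene's total count."""
    gene_totals = defaultdict(float)
    for tx, count in counts.items():
        gene_totals[tx2gene[tx]] += count
    proportions = {}
    for tx, count in counts.items():
        total = gene_totals[tx2gene[tx]]
        # A gene with no reads has undefined transcript usage.
        proportions[tx] = count / total if total > 0 else float("nan")
    return proportions


# Hypothetical toy data: two transcripts of geneA, one transcript of geneB.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
group1 = {"tx1": 80, "tx2": 20, "tx3": 50}
group2 = {"tx1": 40, "tx2": 60, "tx3": 100}

p1 = transcript_proportions(group1, tx2gene)
p2 = transcript_proportions(group2, tx2gene)

# Change in usage per transcript between the two groups.
usage_shift = {tx: p2[tx] - p1[tx] for tx in tx2gene}
print(usage_shift)
```

In this toy data geneB doubles in expression but its single transcript keeps a usage of 1.0, so it shows a gene-level difference without any change in transcript usage, whereas geneA keeps the same total count while its composition shifts.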

A probabilistic model for indel evolution: differentiating insertions from deletions

Mol Biol Evol. 2021 Sep 1:msab266. doi: 10.1093/molbev/msab266. Online ahead of print.

ABSTRACT

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here we introduce several improvements to indel modeling: (1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; (2) We introduce numerous summary statistics that allow Approximate Bayesian Computation (ABC) based parameter estimation; (3) We develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical datasets; (4) Using a model-selection scheme we test whether the richer model better fits biological data compared to the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical datasets and that, for the majority of these datasets, the deletion rate is higher than the insertion rate.

PMID:34469521 | DOI:10.1093/molbev/msab266
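
The idea of Approximate Bayesian Computation with separate insertion and deletion rates, mentioned in the abstract above, can be sketched with a toy rejection sampler. This is only an illustration under strong assumptions and not the authors' model: indel counts per site are drawn as independent Poisson variables, the only summary statistics are the mean insertion and deletion counts, the prior is uniform on [0, 2], and all function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's algorithm (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_summary(rng, ins_rate, del_rate, n_sites=200):
    """Toy simulator: mean insertion and deletion counts per site."""
    insertions = sum(sample_poisson(rng, ins_rate) for _ in range(n_sites))
    deletions = sum(sample_poisson(rng, del_rate) for _ in range(n_sites))
    return insertions / n_sites, deletions / n_sites


def abc_rejection(observed, n_draws=2000, tolerance=0.15, seed=0):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Independent uniform priors on the two rates (an assumption).
        ins_rate = rng.uniform(0.0, 2.0)
        del_rate = rng.uniform(0.0, 2.0)
        simulated = simulate_summary(rng, ins_rate, del_rate)
        if math.dist(simulated, observed) <= tolerance:
            accepted.append((ins_rate, del_rate))
    return accepted


# Hypothetical observed summary: deletions twice as frequent as insertions.
observed = (0.5, 1.0)
posterior = abc_rejection(observed)
mean_ins = sum(i for i, _ in posterior) / len(posterior)
mean_del = sum(d for _, d in posterior) / len(posterior)
print(len(posterior), mean_ins, mean_del)
```

Because the two rates are sampled and compared separately, the accepted draws can recover a deletion rate that differs from the insertion rate, which a single shared indel rate could not express.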

Measuring customer satisfaction on the cleanliness of food premises using fuzzy conjoint analysis: A pilot test

PLoS One. 2021 Sep 1;16(9):e0256896. doi: 10.1371/journal.pone.0256896. eCollection 2021.

ABSTRACT

Determining the level of customer satisfaction with the cleanliness of a product or service is a significant aspect of business. However, the availability of feedback tools for consumers to evaluate the cleanliness of a restaurant is a crucial issue, as several aspects of cleanliness need to be evaluated collectively. To overcome this issue, this study designed a survey instrument based on the standard form used for grading food premises and transformed it into a seven-point Likert scale questionnaire consisting of seven questions. This study employed fuzzy conjoint analysis to measure the level of satisfaction with cleanliness in food premises. This pilot study recruited 30 students at Universiti Teknologi MARA (UiTM) Seremban 3. The students’ perceptions were represented by scores calculated from their degrees of similarity and corresponding levels of satisfaction, whereby only scores with the highest degree of similarity were selected. Furthermore, this study identified the aspects of hygiene that were assessed based on the customers’ satisfaction upon visiting the premises. The results indicated that the fuzzy conjoint analysis produced a similar outcome to the statistical mean and was thus useful for evaluating customer satisfaction with the cleanliness of food premises.

PMID:34469489 | DOI:10.1371/journal.pone.0256896

Correction: A statistical test and sample size recommendations for comparing community composition following PCA

PLoS One. 2021 Sep 1;16(9):e0257146. doi: 10.1371/journal.pone.0257146. eCollection 2021.

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0206033.].

PMID:34469490 | DOI:10.1371/journal.pone.0257146

Effectiveness of blended learning in pharmacy education: An experimental study using clinical research modules

PLoS One. 2021 Sep 1;16(9):e0256814. doi: 10.1371/journal.pone.0256814. eCollection 2021.

ABSTRACT

BACKGROUND & OBJECTIVES: Although some studies have evaluated the effectiveness of blended learning in pharmacy education, most of them originate from the USA and have used previous-year students’ scores as the control. There is also little research comparing the use of self-regulated learning strategies between blended and other learning approaches. The primary aim was to evaluate the effectiveness of blended learning on knowledge score using clinical research modules. The secondary objective was to compare the use of self-regulated learning strategies between blended learning, web-based e-learning and didactic teaching.

MATERIALS AND METHODS: A prospective cluster randomized trial was conducted with didactic teaching as control and web-based e-learning and blended learning as interventions. The target population was final year Pharm D students. Outcome was assessed using a validated knowledge questionnaire, a motivated strategies for learning questionnaire and a feedback form. All statistical analyses were carried out using Statistical Package for Social Science (SPSS) Version 20.

RESULTS: A total of 241 students from 12 colleges completed the study. The mean knowledge score of students in the blended learning group was higher than those of the didactic teaching and web-based e-learning groups (64.26±18.19 vs 56.65±8.73 vs 52.11±22.06, p<0.001). The frequency of use of the learning strategies rehearsal, elaboration, organization and critical thinking was statistically significantly higher in the blended learning group than in the didactic and web-based e-learning groups (p<0.05). However, there was no statistically significant difference in motivational orientations between the didactic and blended learning groups, except for extrinsic goal orientation and self-efficacy. Students preferred blended learning (86.5%) over didactic and web-based e-learning.

CONCLUSION: The blended learning approach is an effective way to teach a clinical research module. Students in the blended learning group employed all motivational and learning strategies more often than students in the didactic and web-based e-learning groups, except for the strategies of intrinsic goal orientation, task value, control of learning belief and help seeking.

PMID:34469484 | DOI:10.1371/journal.pone.0256814

The effect of librarian involvement on the quality of systematic reviews in dental medicine

PLoS One. 2021 Sep 1;16(9):e0256833. doi: 10.1371/journal.pone.0256833. eCollection 2021.

ABSTRACT

OBJECTIVES: To determine whether librarian or information specialist authorship is associated with better reproducibility of the search, at least three databases searched, and better reporting quality in dental systematic reviews (SRs).

METHODS: SRs from the top ten dental research journals (as determined by Journal Citation Reports and Scimago) were reviewed for search quality and reproducibility by independent reviewers using two Qualtrics survey instruments. Data was reviewed for all SRs based on reproducibility and librarian participation and further reviewed for search quality of reproducible searches.

RESULTS: Librarians were co-authors in only 2.5% of the 913 included SRs and librarians were mentioned or acknowledged in only 9% of included SRs. Librarian coauthors were associated with more reproducible searches, higher search quality, and at least three databases searched. Although the results indicate librarians are associated with improved SR quality, due to the small number of SRs that included a librarian, results were not statistically significant.

CONCLUSION: Despite guidance from organizations that produce SR guidelines recommending the inclusion of a librarian or information specialist on the review team, and despite evidence showing that librarians improve the reproducibility of searches and the reporting of methodology in SRs, librarians are not being included in SRs in the field of dental medicine. The authors of this review recommend the inclusion of a librarian on SR teams in dental medicine and other fields.

PMID:34469487 | DOI:10.1371/journal.pone.0256833

Automatic detection of adult cardiomyocyte for high throughput measurements of calcium and contractility

PLoS One. 2021 Sep 1;16(9):e0256713. doi: 10.1371/journal.pone.0256713. eCollection 2021.

ABSTRACT

Simultaneous calcium and contractility measurements on isolated adult cardiomyocytes have been the gold standard for the last decades to study cardiac (patho)physiology. However, the throughput of this system is low, which limits the number of compounds that can be tested per animal. We developed instrumentation and software that can automatically find adult cardiomyocytes. Cells are detected based on the cell boundary, using a Sobel filter to find the edge information in the field of view. Separately, motion is detected by calculating the variance of intensity for each pixel in the frame through time. Additionally, the software detects the best region for calcium and contractility measurements. A sensitivity of 0.66 ± 0.08 and a precision of 0.82 ± 0.03 were reached using our cell-finding algorithm. The percentage of cells that were found and had good contractility measurements was 90 ± 10%. In addition, the average time between two cardiomyocyte calcium and contractility measurements decreased from 93.5 ± 80.2 to 15.6 ± 8.0 seconds using our software and microscope. This drastically increases throughput and provides higher statistical reliability when performing adult cardiomyocyte functional experiments.

PMID:34469476 | DOI:10.1371/journal.pone.0256713
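
The two detection cues named in the cardiomyocyte abstract above, a Sobel filter for edges and the per-pixel variance of intensity through time for motion, can be sketched in plain Python as follows. This is a minimal illustration on tiny hypothetical images, not the authors' software; the function names, the image values, and the choice to leave border pixels at zero are assumptions made for the example.

```python
def sobel_magnitude(image):
    """Sobel gradient magnitude of a 2D list of intensities.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    kernel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    kernel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    value = image[y + dy - 1][x + dx - 1]
                    gx += kernel_x[dy][dx] * value
                    gy += kernel_y[dy][dx] * value
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def temporal_variance(frames):
    """Per-pixel population variance of intensity across a list of frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    variance = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            mean = sum(values) / n
            variance[y][x] = sum((v - mean) ** 2 for v in values) / n
    return variance


# Hypothetical 5x5 frame: dark background on the left, bright region on the right.
frame_a = [[0, 0, 10, 10, 10] for _ in range(5)]
# Second frame: identical except one pixel that changes intensity ("motion").
frame_b = [row[:] for row in frame_a]
frame_b[2][3] = 4

edges = sobel_magnitude(frame_a)
motion = temporal_variance([frame_a, frame_b])
print(edges[2][1], edges[2][3], motion[2][3], motion[0][0])
```

In this toy case the edge response is high at the boundary column and zero inside the uniform bright region, while the variance map is non-zero only at the pixel whose intensity changes between frames.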

Interpreting COVID-19 deaths among nursing home residents in the US: The changing role of facility quality over time

PLoS One. 2021 Sep 1;16(9):e0256767. doi: 10.1371/journal.pone.0256767. eCollection 2021.

ABSTRACT

A report published last year by the Centers for Medicare & Medicaid Services (CMS) highlighted that COVID-19 case counts are more likely to be high in lower quality nursing homes than in higher quality ones. Since then, multiple studies have examined this association with a handful also exploring the role of facility quality in explaining resident deaths from the virus. Despite this wide interest, no previous study has investigated how the relation between quality and COVID-19 mortality among nursing home residents may have changed, if at all, over the progression of the pandemic. This understanding is indeed lacking given that prior studies are either cross-sectional or are analyses limited to one specific state or region of the country. To address this gap, we analyzed changes in nursing home resident deaths across the US between June 1, 2020 and January 31, 2021 (n = 12,415 nursing homes × 8 months) using both descriptive and multivariable statistics. We merged publicly available data from multiple federal agencies with mortality rate (per 100,000 residents) as the outcome and CMS 5-star quality rating as the primary explanatory variable of interest. Covariates, based on the prior literature, consisted of both facility- and community-level characteristics. Findings from our secondary analysis provide robust evidence of the association between nursing home quality and resident deaths due to the virus diminishing over time. In connection, we discuss plausible reasons, especially duration of staff shortages, that over time might have played a critical role in driving the quality-mortality convergence across nursing homes in the US.

PMID:34469483 | DOI:10.1371/journal.pone.0256767

Classifying patients with depressive and anxiety disorders according to symptom network structures: A Gaussian graphical mixture model-based clustering

PLoS One. 2021 Sep 1;16(9):e0256902. doi: 10.1371/journal.pone.0256902. eCollection 2021.

ABSTRACT

Patients with mental disorders often suffer from comorbidity. Transdiagnostic understandings of mental disorders are expected to provide more accurate and detailed descriptions of psychopathology and be helpful in developing efficient treatments. Although conventional clustering techniques, such as latent profile analysis, are useful for the taxonomy of psychopathology, they offer few implications for targeting specific symptoms in each cluster. To overcome these limitations, we introduced Gaussian graphical mixture model (GGMM)-based clustering, a method developed in mathematical statistics to integrate clustering and network statistical approaches. To illustrate the technical details and clinical utility of the analysis, we applied GGMM-based clustering to a Japanese sample of 1,521 patients (mean age = 42.42 years), who had diagnostic labels of major depressive disorder (MDD; n = 406), panic disorder (PD; n = 198), social anxiety disorder (SAD; n = 116), obsessive-compulsive disorder (OCD; n = 66), comorbid MDD and any anxiety disorder (n = 636), or comorbid anxiety disorders (n = 99). As a result, we identified the following four transdiagnostic clusters characterized by i) strong OCD and PD symptoms, and moderate MDD and SAD symptoms; ii) moderate MDD, PD, and SAD symptoms, and weak OCD symptoms; iii) weak symptoms of all four disorders; and iv) strong symptoms of all four disorders. Simultaneously, a covariance symptom network within each cluster was visualized. The discussion highlighted that the GGMM-based clusters help us generate clinical hypotheses for transdiagnostic clusters by enabling further investigations of each symptom network, such as the calculation of centrality indexes.

PMID:34469469 | DOI:10.1371/journal.pone.0256902