Categories
Nevin Manimala Statistics

MetaZooGene Atlas and Database: Reference Sequences for Marine Ecosystems

Methods Mol Biol. 2024;2744:475-489. doi: 10.1007/978-1-0716-3581-0_28.

ABSTRACT

The MetaZooGene Atlas and Database (MZGdb; https://metazoogene.org/mzgdb/) is an open-access data and metadata portal synchronized with the NCBI GenBank and BOLD data repositories. The MZGdb includes sequences for genes used for the classification and identification of marine organisms based on DNA barcoding and metabarcoding. The focus of the MZGdb is biodiversity of marine ecosystems, including phytoplankton and microbes, zooplankton and invertebrates, fish, and other marine vertebrates (pinnipeds, cetaceans, and sea turtles). DNA sequences currently included are mitochondrial cytochrome oxidase I (COI), 12S and 16S rRNA, and nuclear 18S and 28S rRNA. The MZGdb provides data and mapping tools for assembling and downloading compilations of reference sequence data that are specific to selected genes, taxonomic groups, and/or ocean regions. An additional feature of the MZGdb is the Atlas, which summarizes data coverage and proportional completeness based on statistics of species with available sequences versus species commonly found in each ocean region. This chapter is a collaborative effort of the Scientific Committee for Ocean Research (SCOR) Working Group WG157: MetaZooGene: Toward a new global view of marine zooplankton biodiversity based on DNA metabarcoding and reference DNA sequence databases (https://metazoogene.org).

PMID:38683336 | DOI:10.1007/978-1-0716-3581-0_28

Species Delimitation and Exploration of Species Partitions with ASAP and LIMES

Methods Mol Biol. 2024;2744:313-334. doi: 10.1007/978-1-0716-3581-0_20.

ABSTRACT

DNA barcoding plays an important role in exploring undescribed biodiversity and is increasingly used to delimit lineages at the species level (see Chap. 4 by Miralles et al.). Although several approaches and programs have been developed to perform species delimitation from datasets of single-locus DNA sequences, such as DNA barcodes, most of these were not initially provided as user-friendly GUI-driven executables. In spite of their differences, most of these tools share the same goal, i.e., inferring de novo a partition of subsets, potentially each representing a distinct species. More recently, a proposed common exchange format for the resulting species partitions (SPART) has been implemented by several of these tools, paving the way toward developing an interoperable digital environment entirely dedicated to integrative and comparative species delimitation. In this chapter, we provide detailed protocols for the use of two bioinformatic tools, one for single locus molecular species delimitation (ASAP) and one for statistical comparison of species partitions resulting from any kind of species delimitation analyses (LIMES).

PMID:38683328 | DOI:10.1007/978-1-0716-3581-0_20

iTaxoTools 1.0: Improved DNA Barcode Exploration with TaxI2

Methods Mol Biol. 2024;2744:281-296. doi: 10.1007/978-1-0716-3581-0_18.

ABSTRACT

The overall availability of user-friendly software tools tailored to the analysis of DNA barcodes is limited. Several obvious functions such as detecting and visualizing the DNA barcode gap, the calculation of matrices of pairwise distances at the level of species, or the filtering and decontaminating of sets of sequences based on comparisons with reference databases can typically be carried out only by complex procedures that involve various programs and/or substantial manual formatting. The iTaxoTools project aims at contributing user-friendly software solutions to improve the speed and quality of the workflow of alpha-taxonomy. In this chapter, we provide detailed protocols for the use of a substantially improved version of the tool TaxI2 for distance-based exploration of DNA barcodes. The program calculates genetic distances from prealigned data sets, or based on pairwise alignments, or with an alignment-free approach. Sequence and metadata input can be formatted as tab-delimited files, and TaxI2 then computes tables, matrices and graphs of distances, and distance summary statistics within and between species and genera. TaxI2 also includes modes to compare a set of sequences against one or two reference data sets and output lists of best matches or filter data according to thresholds or reciprocal matches. Here, detailed step-by-step protocols are provided for the use of TaxI2, as well as for the interpretation of the program’s output.
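As a rough illustration of the distance-based exploration described above, the following sketch computes uncorrected pairwise p-distances from prealigned sequences and summarizes them within and between species, which is the comparison behind the barcode gap. It is not TaxI2 code: the function names, the toy sequences, and the choice of the p-distance are assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two prealigned sequences.

    Positions where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be prealigned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def barcode_gap(records):
    """Summarize distances for a list of (species, aligned_sequence) tuples.

    Returns (max intraspecific distance, min interspecific distance);
    either value is None when no such pair exists.
    """
    intra, inter = [], []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        distance = p_distance(seq_a, seq_b)
        (intra if sp_a == sp_b else inter).append(distance)
    return (max(intra) if intra else None, min(inter) if inter else None)


# Hypothetical toy alignment (10 bp), for illustration only.
records = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAT"),
    ("Species B", "ACGAACTTGC"),
    ("Species B", "ACGAACTTGC"),
]
max_intra, min_inter = barcode_gap(records)
```

A barcode gap is suggested when the smallest between-species distance exceeds the largest within-species distance, as it does for these toy records.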

PMID:38683326 | DOI:10.1007/978-1-0716-3581-0_18

Automated Clinical Trial Cohort Definition and Evaluation with CQL and CDS-Hooks

Stud Health Technol Inform. 2024 Apr 26;313:149-155. doi: 10.3233/SHTI240028.

ABSTRACT

BACKGROUND: Patient recruitment for clinical trials faces major challenges with current methods being costly and often requiring time-consuming acquisition of medical histories and manual matching of potential subjects.

OBJECTIVES: To design and implement an Electronic Health Record (EHR)- and domain-independent automation architecture based on Clinical Decision Support (CDS) standards that lets researchers enter standardized trial criteria to retrieve eligibility statistics and that integrates into the clinician workflow to trigger evaluation automatically, without added clinician workload.

METHODS: Cohort criteria are translated into the Clinical Quality Language (CQL) and integrated into Measures and CDS-Hooks for patient- and population-level evaluation.

RESULTS: Successful application of simplified real-world trial criteria to Fast Healthcare Interoperability Resources (FHIR®) test data shows the feasibility of obtaining individual patient eligibility and trial details as well as population eligibility statistics and a list of qualifying patients.

CONCLUSION: Employing CDS standards for automating cohort definition and evaluation shows promise in streamlining patient selection, aligning with increasing legislative demands for standardized healthcare data.
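To make the patient-level evaluation more concrete, here is a minimal sketch of the general idea: check one patient against simplified criteria and, if eligible, return a response shaped like a CDS Hooks card. It is not the authors' CQL or service. The criteria, the patient dictionary layout, the trial identifier, and the function names are all hypothetical; only the card fields `summary`, `indicator`, and `source.label` follow the CDS Hooks card structure.

```python
from datetime import date


def age_on(birth_date, today):
    """Age in whole years on a given day."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(patient, today):
    """Hypothetical simplified criteria: age 18-75, a type 2 diabetes
    code (ICD-10 E11.*), and no recorded pregnancy."""
    age = age_on(patient["birth_date"], today)
    has_condition = any(code.startswith("E11") for code in patient["condition_codes"])
    return 18 <= age <= 75 and has_condition and not patient["pregnant"]


def build_response(patient, today):
    """Return a CDS Hooks-style response: one card if eligible, else none."""
    if not is_eligible(patient, today):
        return {"cards": []}
    return {
        "cards": [
            {
                "summary": "Patient may be eligible for trial EXAMPLE-001",
                "indicator": "info",
                "source": {"label": "Trial eligibility service (hypothetical)"},
            }
        ]
    }


# Hypothetical patient record, for illustration only.
eligible_patient = {
    "birth_date": date(1970, 5, 1),
    "condition_codes": ["E11.9"],
    "pregnant": False,
}
response = build_response(eligible_patient, today=date(2024, 4, 26))
```

Returning an empty card list for ineligible patients keeps the clinician workflow silent unless there is something to act on, in line with the stated goal of avoiding added workload.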

PMID:38682521 | DOI:10.3233/SHTI240028

Mapping the Bulgarian Diabetes Register to OMOP CDM: Application Results

Stud Health Technol Inform. 2024 Apr 26;313:28-33. doi: 10.3233/SHTI240007.

ABSTRACT

BACKGROUND: The Bulgarian Diabetes Register (BDR) contains more than 380 million pseudonymized outpatient records with proprietary data structures and formats.

OBJECTIVES: This paper presents the application results and experience acquired during the process of mapping such observational health data to OMOP CDM with the objective of publishing it in the European Health Data and Evidence Network (EHDEN) Portal.

METHODS: The data mapping follows the activities of the well-structured Extract-Transform-Load process. Unlike other publications, we focus on the need for preprocessing the data structures of raw data, cleaning data and procedures for assuring quality of data.

RESULTS: This paper provides quantitative and statistical measures for the records in the CDM database as published in the EHDEN Portal.

CONCLUSION: The mapping of data from the BDR to OMOP CDM provides the EHDEN community with opportunities for including these data in large-scale projects for evidence generation by applying standard analytical tools.

PMID:38682500 | DOI:10.3233/SHTI240007

Persistence to statin treatment: A cohort study in Lithuania

Basic Clin Pharmacol Toxicol. 2024 Apr 29. doi: 10.1111/bcpt.14015. Online ahead of print.

ABSTRACT

Cardiovascular diseases are the main causes of death, and statins can reduce the risk of major vascular events. Lithuania is among the European countries with the highest cardiovascular mortality despite a rapidly increasing use of statins. Previous reviews have shown the problem of poor patient adherence, but there are limited studies from Eastern European countries. The aim of this study was to evaluate treatment persistence in new users of statins in Lithuania and to investigate factors associated with persistence. Dispensed prescriptions from patients aged >18 years initiated on statins in 2018-2019 were included, and data were obtained from a national health insurance fund. Persistence was assessed by the proportion of patients who still had statins dispensed 1 year after the first dispensing. Factors associated with persistence were assessed using logistic regression. A total of 104 726 patients (41.3% men) were initiated on statin treatment. Only 41% of them continued statin use 1 year after initiation. Factors associated with a higher persistence rate were older age, higher dose of statin, use of other medicines and use of statins as secondary prevention. Low persistence to statin therapy needs to be recognized by healthcare workers, pharmacists and policy makers to address this problem.

PMID:38682475 | DOI:10.1111/bcpt.14015

Impact of Air Pollutants on Outpatient Visits for Pediatric Atopic Dermatitis in Lanzhou

Zhongguo Yi Xue Ke Xue Yuan Xue Bao. 2024 Apr 29. doi: 10.3881/j.issn.1000-503X.15902. Online ahead of print.

ABSTRACT

Objective To analyze the correlation between air pollutants and pediatric atopic dermatitis outpatient visits in Lanzhou, and provide scientific insights for the life guidance of the affected children and disease prevention by relevant departments.

Methods A generalized additive model was employed to analyze the effects and lagged effects of air pollutants on pediatric atopic dermatitis outpatient visits in Lanzhou while controlling for confounding factors such as long-term trends, holiday effects, day of the week effects and meteorological factors.

Results The effects of NO2, PM2.5, PM10, and SO2 on pediatric atopic dermatitis outpatient visits were strongest on the current day (Lag0), but were not statistically significant (all P>0.05); CO also had its strongest effect on Lag0, and for every 10 μg/m3 increase in its concentration, the excess risk (ER) for pediatric atopic dermatitis outpatient visits was 0.05% (95%CI=0-0.10%, P=0.049); and O3 exhibited its strongest effect on day 7 of the cumulative lag (Lag07), with a statistically significant increase in the ER for each 10 μg/m3 increase in its concentration of 7.40% (95%CI=5.31%-9.53%, P<0.001) for pediatric atopic dermatitis outpatient visits. Age stratification showed that children aged 0-3 years with atopic dermatitis were the most sensitive to CO, with an increased ER of 0.09% (95%CI=0.04%-0.15%, P<0.001) for every 10 μg/m3 increase in concentration, and children aged 7-14 years with atopic dermatitis were the most sensitive to O3, with an increased ER of 8.26% (95%CI=4.99%-11.64%, P<0.001) for every 10 μg/m3 increase in concentration. Seasonal stratification showed that CO exerted a stronger effect on pediatric atopic dermatitis outpatient visits in summer and fall, with ER values of 0.45% and 0.16% (both P<0.001), respectively, while O3 had a significant effect on outpatient visits in winter, with an ER value of 20.48% (P<0.001).

Conclusion Elevated daily average concentrations of air pollutants CO and O3 in Lanzhou were positively correlated with the number of outpatient visits for atopic dermatitis in children, with significant seasonal effects and age-stratified sensitivities.
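For readers unfamiliar with the excess risk (ER) reported above: in time-series studies of this kind, ER is conventionally derived from the fitted log-linear coefficient of the pollutant. The abstract does not state the formula the authors used, so the expression below is the usual convention, given here as an assumption. The symbol β denotes the model coefficient per unit concentration, SE(β) its standard error, and Δ the concentration increment (10 μg/m3 in the abstract).

```latex
\mathrm{ER} = \left( e^{\beta \cdot \Delta} - 1 \right) \times 100\%,
\qquad
95\%\ \mathrm{CI} = \left( e^{\left( \beta \pm 1.96\, \mathrm{SE}(\beta) \right) \cdot \Delta} - 1 \right) \times 100\%
```

Under this convention, the reported ER of 7.40% for O3 corresponds to a relative risk of about 1.074 per 10 μg/m3 increase.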

PMID:38682472 | DOI:10.3881/j.issn.1000-503X.15902

High-dimensional covariate-augmented overdispersed Poisson factor model

Biometrics. 2024 Mar 27;80(2):ujae031. doi: 10.1093/biomtc/ujae031.

ABSTRACT

The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.
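The singular value ratio idea mentioned above can be sketched generically: given singular values in descending order, choose the index at which the ratio of consecutive singular values is largest. The abstract does not give the exact criterion implemented in COAP, so the function below, its name, and the example values are assumptions for illustration rather than the package's method.

```python
def select_by_singular_value_ratio(singular_values, max_rank):
    """Pick k in 1..max_rank that maximizes sigma_k / sigma_(k+1).

    singular_values must be sorted in descending order and contain at
    least max_rank + 1 strictly positive entries.
    """
    if max_rank < 1 or len(singular_values) < max_rank + 1:
        raise ValueError("need at least max_rank + 1 singular values")
    head = singular_values[: max_rank + 1]
    if any(value <= 0 for value in head):
        raise ValueError("singular values must be strictly positive")
    # ratios[k] compares the (k+1)-th and (k+2)-th largest singular values.
    ratios = [head[k] / head[k + 1] for k in range(max_rank)]
    best_index = max(range(max_rank), key=lambda k: ratios[k])
    return best_index + 1


# Hypothetical singular values with a clear drop after the third one.
example_values = [12.0, 9.5, 8.1, 0.6, 0.5, 0.4]
estimated_rank = select_by_singular_value_ratio(example_values, max_rank=5)
```

The largest ratio marks the sharpest drop in the spectrum, which is why it is a natural cutoff between signal and noise components.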

PMID:38682464 | DOI:10.1093/biomtc/ujae031

Topical hidden genome: discovering latent cancer mutational topics using a Bayesian multilevel context-learning approach

Biometrics. 2024 Mar 27;80(2):ujae030. doi: 10.1093/biomtc/ujae030.

ABSTRACT

Inferring the cancer-type specificities of ultra-rare, genome-wide somatic mutations is an open problem. Traditional statistical methods cannot handle such data due to their ultra-high dimensionality and extreme data sparsity. To harness information in rare mutations, we have recently proposed a formal multilevel multilogistic “hidden genome” model. Through its hierarchical layers, the model condenses information in ultra-rare mutations through meta-features embodying mutation contexts to characterize cancer types. Consistent, scalable point estimation of the model can incorporate tens of millions of variants across thousands of tumors and permit impressive prediction and attribution. However, principled statistical inference is infeasible due to the volume, correlation, and noninterpretability of mutation contexts. In this paper, we propose a novel framework that leverages topic models from computational linguistics to effectuate dimension reduction of mutation contexts, producing interpretable, decorrelated meta-feature topics. We propose an efficient MCMC algorithm for implementation that permits rigorous full Bayesian inference at a scale that is orders of magnitude beyond the capability of existing out-of-the-box inferential high-dimensional multi-class regression methods and software. Applying our model to the Pan Cancer Analysis of Whole Genomes dataset reveals interesting biological insights including somatic mutational topics associated with UV exposure in skin cancer, aging in colorectal cancer, and strong influence of epigenome organization in liver cancer. Under cross-validation, our model demonstrates highly competitive predictive performance against black-box methods of random forest and deep learning.

PMID:38682463 | DOI:10.1093/biomtc/ujae030

A prospective randomized comparative trial of pediatric C-MAC D-blade video laryngoscope with McCoy direct laryngoscope for intubation in children posted for elective surgical procedures under general anesthesia

Paediatr Anaesth. 2024 Apr 29. doi: 10.1111/pan.14911. Online ahead of print.

ABSTRACT

BACKGROUND: Pediatric airway management requires careful clinical evaluation and experienced execution due to anatomical, physiological, and developmental considerations. Video laryngoscopy in pediatric airways is a developing area of research, with recent data suggesting that video laryngoscopes are better than standard Macintosh blades. Specifically, there is a paucity of literature on the advantages of the C-MAC D-blade compared to the McCoy direct laryngoscope.

METHODS: After Ethics Committee approval, 70 American Society of Anesthesiologists physical status 1 and 2 children aged 4-12 years scheduled for elective surgery under general anesthesia were recruited. Patients were randomly allocated to intubation using either a C-MAC video laryngoscope size 2 D-blade (Group 1) or a McCoy laryngoscope size 2 blade (Group 2). The Intubation Difficulty Scale (IDS) for ease of intubation was the primary outcome, while Cormack-Lehane grades, duration of laryngoscopy and intubation, hemodynamic responses, and incidence of any airway complications were secondary outcomes.

RESULTS: Both groups were comparable in terms of patient characteristics. The median (IQR) Intubation Difficulty Scale (IDS) score was better with the C-MAC, but the difference was not statistically significant (0 [0-0] vs. 0 [0-2], p = .055). The glottic views were superior (CL grade I in 32/35 vs. 23/35, p = .002) and the time to best glottic view was shorter (6 s [5-7] vs. 8.0 s [6-10], p = .006) in the C-MAC D-blade group, while the total duration of intubation was comparable (20 s [16-22] vs. 18 s [15-22], p = .374). All patients were successfully intubated on the first attempt. None of the patients had any complications.

CONCLUSION: The C-MAC video laryngoscope size 2 D-blade provided faster and better glottic visualization but similar intubation difficulty compared to the McCoy size 2 laryngoscope in children. The shorter time to achieve glottic view demonstrated with the C-MAC failed to translate into a shorter total duration of intubation compared to the McCoy laryngoscope, which is attributable to the pronounced curvature of the D-blade.

PMID:38682461 | DOI:10.1111/pan.14911