Categories
Nevin Manimala Statistics

Identifying DNA methylation types and methylated base positions from bacteria using nanopore sequencing with multi-scale neural network

Bioinformatics. 2025 Jul 14:btaf397. doi: 10.1093/bioinformatics/btaf397. Online ahead of print.

ABSTRACT

MOTIVATION: DNA methylation plays important roles in various cellular physiological processes in bacteria. Nanopore sequencing has shown the ability to identify different types of DNA methylation directly from individual bacteria. However, existing methods for identifying bacterial methylomes perform inconsistently across methylation motifs and do not fully utilize the multi-scale information contained in nanopore signals.

RESULTS: We propose a deep-learning method, called Nanoident, for de novo detection of DNA methylation types and methylated base positions in bacteria using nanopore sequencing. For each targeted motif sequence, Nanoident utilizes five different features, including statistical features extracted from both the nanopore raw signals and the basecalling results of the motif. All five features are processed by a multi-scale neural network in Nanoident, which extracts information from different receptive fields of the features. Leave-one-out cross-validation (LOOCV) on a dataset containing 7 bacterial samples with 46 methylation motifs shows that Nanoident achieves a ∼10% improvement in accuracy over the previous method. Furthermore, Nanoident achieves a ∼13% improvement in accuracy on an independent dataset containing 12 methylation motifs. Additionally, we optimize the pipeline for de novo methylation motif enrichment, enabling the discovery of novel methylation motifs.

AVAILABILITY AND IMPLEMENTATION: The source code of Nanoident is freely available at https://github.com/cz-csu/Nanoident and https://doi.org/10.6084/m9.figshare.29252264.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:40658463 | DOI:10.1093/bioinformatics/btaf397

Assessment of disease severity in Sjögren’s syndrome using semiquantitative parameters on salivary gland scintigraphy

Nucl Med Commun. 2025 Jul 4. doi: 10.1097/MNM.0000000000002020. Online ahead of print.

ABSTRACT

INTRODUCTION: Sjögren’s syndrome is a chronic autoimmune disease characterized by lymphocytic infiltration and destruction of exocrine glands. Sjögren’s syndrome characteristically involves salivary glands with the presence of xerostomia in the majority (>93%) of patients. The severity of xerostomia can vary from mild to severe and debilitating. Labial histopathology and antinuclear antibodies (ANA) are commonly used in the diagnosis of Sjögren’s syndrome but do not correlate well with disease severity. Tests available for objective assessment of disease severity include sialometry and salivary gland scintigraphy (SGS). This study aims to correlate the severity of xerostomia with semiquantitative parameters on SGS.

MATERIALS AND METHODS: On the basis of clinical symptoms, the severity of xerostomia was graded into mild, moderate, and severe. Semiquantitative parameters (maximum uptake and excretion fractions) for all salivary glands were calculated on SGS. Spearman’s correlation coefficients were calculated to assess correlation with clinical disease severity.

RESULTS: One-hundred thirteen patients (94 females and 19 males) with a median age of 39 years (range: 4-85 years) were included. Of these, 74 had mild, 28 had moderate, while only 11 had severe disease. There was a statistically significant difference between the mean values of maximum uptake and excretion fractions across the three severity groups (P < 0.05).

CONCLUSION: Semiquantitative parameters on SGS show a reduction with an increase in the severity of xerostomia. In addition, maximum uptake and excretion fractions correlated well with the severity of xerostomia of Sjögren’s syndrome, whereas ANA levels showed no significant correlation with disease severity. SGS can serve as an objective parameter of clinical severity of xerostomia, which is otherwise difficult to determine clinically.

PMID:40658462 | DOI:10.1097/MNM.0000000000002020

Multivariate Adjustments for Average Equivalence Testing

Stat Med. 2025 Jul;44(15-17):e10258. doi: 10.1002/sim.10258.

ABSTRACT

Multivariate (average) equivalence testing is widely used to assess whether the means of two conditions of interest are “equivalent” for different outcomes simultaneously. In pharmacological research for example, many regulatory agencies require the generic product and its brand-name counterpart to have equivalent means both for the AUC and Cmax pharmacokinetics parameters. The multivariate Two One-Sided Tests (TOST) procedure is typically used in this context by checking if, outcome by outcome, the marginal $100(1-2\alpha)\%$ confidence intervals for the difference in means between the two conditions of interest lie within predefined lower and upper equivalence limits. This procedure, already known to be conservative in the univariate case, leads to a rapid power loss when the number of outcomes increases, especially when one or more outcome variances are relatively large. In this work, we propose a finite-sample adjustment for this procedure, the multivariate $\alpha$-TOST, that consists in a correction of $\alpha$, the significance level, taking the (arbitrary) dependence between the outcomes of interest into account and making it uniformly more powerful than the conventional multivariate TOST. We present an iterative algorithm allowing to efficiently define $\alpha^{*}$, the corrected significance level, a task that proves challenging in the multivariate setting due to the inter-relationship between $\alpha^{*}$ and the sets of values belonging to the null hypothesis space and defining the test size. We study the operating characteristics of the multivariate $\alpha$-TOST both theoretically and via an extensive simulation study considering cases relevant for real-world analyses (that is, relatively small sample sizes, unknown and possibly heterogeneous variances as well as different correlation structures) and show the superior finite-sample properties of the multivariate $\alpha$-TOST compared to its conventional counterpart. We finally re-visit a case study on ticlopidine hydrochloride and compare both methods when simultaneously assessing bioequivalence for multiple pharmacokinetic parameters.
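
The conventional multivariate TOST described above can be sketched in a few lines. This is a minimal illustration, not the corrected procedure from the paper: it uses a normal quantile in place of the t quantile (a large-sample assumption), and the function names, the example differences and standard errors, and the log(1.25) equivalence limit are all hypothetical choices made for the example.

```python
from math import log
from statistics import NormalDist

def tost_equivalent(diff, se, lower, upper, alpha=0.05):
    """True if the 100(1 - 2*alpha)% CI for one outcome lies inside (lower, upper).

    Assumption: normal approximation; a t quantile would be used in small samples.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return lower < diff - z * se and diff + z * se < upper

def multivariate_tost(diffs, ses, lower, upper, alpha=0.05):
    """Declare equivalence only if every outcome passes its marginal TOST."""
    return all(tost_equivalent(d, s, lower, upper, alpha)
               for d, s in zip(diffs, ses))

# Hypothetical log-scale differences for two outcomes (e.g. AUC and Cmax).
LIMIT = log(1.25)
both_precise = multivariate_tost([0.02, -0.05], [0.06, 0.09], -LIMIT, LIMIT)
one_noisy = multivariate_tost([0.02, -0.05], [0.06, 0.12], -LIMIT, LIMIT)
```

With the same point estimates, raising the second standard error from 0.09 to 0.12 is enough to push its interval past the lower limit, so the joint test fails. This is the power loss under large outcome variance that the abstract describes.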

PMID:40658428 | DOI:10.1002/sim.10258

Metataxonomic profiles of bacterial and parasitic communities in Amblyomma spp. ticks collected from wildlife in Colombia: Implications for tick-borne diseases

Med Vet Entomol. 2025 Jul 14. doi: 10.1111/mve.12823. Online ahead of print.

ABSTRACT

As a tropical country, Colombia hosts a wide range of arthropods that can act as vectors of disease-causing pathogens, particularly those carrying hemopathogens. Ticks play a crucial role in the transmission of zoonotic pathogens, impacting both human and veterinary health. The pathogen load of ticks from wildlife is of particular concern, as it can contribute to the spillover of infectious agents to domestic animals and humans, highlighting the need for surveillance and control strategies to mitigate emerging tick-borne diseases. Therefore, this study aimed to determine the presence of microorganisms in ticks collected from wildlife in Antioquia (Colombia) through bioinformatic analysis. A prospective, cross-sectional, random, non-probabilistic, convenience-based study involving tick collection from animals in three different zones of Antioquia was conducted. Initially, vertebrate species were morphologically characterized via taxonomic keys and identification guides for amphibians, reptiles, birds, and mammals. Ticks were manually collected from these animals and preserved in absolute ethanol for later taxonomic identification. Genomic DNA was then extracted, and the resulting reads were processed through bioinformatic analysis, achieving taxonomic classification within DNA libraries of gram-positive bacteria, gram-negative bacteria, and parasites. Additionally, descriptive statistics were calculated for all variables of interest at the animal level (e.g., genus, species, sex, and age group, when applicable) and study zone. A total of 570 ticks, predominantly Amblyomma spp., were obtained from 46 host animals. Ticks from lizards presented the highest bacterial richness and diversity (based on 16S gDNA), whereas ticks from amphibians presented the lowest. Proteobacteria dominated most samples, as shown by taxonomic composition at the phylum, family, and genus levels. 
Ticks collected from mammals displayed lower diversity and richness than those collected from reptiles. For parasitic communities (18S gDNA), dominant eukaryotes were identified in ticks from mammals, excluding host-related taxa. Overall, lizard-associated ticks presented the most complex microbial diversity, whereas amphibian ticks were less diverse, highlighting the significant variation in microbial and parasitic communities across host species. This study highlights the microbial diversity of ticks from wild hosts in Colombia, focusing on the dominance of Francisella, Rickettsia, Aspergillus, and Penicillium. These findings underscore the need for further research on their ecological roles, transmission dynamics, and potential health risks, aiming to inform strategies to mitigate tick-borne diseases.

PMID:40658399 | DOI:10.1111/mve.12823

Comparison of Molecular Recognition in Docking Versus Experimental CSD and PDB Data

J Chem Inf Model. 2025 Jul 14. doi: 10.1021/acs.jcim.5c00893. Online ahead of print.

ABSTRACT

Molecular docking is a widely used technique in structure-based drug design for generating poses of small molecules in a protein receptor structure. These poses are then ranked to prioritize compounds for experimental validation. Numerous approaches to assessing the structural fit of a ligand exist, ranging from simple scoring functions to more elaborate free energy calculations. Regardless of the prioritization method chosen, its accuracy is limited by the quality of the protein-ligand pose. Here, we apply two established statistical approaches for quantifying atomic interaction preferences and torsional ligand strain, respectively, to compare poses generated by the docking algorithm Vina with crystallographic data from the PDB and CSD. This analysis allows us to identify potential deficiencies in the docking algorithm, such as underestimated electrostatic repulsion or high-energy hydroxyl conformations. By highlighting such inaccuracies, we aim to inspire improvements in future docking algorithms. Finally, a pose scoring approach is proposed that significantly improves the retrieval of the experimental pose from a set of docked poses.

PMID:40658398 | DOI:10.1021/acs.jcim.5c00893

Ascertainment Conditional Maximum Likelihood for Continuous Outcome Under Two-Phase Response-Selective Design

Stat Med. 2025 Jul;44(15-17):e70111. doi: 10.1002/sim.70111.

ABSTRACT

Data collection procedures are often time-consuming and expensive. An alternative to collecting full information from all subjects enrolled in a study is a two-phase design: Variables that are inexpensive or easy to measure are obtained for the study population, and more specific, expensive, or hard-to-measure variables are collected only for a well-selected sample of individuals. Often, only these subjects that provided full information are used for inference, while those that were partially observed are discarded from the analysis. Recently, semiparametric approaches that use the entire dataset, resulting in fully efficient estimators, have been proposed. These estimators, however, have challenges incorporating multiple covariates, are computationally expensive, and depend on tuning parameters that affect their performance. In this paper, we propose an alternative semiparametric estimator that does not pose any distributional assumptions on the covariates or measurement error mechanism and can be applied to a wider range of settings. Although the proposed estimator is not semiparametric efficient, simulations show that the loss of efficiency to estimate the parameters associated with the partially observed covariates is minimal. We highlight the estimator’s applicability to real-world problems, where data structures are complex and rich, and complicated regression models are often necessary.

PMID:40658389 | DOI:10.1002/sim.70111

Application of Perioperative Real-Time Fluorescence Imaging to Achieve High-Quality Debridement: A Randomized Control Trial

Adv Wound Care (New Rochelle). 2025 Jul 14. doi: 10.1177/21621918251359558. Online ahead of print.

ABSTRACT

Objective: To investigate the effectiveness of real-time fluorescence imaging (RTFI)-assisted debridement in managing chronic wounds compared with standard surgical debridement. Approach: This study was a patient-blinded, randomized clinical trial conducted from February 17, 2021, to July 30, 2021, on patients with chronic wounds. Patients were randomized to an RTFI group (M group) or conventional group (C group). The primary outcomes were as follows: percentage of residual bacterial area (preoperative and postoperative), number of debridements, high-quality debridement ratio, operation duration, and wound healing duration. Results: A total of 100 patients were enrolled in both groups. No significant difference in the percentage of preoperative residual bacterial area or high-quality debridement ratio was seen. The M group underwent debridement an average of 2.6 times and had a significantly longer duration of operation (33.5 ± 12.7 min) than the C group (29.9 ± 10.4 min; p = 0.031). The postoperative residual bacterial area was significantly lower in the M than in the C group (6.83% ± 1.39% vs. 30.0% ± 12.37%, respectively; p < 0.001). The M group required significantly fewer wound healing days (49.2 ± 25.3 vs. 63.0 ± 27.9, p < 0.001). Secondary outcomes also demonstrated statistically significant differences in total hospitalized days (17.5 ± 9.3 vs. 21.5 ± 12.5, p < 0.01), days of antibiotic use (15.5 ± 8.7 vs. 18.7 ± 6.7, p < 0.01), and reinfection rates (4 of 100 vs. 22 of 100, p < 0.001). Innovation: RTFI can detect signals from normal skin components and bacterial metabolites. Therefore, interpretation of RTFI results should be correlated with the clinical condition. RTFI is associated with high-quality debridement. This technique can also be applied in targeted biopsy and in training young staff to mature debridement procedures. 
Conclusion: RTFI in debridement is associated with favorable clinical outcomes and may have a positive influence on chronic wound healing.

PMID:40658376 | DOI:10.1177/21621918251359558

Outlier Detection in Mendelian Randomization

Stat Med. 2025 Jul;44(15-17):e70143. doi: 10.1002/sim.70143.

ABSTRACT

Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal effects of exposures on an outcome. One key assumption of MR is that the genetic variants used as instrumental variables are independent of the outcome conditional on the risk factor and unobserved confounders. Violations of this assumption, that is, the effect of the instrumental variables on the outcome through a path other than the risk factor included in the model (which can be caused by pleiotropy), are common phenomena in human genetics. Genetic variants, which deviate from this assumption, appear as outliers to the MR model fit and can be detected by the general heterogeneity statistics proposed in the literature, which are known to suffer from overdispersion, that is, too many genetic variants are declared as false outliers. We propose a method that corrects for overdispersion of the heterogeneity statistics in uni- and multivariable MR analysis by making use of the estimated inflation factor to correctly remove outlying instruments and therefore account for pleiotropic effects. Our method is applicable to summary-level data.
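
The idea of scaling heterogeneity statistics by an inflation factor before flagging outliers can be sketched for the univariable, summary-level case. This is an illustrative sketch under stated assumptions and not the authors' method: it uses the standard inverse-variance weighted (IVW) estimate and per-variant Cochran's Q contributions, and the inflation factor estimate `phi = Q / (J - 1)` (floored at 1) and the function name `flag_outliers` are hypothetical choices. The input values are toy numbers.

```python
from statistics import NormalDist

def flag_outliers(bx, by, se_y, alpha=0.05):
    """Flag variants whose inflation-scaled Q contribution exceeds a chi-square(1) cutoff."""
    ratios = [y / x for x, y in zip(bx, by)]             # per-variant Wald ratio estimates
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]   # first-order inverse-variance weights
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = [w * (r - ivw) ** 2 for w, r in zip(weights, ratios)]  # Cochran's Q contributions
    phi = max(1.0, sum(q) / (len(q) - 1))                # hypothetical inflation factor estimate
    # The upper quantile of chi-square(1) equals the squared two-sided normal quantile.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    flagged = [j for j, qj in enumerate(q) if qj / phi > cutoff]
    return flagged, phi

# Toy summary statistics: nine variants with ratio 0.5 and one with ratio 2.0.
bx = [0.1] * 10
by = [0.05] * 9 + [0.2]
se_y = [0.01] * 10
flagged, phi = flag_outliers(bx, by, se_y)
```

In this toy input only the last variant is flagged. Note that estimating the inflation factor from all variants, outliers included, is a simplification; a real analysis would need a more careful estimate.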

PMID:40658369 | DOI:10.1002/sim.70143

Investigation of major genes affecting body weight in hair goats using bayesian segregation analysis

Trop Anim Health Prod. 2025 Jul 14;57(6):300. doi: 10.1007/s11250-025-04565-7.

ABSTRACT

OBJECTIVE: The objective of this study is to investigate the presence of major genes affecting body weight in hair goats. The application of Bayesian segregation analysis to big data facilitates more precise identification of intricate genetic structures and variations. This approach offers more profound biological insights through the detection of concealed genetic elements within big datasets. The precise quantification of additive genetic effects is fundamental for achieving sustainable genetic progress through targeted selection. Furthermore, the evaluation of dominance effects offers critical insights into heterozygote advantage, elucidating the mechanisms underlying heterosis and resilience in growth-related traits within livestock populations.

METHODS: To rapidly and accurately identify the presence of major genes, pedigree data and phenotypic data were employed in a Bayesian segregation analysis. For this purpose, 4072 records of body weight were analysed, measured at two different time points (birth weight (Time1) and body weight measured at approximately 100-120 days of age (Time2)). The data set comprised 2036 animals (n = 1038 male, n = 998 female). Gibbs sampling was employed to make statistical inferences regarding posterior distributions. These inferences were based on 20 replications of the Markov chain for each trait, with 100,000 samples collected, with each 500th sample retained due to the high correlation among the samples.
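
The thinning step can be illustrated with a short sketch. It assumes that the 20 replications are 20 separate chains and that 100,000 is the number of draws per chain before thinning; the abstract does not state either point explicitly, and the helper name `thin` is hypothetical.

```python
def thin(chain, step):
    """Keep every `step`-th draw (the step-th, 2*step-th, ...) to reduce autocorrelation."""
    return chain[step - 1::step]

N_CHAINS = 20        # replications of the Markov chain (assumed to be separate chains)
N_DRAWS = 100_000    # draws per chain before thinning (assumed)
STEP = 500           # thinning interval

# A range stands in for the sampled values; only the indexing matters here.
kept_per_chain = len(thin(range(N_DRAWS), STEP))
total_kept = N_CHAINS * kept_per_chain
```

Under these assumptions each chain contributes 200 retained draws, or 4,000 in total per trait, to the posterior summaries.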

RESULTS: In this study, the estimated error variance, major gene variance, polygenic variance, dominance effect, and additive genetic effect were determined through Bayesian segregation analysis. The dominance effect (-1.797) was found to be smaller than the additive genetic effect (3.594) for birth weight, whereas for body weight at 4 months of age, the dominance effect (55.902) was found to be higher than the additive genetic variance (54.988). The polygenic and major gene heritabilities were estimated to be 0.51 (± 0.56) and 0.81 (± 0.91) for birth weight, and 0.44 (± 0.55) and 0.86 (± 0.93) for body weight at four months of age, respectively.

CONCLUSION: The results of this study indicate that the 95% highest posterior density regions (HPDs) for the major gene parameter, particularly for the major gene variance, do not include 0, indicating the statistical significance of the major gene component.

PMID:40658343 | DOI:10.1007/s11250-025-04565-7

Cost-Effectiveness of Adding Dapagliflozin and Empagliflozin to Standard Treatment for Diabetic Kidney Disease in China

Clin Drug Investig. 2025 Jul 14. doi: 10.1007/s40261-025-01462-7. Online ahead of print.

ABSTRACT

BACKGROUND: Dapagliflozin and empagliflozin are emerging as promising treatment options for diabetic kidney disease (DKD).

OBJECTIVE: This study sought to evaluate the cost effectiveness of incorporating dapagliflozin and empagliflozin into the standard treatment for DKD in China.

METHODS: A Markov model was constructed to evaluate the cost-effectiveness of dapagliflozin and empagliflozin plus standard treatment versus standard treatment alone for DKD treatment from a healthcare perspective. Cost and utility data were obtained from the published literature within the Chinese context. The primary outcomes were total cost, quality-adjusted life-years (QALYs), and the incremental cost-effectiveness ratio (ICER). The 2023 GDP per capita in China (¥89,358) was used as the willingness-to-pay threshold.

RESULTS: Compared with standard treatment alone, add-on therapy with dapagliflozin or empagliflozin resulted in a higher total cost (+¥19,203.56 and +¥9496.92, respectively). However, both dapagliflozin and empagliflozin also yielded more life-years (+1.72 vs. +1.40) and QALYs (+1.40 vs. +0.88). The ICERs per life-year and per QALY were ¥11,178.52 and ¥18,192.50 for dapagliflozin and ¥6773.10 and ¥10,811.64 for empagliflozin, respectively. The incremental net monetary benefit was ¥75,120.54 and ¥68,994.90 for dapagliflozin and empagliflozin, respectively. Sensitivity analysis supported the main findings of the base-case analysis, as the cost-effectiveness of dapagliflozin or empagliflozin was sustained for most plausible ranges of parameter values.
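
The ICER and incremental net monetary benefit (INMB) follow the standard definitions: incremental cost divided by incremental effect, and willingness-to-pay times incremental QALYs minus incremental cost. The sketch below applies them to the rounded empagliflozin increments quoted in the abstract. The function and variable names are hypothetical, and because the inputs are rounded, the results come out close to the reported values but do not reproduce them exactly.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    return delta_cost / delta_effect

def inmb(delta_cost, delta_qaly, wtp):
    """Incremental net monetary benefit at a given willingness-to-pay threshold."""
    return wtp * delta_qaly - delta_cost

WTP = 89_358.0  # 2023 GDP per capita in China (yuan), as stated in the abstract

# Rounded empagliflozin increments versus standard treatment, from the abstract.
d_cost, d_life_years, d_qaly = 9_496.92, 1.40, 0.88

icer_per_ly = icer(d_cost, d_life_years)
icer_per_qaly = icer(d_cost, d_qaly)
net_benefit = inmb(d_cost, d_qaly, WTP)
cost_effective = icer_per_qaly < WTP
```

An add-on therapy counts as cost-effective under this rule when its ICER per QALY is below the threshold, which is equivalent to a positive INMB.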

CONCLUSIONS: Considering that the ICER falls below the predefined willingness-to-pay threshold, incorporating dapagliflozin and empagliflozin into standard treatment for DKD is likely to be a cost-effective strategy in China.

PMID:40658333 | DOI:10.1007/s40261-025-01462-7