Categories
Nevin Manimala Statistics

VeRUS: verification of reference intervals based on the uncertainty of sampling

Clin Chem Lab Med. 2025 Nov 11. doi: 10.1515/cclm-2025-0728. Online ahead of print.

ABSTRACT

OBJECTIVES: Laboratories are required to routinely verify reported reference intervals (RIs), but common verification methods like the CLSI-EP28-A3c binomial test are often impractical due to sample collection requirements. Indirect verification methods like equivalence limits (ELs) use routine data from patient care but lack systematic evaluation. This study aimed to develop and evaluate a novel indirect verification method: verification of reference intervals based on the uncertainty of sampling (VeRUS).

METHODS: VeRUS compares the to-be-verified candidate RI to an RI estimated from local routine data. Acceptable differences are based on the sampling uncertainty intrinsic to the nonparametric method for establishing RIs with n=120 samples. The three verification methods were systematically compared with simulated test sets resembling 10 differently distributed biomarkers and a wide range of plausible candidate RIs.
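The idea of sampling uncertainty in a nonparametric RI can be illustrated with a minimal Python sketch. This is not the VeRUS algorithm: it assumes the RI is the central 95 % range (2.5th to 97.5th percentile) and approximates the uncertainty of limits estimated from n=120 samples by bootstrap resampling. The function names, the 90 % bands, and the acceptance rule are illustrative assumptions that do not come from the abstract.

```python
import random


def percentile(sorted_values, p):
    """Percentile (p in [0, 1]) of pre-sorted data, with linear interpolation."""
    k = p * (len(sorted_values) - 1)
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])


def reference_interval(values):
    """Nonparametric RI, assumed here to be the 2.5th to 97.5th percentile."""
    s = sorted(values)
    return percentile(s, 0.025), percentile(s, 0.975)


def limit_bands(values, n=120, n_boot=500, seed=0):
    """Bootstrap 90 % bands for both RI limits when only n samples are drawn."""
    rng = random.Random(seed)
    lowers, uppers = [], []
    for _ in range(n_boot):
        sample = [rng.choice(values) for _ in range(n)]
        lo, hi = reference_interval(sample)
        lowers.append(lo)
        uppers.append(hi)
    lowers.sort()
    uppers.sort()
    return ((percentile(lowers, 0.05), percentile(lowers, 0.95)),
            (percentile(uppers, 0.05), percentile(uppers, 0.95)))


def candidate_accepted(candidate, values, **kwargs):
    """Hypothetical rule: accept if both candidate limits fall inside the bands."""
    (lo_min, lo_max), (hi_min, hi_max) = limit_bands(values, **kwargs)
    return lo_min <= candidate[0] <= lo_max and hi_min <= candidate[1] <= hi_max


if __name__ == "__main__":
    rng = random.Random(1)
    routine = [rng.gauss(100.0, 10.0) for _ in range(5000)]  # simulated data
    print(reference_interval(routine))
    print(candidate_accepted((80.4, 119.6), routine))  # close to the 95 % range
    print(candidate_accepted((60.0, 140.0), routine))  # far too wide
```

Because the tolerance is tied to the spread of limits estimated from 120 samples, a candidate RI is judged against the precision that a direct nonparametric study of that size could deliver.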

RESULTS: The binomial test is inherently unable to reject RIs that are too wide, e.g., the 99.8 % interval, for which ELs and VeRUS showed high rejection rates (mean 89.2 %, SD 31.5 % and mean 95.8 %, SD 2.3 %, respectively). Moreover, the binomial test incorrectly accepts 29.3 % of “too narrow” 80 % intervals, whereas the false acceptance rates of ELs and VeRUS were lower (mean 21.7 %, SD 40.9 % and mean 7.2 %, SD 4.7 %, respectively). Overall, both indirect verification methods demonstrated increased statistical power, while ELs were least consistent among different biomarker distributions.
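The behavior of the binomial test can be checked with a short calculation. The sketch below assumes the commonly described two-stage rule with 20 reference samples (accept if at most 2 results fall outside the RI; if 3 or 4 fall outside, repeat once with 20 new samples); the abstract does not spell out the rule, so this is an assumption, as are the function names.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def acceptance_probability(p_outside, n=20):
    """Probability that the assumed two-stage rule accepts a candidate RI.

    Stage 1: accept if at most 2 of n results fall outside the RI.
    Stage 2: if 3 or 4 fall outside, draw n new samples and accept if
    at most 2 of those fall outside. Otherwise reject.
    """
    accept_first = binom_cdf(2, n, p_outside)
    retest = binom_cdf(4, n, p_outside) - accept_first
    return accept_first + retest * accept_first


for coverage in (0.80, 0.95, 0.998):
    prob = acceptance_probability(1 - coverage)
    print(f"{coverage:.1%} interval: accepted with probability {prob:.3f}")
```

Under these assumptions an 80 % interval is accepted with a probability of about 0.29, in line with the 29.3 % reported above, and a 99.8 % interval is accepted almost always, because the rule only counts results outside the interval and so cannot penalize excess width.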

CONCLUSIONS: Its robust performance without the need for sample collection makes VeRUS an attractive tool for RI verification. By enabling routine verification of previously practically unverifiable RIs (e.g., in pediatrics), VeRUS may enhance clinical decision-making and improve patient care.

PMID:41213183 | DOI:10.1515/cclm-2025-0728

Modeling Alzheimer’s Disease Biomarkers’ Trajectory in the Absence of a Gold Standard Using a Bayesian Approach

Stat Med. 2025 Nov;44(25-27):e70283. doi: 10.1002/sim.70283.

ABSTRACT

To advance our understanding of Alzheimer’s Disease (AD), especially during the preclinical stage when patients’ brain functions are mostly intact, recent research has shifted towards studying AD biomarkers across the disease continuum. A widely adopted framework in AD research, proposed by Jack and colleagues, maps the progression of these biomarkers from the preclinical stage to symptomatic stages, linking their changes to the underlying pathophysiological processes of the disease. However, most existing studies rely on clinical diagnoses as a proxy for underlying AD status, potentially overlooking early stages of disease progression where biomarker changes occur before clinical symptoms appear. In this work, we develop a novel Bayesian approach to directly model the underlying AD status as a latent disease process and biomarker trajectories as nonlinear functions of disease progression. This allows for more data-driven exploration of AD progression, reducing potential biases due to inaccurate clinical diagnoses. We address the considerable heterogeneity among individuals’ biomarker measurements by introducing a subject-specific latent disease trajectory as well as incorporating random intercepts to further capture additional inter-subject differences in biomarker measurements. We evaluate our model’s performance through simulation studies. Applications to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study yield interpretable clinical insights, illustrating the potential of our approach in facilitating the understanding of AD biomarker evolution.

PMID:41213170 | DOI:10.1002/sim.70283

On “Confirmatory” Methodological Research in Statistics and Related Fields

Stat Med. 2025 Nov;44(25-27):e70303. doi: 10.1002/sim.70303.

ABSTRACT

Empirical substantive research, such as in the life or social sciences, is commonly categorized into two modes, exploratory and confirmatory, both of which are essential to scientific progress. The former is also referred to as hypothesis-generating or data-contingent research, while the latter is also called hypothesis-testing research. In the context of empirical methodological research in statistics, however, the exploratory-confirmatory distinction has received very little attention so far. Our paper aims to fill this gap. First, we revisit the concept of empirical methodological research through the lens of the exploratory-confirmatory distinction. Second, we examine current practice with respect to this distinction through a literature survey including 115 articles from the field of biostatistics. Third, we provide practical recommendations toward a more appropriate design, interpretation, and reporting of empirical methodological research in light of this distinction. In particular, we argue that both modes of research are crucial to methodological progress, but that most published studies, even if sometimes disguised as confirmatory, are essentially exploratory in nature. We emphasize that it may be adequate to consider empirical methodological research as a continuum between “pure” exploration and “strict” confirmation, recommend transparently reporting the mode of conducted research within the spectrum between exploratory and confirmatory, and stress the importance of study protocols written before conducting the study, especially in confirmatory methodological research.

PMID:41213159 | DOI:10.1002/sim.70303

Evaluation of the suitability of COSHH Essentials for qualitative assessment of inhalation risk from chemical agents in perfume laboratories: a new perspective

Int J Occup Saf Ergon. 2025 Nov 10:1-10. doi: 10.1080/10803548.2025.2575603. Online ahead of print.

ABSTRACT

Objectives. Professionals in the perfume industry are routinely exposed to numerous chemical substances during olfactory evaluations, some of which may pose inhalation hazards. Existing qualitative risk assessment tools, such as Control of Substances Hazardous to Health (COSHH) Essentials, provide approximate estimates and may not be well suited to industries with highly specific exposure conditions like perfumery. This study evaluates the applicability and limitations of COSHH Essentials in perfume laboratories and proposes an improved qualitative framework tailored to perfumers’ exposure scenarios. Methods. A total of 626 substances from a perfumer’s palette were assessed using COSHH Essentials, which classifies substances into risk levels based on hazard, volatility and quantity. A complementary method incorporating molecular-level hazard analysis, exposure patterns, occupational exposure limits and conservative inhalation dose estimations was developed. Statistical agreement between the two methods was examined using Cohen’s κ, and McNemar’s test was used to assess significant differences. Results. COSHH Essentials identified 76 hazardous substances, while the enhanced method identified 81 substances, including five additional substances requiring local exhaust ventilation. Agreement was moderate (κ = 0.58; p = 0.031). Conclusion. COSHH Essentials provides a useful baseline but lacks the specificity needed in industries with intentional close-range exposure. The enhanced method enables more precise, context-sensitive assessment and better protection in fragrance laboratories.
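As an illustration of the two agreement statistics named above, the following Python sketch computes Cohen’s κ and an exact two-sided McNemar p-value from a 2×2 cross-classification. The abstract does not report the cell counts, so the counts in the usage example are hypothetical and do not reproduce the study’s κ or p-value; the function names are likewise assumptions.

```python
from math import comb


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a: both methods flag the substance, d: neither does,
    b: only method 1 flags it, c: only method 2 flags it.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value, based on the discordant pairs only."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only (not taken from the study).
a, b, c, d = 20, 5, 10, 15
print(f"kappa = {cohen_kappa(a, b, c, d):.2f}")
print(f"McNemar exact p = {mcnemar_exact_p(b, c):.3f}")
```

κ measures overall agreement corrected for chance, whereas McNemar’s test looks only at the discordant pairs and asks whether one method flags substances systematically more often than the other.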

PMID:41213145 | DOI:10.1080/10803548.2025.2575603

Challenges and Solutions in Applying Large Language Models to Guideline-Based Management Planning and Automated Medical Coding in Health Care: Algorithm Development and Validation

JMIR Biomed Eng. 2025 Nov 10;10:e66691. doi: 10.2196/66691.

ABSTRACT

BACKGROUND: Diagnostic errors and administrative burdens, including medical coding, remain major challenges in health care. Large language models (LLMs) have the potential to alleviate these problems, but their adoption has been limited by concerns regarding reliability, transparency, and clinical safety.

OBJECTIVE: This study introduces and evaluates 2 LLM-based frameworks, implemented within the Rhazes Clinician platform, designed to address these challenges: generation-assisted retrieval-augmented generation (GARAG) for automated evidence-based treatment planning and generation-assisted vector search (GAVS) for automated medical coding.

METHODS: GARAG was evaluated on 21 clinical test cases created by medically qualified authors. Each case was executed 3 times independently, and outputs were assessed using 4 criteria: correctness of references, absence of duplication, adherence to formatting, and clinical appropriateness of the generated management plan. GAVS was evaluated on 958 randomly selected admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV database, in which billed International Classification of Diseases, Tenth Revision (ICD-10) codes served as the ground truth. Two approaches were compared: a direct GPT-4.1 baseline prompted to predict ICD-10 codes without constraints and GAVS, in which GPT-4.1 generated diagnostic entities that were each mapped onto the top 10 matching ICD-10 codes through vector search.

RESULTS: Across the 63 outputs, 62 (98.4%) satisfied all evaluation criteria, with the only exception being a minor ordering inconsistency in one repetition of case 14. For GAVS, the 958 admissions contained 8576 assigned ICD-10 subcategory codes (1610 unique). The vanilla LLM produced 131,329 candidate codes, whereas GAVS produced 136,920. At the subcategory level, the vanilla LLM achieved 17.95% average recall (15.86% weighted), while GAVS achieved 20.63% (18.62% weighted), a statistically significant improvement (P<.001). At the category level, performance converged (32.60% vs 32.58% average weighted recall; P=.99).
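The recall figures above can be made concrete with a small Python sketch. It assumes that “average” recall is the mean of per-admission recalls and that “weighted” recall pools all billed codes so that admissions with more codes count for more; the abstract does not define the weighting, so this interpretation, the function names, and the example codes are assumptions. Category-level scoring is approximated by truncating each ICD-10 code to its first three characters.

```python
def to_category(code):
    """Truncate an ICD-10 code to its three-character category (I50.9 -> I50)."""
    return code.replace(".", "")[:3]


def recall_scores(truth, predicted, level="subcategory"):
    """Return (average recall, weighted recall) over admissions.

    truth, predicted: dicts mapping an admission ID to a set of codes.
    Average recall is the mean of per-admission recalls; weighted recall
    pools all billed codes across admissions before dividing.
    """
    per_admission, hits_total, truth_total = [], 0, 0
    for adm_id, true_codes in truth.items():
        pred_codes = predicted.get(adm_id, set())
        if level == "category":
            true_codes = {to_category(c) for c in true_codes}
            pred_codes = {to_category(c) for c in pred_codes}
        if not true_codes:
            continue  # recall is undefined without billed codes
        hits = len(true_codes & pred_codes)
        per_admission.append(hits / len(true_codes))
        hits_total += hits
        truth_total += len(true_codes)
    if not per_admission:
        raise ValueError("no admissions with billed codes")
    return sum(per_admission) / len(per_admission), hits_total / truth_total


# Illustrative admissions (invented, not from MIMIC-IV).
truth = {1: {"I50.9", "E11.9"}, 2: {"J18.9"}}
predicted = {1: {"I50.1"}, 2: {"J18.9", "J18.1"}}
print(recall_scores(truth, predicted))
print(recall_scores(truth, predicted, level="category"))
```

A prediction such as I50.1 for a billed I50.9 counts as a miss at the subcategory level but as a hit at the category level, which is one way the two methods can differ on fine-grained recall yet converge once codes are truncated.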

CONCLUSIONS: GARAG demonstrated a workflow that grounds management plans in diagnosis-specific, peer-reviewed guideline evidence, preserving fine-grained clinical detail during retrieval. GAVS significantly improved fine-grained diagnostic coding recall compared with a direct LLM baseline. Together, these frameworks illustrate how LLM-based methods can enhance clinical decision support and medical coding. Both were subsequently integrated into Rhazes Clinician, a clinician-facing web application that orchestrates LLM agents to call specialized tools, providing a single interface for physician use. Further independent validation and large-scale studies are required to confirm generalizability and assess their impact on patient outcomes.

PMID:41213118 | DOI:10.2196/66691

Analyzing Sleep Behavior Using BERT-BiLSTM and Fine-Tuned GPT-2 Sentiment Classification: Comparison Study

JMIR Med Inform. 2025 Nov 10;13:e70753. doi: 10.2196/70753.

ABSTRACT

BACKGROUND: The diagnosis of sleep disorders presents a challenging landscape, characterized by the complex nature of their assessment and the often divergent views between objective clinical assessment and subjective patient experience. This study explores the interplay between these perspectives, focusing on the variability of individual perceptions of sleep quality and latency.

OBJECTIVE: Our primary goal was to investigate the alignment, or lack thereof, between subjective experiences and objective measures in the assessment of sleep disorders.

METHODS: To study this, we developed an aspect-based sentiment analysis method for clinical narratives. Using large language models (Falcon 40B and Mixtral 8x7B), we identified entity groups for 3 aspects related to sleep behavior (daytime sleepiness, sleep quality, and fatigue). We then assigned sentiment values between 0 and 1 to phrases referring to these aspects, using a BERT-BiLSTM-based approach (accuracy 78%) and a fine-tuned GPT-2 sentiment classifier (accuracy 87%).

RESULTS: In a cohort of 100 patients with complete subjective (Karolinska Sleepiness Scale [KSS]) and objective (Multiple Sleep Latency Test [MSLT]) assessments, approximately 15% exhibited notable discrepancies between perceived and measured levels of daytime sleepiness. A paired-sample t test comparing KSS scores to MSLT latencies approached statistical significance (t(99)=2.456; P=.06), suggesting a potential misalignment between subjective reports and physiological markers. In contrast, the comparison using text-derived sentiment scores revealed a statistically significant divergence (t(99)=2.324; P=.047), indicating that clinical narratives may more reliably capture discrepancies in sleepiness perception. These results underscore the importance of integrating multiple subjective sources, with an emphasis on narrative free text, in the assessment of domains such as fatigue and daytime sleepiness, where standardized measures may not fully reflect the patient’s lived experience.

CONCLUSIONS: Our method has the potential to uncover critical insights into patient self-perception versus clinical evaluations, which could enable clinicians to identify patients requiring objective verification of self-reported symptoms.

PMID:41213114 | DOI:10.2196/70753

Cluster-Based Predictive Modeling of User Ratings for Physical Activity Apps Using Mobile App Rating Scale (MARS) Dimensions: Model Development and Validation

JMIR Mhealth Uhealth. 2025 Nov 6;13:e70987. doi: 10.2196/70987.

ABSTRACT

BACKGROUND: The expansion of mobile health apps has created a growing need for structured and predictive tools to evaluate app quality before deployment. The Mobile App Rating Scale (MARS) offers a standardized, expert-driven assessment across 4 key dimensions (engagement, functionality, aesthetics, and information), but its use in forecasting user satisfaction through predictive modeling remains limited.

OBJECTIVE: This study aimed to investigate how k-means clustering, combined with machine learning models, can predict user ratings for physical activity apps based on MARS dimensions, with the goal of forecasting ratings before production and uncovering insights into user satisfaction drivers.

METHODS: We analyzed a dataset of 155 MARS-rated physical activity apps with user ratings. The dataset was split into training (n=111) and testing (n=44) subsets. K-means clustering was applied to the training data, identifying 2 clusters. Exploratory data analysis included box plots, summary statistics, and component+residual plots to visualize linearity and distribution patterns across MARS dimensions. Correlation analysis was performed to quantify relationships between each MARS dimension and user ratings. In total, 5 machine learning models (generalized additive models, k-nearest neighbors, random forest, extreme gradient boosting, and support vector regression) were trained with and without clustering. Models were hyperparameter-tuned and trained separately on each cluster, and the best-performing model for each cluster was selected. These predictions were combined to compute final performance metrics for the test set. Performance was evaluated using correct prediction percentage (0.5 range), mean absolute error, and R². Validation was performed on 2 additional datasets: mindfulness (n=85) and older adults (n=55) apps.
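The three evaluation metrics can be sketched in a few lines of Python. The sketch assumes that “correct prediction percentage (0.5 range)” means the share of predictions within 0.5 rating points of the actual user rating; the abstract does not define it further, so the tolerance rule, the function names, and the example ratings are assumptions.

```python
def within_range_pct(y_true, y_pred, tol=0.5):
    """Percentage of predictions within +/- tol of the actual rating."""
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Invented ratings for illustration only.
actual = [4.0, 4.5, 3.0, 5.0]
predicted = [4.2, 4.0, 3.8, 4.9]
print(within_range_pct(actual, predicted))
print(mean_absolute_error(actual, predicted))
print(r_squared(actual, predicted))
```

The metrics answer different questions: when ratings cluster in a narrow band, a model that predicts close to the mean can score a high within-range percentage and a low mean absolute error while R² stays near zero, because R² only rewards explaining variation around that mean.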

RESULTS: Exploratory data analysis revealed that apps in cluster 1 were feature-rich and scored higher across all MARS dimensions, reflecting comprehensive and engagement-oriented designs. In contrast, cluster 2 comprised simpler, utilitarian apps focused on basic functionality. Component+residual plots showed nonlinear relationships, which became more interpretable within clusters. Correlation analysis indicated stronger associations between user ratings and engagement and functionality, but weaker or negative correlations with aesthetics and information, particularly in cluster 2. In the unclustered dataset, k-nearest neighbors achieved 79.55% accuracy, mean absolute error=0.26, and R²=0.06. The combined support vector regression (cluster 1)+k-nearest neighbors (cluster 2) model achieved the highest performance: 88.64% accuracy, mean absolute error=0.27, and R²=0.04. Clustering improved prediction accuracy and enhanced alignment between predicted and actual user ratings. Models also generalized well to the external datasets.

CONCLUSIONS: The combined clustering and modeling approach enhances prediction accuracy and reveals how user satisfaction drivers vary across app types. By transforming MARS from a descriptive tool into a predictive framework, this study offers a scalable, transparent method for forecasting user ratings during app development, which is particularly useful in early-stage or low-data settings.

PMID:41213075 | DOI:10.2196/70987

Universal prevention programs for depression and anxiety disorders in children and adults: a systematic review and meta-analysis

Trends Psychiatry Psychother. 2025 Nov 10. doi: 10.47626/2237-6089-2025-1127. Online ahead of print.

ABSTRACT

OBJECTIVE: Cognitive-behavioral therapy (CBT) is a first-line treatment for anxiety and depressive disorders, but its preventive efficacy remains uncertain. This study systematically reviewed and meta-analyzed randomized controlled trials of universal CBT-based interventions across all age groups, evaluating their effects on anxiety, depression, and quality of life.

METHODS: We included randomized controlled trials of universal CBT programs delivered to general populations without prior risk or symptom screening. Eligible outcomes were depressive and anxiety symptoms and quality of life. Risk of bias was assessed using the Cochrane Risk of Bias tool. Separate three-level meta-analyses were conducted for each outcome, and subgroup analyses were performed by participant age and provider profession.

RESULTS: Seventeen RCTs (n = 10,809 participants) met inclusion criteria. Pooled effect sizes were SMD = -0.02 (95% CI: -0.12 to 0.09) for quality of life, SMD = -0.09 (95% CI: -0.20 to 0.01) for depressive symptoms, and SMD = -0.03 (95% CI: -0.18 to 0.13) for anxiety symptoms. None reached statistical significance. Subgroup analyses confirmed no significant effects in children/adolescents or adults. Interventions delivered by psychologists were more effective than those delivered by teachers (SMD = 0.18), although overall preventive effects remained negligible.

CONCLUSIONS: Universal CBT interventions did not demonstrate significant preventive benefits for anxiety, depression, or quality of life across age groups. These findings suggest that universal CBT should not be adopted as a population-wide prevention strategy, and future research should prioritize targeted, data-driven approaches.

PMID:41213072 | DOI:10.47626/2237-6089-2025-1127

Impact of Posterior Single Tooth Loss on Oral Health-Related Quality of Life and Single-Unit Immediate Implant Loading: Six-Month Follow-up Study

J Long Term Eff Med Implants. 2025;35(4):69-77. doi: 10.1615/JLongTermEffMedImplants.2025056315.

ABSTRACT

The current study sought to assess the impact of a single posterior tooth loss on quality of life, as well as the improvement after single-unit immediate implant loading over a 6-month follow-up. Forty patients with a single posterior tooth loss were evaluated for oral health-related quality of life using the OHIP-14 before and 6 months after rehabilitation, and for anxiety and depression symptoms using the Hospital Anxiety and Depression Scale (HADS). Two-way ANOVA was used to compare OHIP-14 scores by tooth loss site and before versus after rehabilitation. Descriptive statistics were computed for the HADS and OHIP-14 domains. The 33 patients who remained in the sample presented significantly reduced OHIP-14 scores 6 months after rehabilitation, regardless of the tooth loss site. Physical pain, psychological discomfort, psychological disability, and physical disability were the four most affected OHIP-14 domains and improved after rehabilitation. HADS scores for anxiety and depressive symptoms were mostly within the normal range. Posterior single tooth loss has a detrimental impact on oral health-related quality of life. Both functional and psychological domains appear to be affected. Rehabilitation was found to significantly improve quality of life.

PMID:41213053 | DOI:10.1615/JLongTermEffMedImplants.2025056315

Effect of Proximal Contact Type and Implant Site on Peri-Implant Papillary Architecture in the Maxillary Anterior Region: A Cross-Sectional Clinical Study

J Long Term Eff Med Implants. 2025;35(4):63-68. doi: 10.1615/JLongTermEffMedImplants.2025059317.

ABSTRACT

The interdental papilla plays a critical role in both the function and esthetics of the anterior maxilla. In implant dentistry, maintaining this delicate soft tissue is often challenging, particularly when implants are adjacent to each other, due to anatomical and vascular limitations. This study aimed to evaluate the mean height of the peri-implant papilla in the maxillary anterior region and to investigate its association with the type of adjacent structure (natural tooth or implant) and implant location (central incisor, lateral incisor, canine). This cross-sectional study included 298 patients with 342 implants and 684 contact areas in the anterior maxilla, with at least 1 year of functional loading. Sites were categorized into implant-tooth (n = 401) and implant-implant (n = 283) groups. Papilla height was measured, and statistical analysis was performed using independent t-tests and ANOVA. Mean papilla height was significantly greater in implant-tooth contacts than in implant-implant contacts (P < 0.05). At central incisors, papilla height was 3.4 ± 0.4 mm (implant-tooth) vs. 3.0 ± 0.3 mm (implant-implant); at lateral incisors, 2.8 ± 0.3 mm vs. 2.6 ± 0.1 mm; and at canines, 3.0 ± 0.4 mm vs. 2.8 ± 0.2 mm. Central incisors consistently showed the highest papilla values. Peri-implant papilla height is significantly greater at implant-tooth contact sites than at implant-implant sites in the anterior maxilla. Among all locations, central incisor regions demonstrated the highest papilla levels.

PMID:41213052 | DOI:10.1615/JLongTermEffMedImplants.2025059317