Categories
Nevin Manimala Statistics

On “Confirmatory” Methodological Research in Statistics and Related Fields

Stat Med. 2025 Nov;44(25-27):e70303. doi: 10.1002/sim.70303.

ABSTRACT

Empirical substantive research, such as in the life or social sciences, is commonly categorized into two modes, exploratory and confirmatory, both of which are essential to scientific progress. The former is also referred to as hypothesis-generating or data-contingent research, while the latter is also called hypothesis-testing research. In the context of empirical methodological research in statistics, however, the exploratory-confirmatory distinction has received very little attention so far. Our paper aims to fill this gap. First, we revisit the concept of empirical methodological research through the lens of the exploratory-confirmatory distinction. Second, we examine current practice with respect to this distinction through a literature survey including 115 articles from the field of biostatistics. Third, we provide practical recommendations toward a more appropriate design, interpretation, and reporting of empirical methodological research in light of this distinction. In particular, we argue that both modes of research are crucial to methodological progress, but that most published studies, even if sometimes disguised as confirmatory, are essentially exploratory in nature. We emphasize that it may be adequate to consider empirical methodological research as a continuum between “pure” exploration and “strict” confirmation, recommend transparently reporting the mode of conducted research within the spectrum between exploratory and confirmatory, and stress the importance of study protocols written before conducting the study, especially in confirmatory methodological research.

PMID:41213159 | DOI:10.1002/sim.70303

Evaluation of the suitability of COSHH Essentials for qualitative assessment of inhalation risk from chemical agents in perfume laboratories: a new perspective

Int J Occup Saf Ergon. 2025 Nov 10:1-10. doi: 10.1080/10803548.2025.2575603. Online ahead of print.

ABSTRACT

Objectives. Professionals in the perfume industry are routinely exposed to numerous chemical substances during olfactory evaluations, some of which may pose inhalation hazards. Existing qualitative risk assessment tools, such as Control of Substances Hazardous to Health (COSHH) Essentials, provide approximate estimates and may not be well suited to industries with highly specific exposure conditions like perfumery. This study evaluates the applicability and limitations of COSHH Essentials in perfume laboratories and proposes an improved qualitative framework tailored to perfumers’ exposure scenarios. Methods. A total of 626 substances from a perfumer’s palette were assessed using COSHH Essentials, which classifies substances into risk levels based on hazard, volatility and quantity. A complementary method incorporating molecular-level hazard analysis, exposure patterns, occupational exposure limits and conservative inhalation dose estimations was developed. Statistical agreement between the two methods was examined using Cohen’s κ, and McNemar’s test was used to assess significant differences. Results. COSHH Essentials identified 76 hazardous substances, while the enhanced method identified 81 substances, including five additional substances requiring local exhaust ventilation. Agreement was moderate (κ = 0.58; p = 0.031). Conclusion. COSHH Essentials provides a useful baseline but lacks the specificity needed in industries with intentional close-range exposure. The enhanced method enables more precise, context-sensitive assessment and better protection in fragrance laboratories.
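
As an illustration of the two agreement statistics named in the Methods, the following is a minimal sketch of Cohen's κ and an exact McNemar test for two methods that each give a yes/no classification. The abstract does not report the underlying agreement table, so the counts used here are purely hypothetical and do not reproduce the published κ or p value; the function names are also illustrative.

```python
from math import comb


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both methods say "hazardous"   b: only method A says "hazardous"
    c: only method B says "hazardous" d: neither method says "hazardous"
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Agreement expected by chance, from each method's marginal proportions.
    p_yes = ((a + b) / n) * ((a + c) / n)
    p_no = ((c + d) / n) * ((b + d) / n)
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, NOT taken from the study.
kappa = cohen_kappa(a=40, b=10, c=20, d=30)
p_value = mcnemar_exact(b=10, c=20)
print(round(kappa, 3), round(p_value, 3))
```

κ measures overall agreement corrected for chance, whereas McNemar's test looks only at the discordant pairs, so the two statistics answer different questions about the same table.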

PMID:41213145 | DOI:10.1080/10803548.2025.2575603

Challenges and Solutions in Applying Large Language Models to Guideline-Based Management Planning and Automated Medical Coding in Health Care: Algorithm Development and Validation

JMIR Biomed Eng. 2025 Nov 10;10:e66691. doi: 10.2196/66691.

ABSTRACT

BACKGROUND: Diagnostic errors and administrative burdens, including medical coding, remain major challenges in health care. Large language models (LLMs) have the potential to alleviate these problems, but their adoption has been limited by concerns regarding reliability, transparency, and clinical safety.

OBJECTIVE: This study introduces and evaluates 2 LLM-based frameworks, implemented within the Rhazes Clinician platform, designed to address these challenges: generation-assisted retrieval-augmented generation (GARAG) for automated evidence-based treatment planning and generation-assisted vector search (GAVS) for automated medical coding.

METHODS: GARAG was evaluated on 21 clinical test cases created by medically qualified authors. Each case was executed 3 times independently, and outputs were assessed using 4 criteria: correctness of references, absence of duplication, adherence to formatting, and clinical appropriateness of the generated management plan. GAVS was evaluated on 958 randomly selected admissions from the Medical Information Mart for Intensive Care (MIMIC)-IV database, in which billed International Classification of Diseases, Tenth Revision (ICD-10) codes served as the ground truth. Two approaches were compared: a direct GPT-4.1 baseline prompted to predict ICD-10 codes without constraints and GAVS, in which GPT-4.1 generated diagnostic entities that were each mapped onto the top 10 matching ICD-10 codes through vector search.
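
The mapping step described for GAVS, in which each generated diagnostic entity is matched to its top 10 codes by vector search, can be sketched roughly as follows. This is a toy illustration only: the abstract does not state the embedding model or the similarity measure, so cosine similarity, the two-dimensional vectors, and the placeholder code labels are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k_codes(entity_vec, code_vecs, k=10):
    """Return the k code labels whose vectors are most similar to entity_vec.

    code_vecs maps a code label to its embedding (hypothetical data).
    """
    ranked = sorted(
        code_vecs,
        key=lambda code: cosine(entity_vec, code_vecs[code]),
        reverse=True,
    )
    return ranked[:k]


# Placeholder labels and toy embeddings, not real ICD-10 codes.
codes = {"CODE_A": [1.0, 0.0], "CODE_B": [0.0, 1.0], "CODE_C": [1.0, 1.0]}
print(top_k_codes([1.0, 0.0], codes, k=2))
```

A production system would typically use an approximate nearest-neighbor index rather than a full sort, but the ranking logic is the same.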

RESULTS: Across the 63 outputs, 62 (98.4%) satisfied all evaluation criteria, with the only exception being a minor ordering inconsistency in one repetition of case 14. For GAVS, the 958 admissions contained 8576 assigned ICD-10 subcategory codes (1610 unique). The vanilla LLM produced 131,329 candidate codes, whereas GAVS produced 136,920. At the subcategory level, the vanilla LLM achieved 17.95% average recall (15.86% weighted), while GAVS achieved 20.63% (18.62% weighted), a statistically significant improvement (P<.001). At the category level, performance converged (32.60% vs 32.58% average weighted recall; P=.99).
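
The abstract reports both an "average" and a "weighted" recall without defining them. One plausible reading, assumed here and not confirmed by the source, is that the average is the mean of per-admission recall while the weighted figure weights each admission by its number of ground-truth codes (equivalent to pooling all codes). A minimal sketch under that assumption, with made-up code labels:

```python
def recall_summary(true_codes, predicted_codes):
    """Return (average recall, weighted recall) over a list of admissions.

    true_codes / predicted_codes: parallel lists of code collections.
    Admissions with no ground-truth codes are skipped.
    """
    per_case = []
    hits_total = 0
    truth_total = 0
    for truth, predicted in zip(true_codes, predicted_codes):
        truth, predicted = set(truth), set(predicted)
        if not truth:
            continue
        hits = len(truth & predicted)
        per_case.append(hits / len(truth))
        hits_total += hits
        truth_total += len(truth)
    average = sum(per_case) / len(per_case)
    # Weighting each admission by its number of true codes pools all codes.
    weighted = hits_total / truth_total
    return average, weighted


# Hypothetical admissions, not drawn from MIMIC-IV.
truth = [["a", "b"], ["c", "d", "e", "f"]]
predicted = [["a", "b", "x"], ["c"]]
print(recall_summary(truth, predicted))
```

Under this reading, the weighted figure falls below the average when admissions with many codes are recalled less completely, which matches the direction of the gap reported above.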

CONCLUSIONS: GARAG demonstrated a workflow that grounds management plans in diagnosis-specific, peer-reviewed guideline evidence, preserving fine-grained clinical detail during retrieval. GAVS significantly improved fine-grained diagnostic coding recall compared with a direct LLM baseline. Together, these frameworks illustrate how LLM-based methods can enhance clinical decision support and medical coding. Both were subsequently integrated into Rhazes Clinician, a clinician-facing web application that orchestrates LLM agents to call specialized tools, providing a single interface for physician use. Further independent validation and large-scale studies are required to confirm generalizability and assess their impact on patient outcomes.

PMID:41213118 | DOI:10.2196/66691

Analyzing Sleep Behavior Using BERT-BiLSTM and Fine-Tuned GPT-2 Sentiment Classification: Comparison Study

JMIR Med Inform. 2025 Nov 10;13:e70753. doi: 10.2196/70753.

ABSTRACT

BACKGROUND: The diagnosis of sleep disorders presents a challenging landscape, characterized by the complex nature of their assessment and the often divergent views between objective clinical assessment and subjective patient experience. This study explores the interplay between these perspectives, focusing on the variability of individual perceptions of sleep quality and latency.

OBJECTIVE: Our primary goal was to investigate the alignment, or lack thereof, between subjective experiences and objective measures in the assessment of sleep disorders.

METHODS: To study this, we developed an aspect-based sentiment analysis method for clinical narratives. Using large language models (Falcon 40B and Mixtral 8X7B), we identified entity groups for 3 aspects related to sleep behavior (day sleepiness, sleep quality, and fatigue). We then assigned sentiment values between 0 and 1 to phrases referring to these aspects using a BERT-BiLSTM-based approach (accuracy 78%) and a fine-tuned GPT-2 sentiment classifier (accuracy 87%).

RESULTS: In a cohort of 100 patients with complete subjective (Karolinska Sleepiness Scale [KSS]) and objective (Multiple Sleep Latency Test [MSLT]) assessments, approximately 15% exhibited notable discrepancies between perceived and measured levels of daytime sleepiness. A paired-sample t test comparing KSS scores to MSLT latencies approached statistical significance (t(99)=2.456; P=.06), suggesting a potential misalignment between subjective reports and physiological markers. In contrast, the comparison using text-derived sentiment scores revealed a statistically significant divergence (t(99)=2.324; P=.047), indicating that clinical narratives may more reliably capture discrepancies in sleepiness perception. These results underscore the importance of integrating multiple subjective sources, with an emphasis on narrative free text, in the assessment of domains such as fatigue and daytime sleepiness, where standardized measures may not fully reflect the patient’s lived experience.
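
For readers unfamiliar with the test used above, a paired-sample t statistic is computed from the per-patient differences, and 100 pairs give 99 degrees of freedom. The sketch below shows the calculation on made-up numbers; it assumes both measures have already been placed on a common scale, a step the abstract does not describe.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical paired scores, not data from the study.
subjective = [2.0, 4.0, 6.0, 8.0]
objective = [1.0, 2.0, 3.0, 4.0]
print(paired_t(subjective, objective))
```

The P value then follows from the t distribution with the returned degrees of freedom, for example via a statistics library.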

CONCLUSIONS: Our method has potential in uncovering critical insights into patient self-perception versus clinical evaluations, which enables clinicians to identify patients requiring objective verification of self-reported symptoms.

PMID:41213114 | DOI:10.2196/70753

Cluster-Based Predictive Modeling of User Ratings for Physical Activity Apps Using Mobile App Rating Scale (MARS) Dimensions: Model Development and Validation

JMIR Mhealth Uhealth. 2025 Nov 6;13:e70987. doi: 10.2196/70987.

ABSTRACT

BACKGROUND: The expansion of mobile health apps has created a growing need for structured and predictive tools to evaluate app quality before deployment. The Mobile App Rating Scale (MARS) offers a standardized, expert-driven assessment across 4 key dimensions (engagement, functionality, aesthetics, and information), but its use in forecasting user satisfaction through predictive modeling remains limited.

OBJECTIVE: This study aimed to investigate how k-means clustering, combined with machine learning models, can predict user ratings for physical activity apps based on MARS dimensions, with the goal of forecasting ratings before production and uncovering insights into user satisfaction drivers.

METHODS: We analyzed a dataset of 155 MARS-rated physical activity apps with user ratings. The dataset was split into training (n=111) and testing (n=44) subsets. K-means clustering was applied to the training data, identifying 2 clusters. Exploratory data analysis included box plots, summary statistics, and component+residual plots to visualize linearity and distribution patterns across MARS dimensions. Correlation analysis was performed to quantify relationships between each MARS dimension and user ratings. In total, 5 machine learning models (generalized additive models, k-nearest neighbors, random forest, extreme gradient boosting, and support vector regression) were trained with and without clustering. Models underwent hyperparameter tuning and were trained separately on each cluster, and the best-performing model for each cluster was selected. These predictions were combined to compute final performance metrics for the test set. Performance was evaluated using correct prediction percentage (0.5 range), mean absolute error, and R². Validation was performed on 2 additional datasets: mindfulness (n=85) and older adults (n=55) apps.

RESULTS: Exploratory data analysis revealed that apps in cluster 1 were feature-rich and scored higher across all MARS dimensions, reflecting comprehensive and engagement-oriented designs. In contrast, cluster 2 comprised simpler, utilitarian apps focused on basic functionality. Component+residual plots showed nonlinear relationships, which became more interpretable within clusters. Correlation analysis indicated stronger associations between user ratings and engagement and functionality, but weaker or negative correlations with aesthetics and information, particularly in cluster 2. In the unclustered dataset, k-nearest neighbors achieved 79.55% accuracy, mean absolute error=0.26, and R²=0.06. The combined support vector regression (cluster 1)+k-nearest neighbors (cluster 2) model achieved the highest performance: 88.64% accuracy, mean absolute error=0.27, and R²=0.04. Clustering improved prediction accuracy and enhanced alignment between predicted and actual user ratings. Models also generalized well to the external datasets.
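
The three reported metrics can be sketched as below. The abstract does not spell out "correct prediction percentage (0.5 range)"; this sketch assumes it means the share of predictions within 0.5 rating points of the actual user rating. The ratings in the example are invented for illustration.

```python
def evaluate(actual, predicted, tolerance=0.5):
    """Return (share within tolerance, mean absolute error, R squared)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    within = sum(abs(e) <= tolerance for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1 - ss_res / ss_tot
    return within, mae, r_squared


# Hypothetical ratings clustered in a narrow band, with a constant prediction.
actual = [4.0, 4.2, 4.4, 4.6]
predicted = [4.3, 4.3, 4.3, 4.3]
print(evaluate(actual, predicted))
```

When ratings fall in a narrow band, a near-constant prediction can land within the tolerance for every app while explaining almost none of the variance, which is one way a high accuracy figure and an R² close to zero can coexist.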

CONCLUSIONS: The combined clustering and modeling approach enhances prediction accuracy and reveals how user satisfaction drivers vary across app types. By transforming MARS from a descriptive tool into a predictive framework, this study offers a scalable, transparent method for forecasting user ratings during app development, which is particularly useful in early-stage or low-data settings.

PMID:41213075 | DOI:10.2196/70987

Universal prevention programs for depression and anxiety disorders in children and adults: a systematic review and meta-analysis

Trends Psychiatry Psychother. 2025 Nov 10. doi: 10.47626/2237-6089-2025-1127. Online ahead of print.

ABSTRACT

OBJECTIVE: Cognitive-behavioral therapy (CBT) is a first-line treatment for anxiety and depressive disorders, but its preventive efficacy remains uncertain. This study systematically reviewed and meta-analyzed randomized controlled trials of universal CBT-based interventions across all age groups, evaluating their effects on anxiety, depression, and quality of life.

METHODS: We included randomized controlled trials of universal CBT programs delivered to general populations without prior risk or symptom screening. Eligible outcomes were depressive and anxiety symptoms and quality of life. Risk of bias was assessed using the Cochrane Risk of Bias tool. Separate three-level meta-analyses were conducted for each outcome, and subgroup analyses were performed by participant age and provider profession.

RESULTS: Seventeen RCTs (n = 10,809 participants) met inclusion criteria. Pooled effect sizes were SMD = -0.02 (95% CI: -0.12 to 0.09) for quality of life, SMD = -0.09 (95% CI: -0.20 to 0.01) for depressive symptoms, and SMD = -0.03 (95% CI: -0.18 to 0.13) for anxiety symptoms. None reached statistical significance. Subgroup analyses confirmed no significant effects in children/adolescents or adults. Interventions delivered by psychologists were more effective than those delivered by teachers (SMD = 0.18), although overall preventive effects remained negligible.

CONCLUSIONS: Universal CBT interventions did not demonstrate significant preventive benefits for anxiety, depression, or quality of life across age groups. These findings suggest that universal CBT should not be adopted as a population-wide prevention strategy, and future research should prioritize targeted, data-driven approaches.

PMID:41213072 | DOI:10.47626/2237-6089-2025-1127

Impact of Posterior Single Tooth Loss on Oral Health-Related Quality of Life and Single-Unit Immediate Implant Loading: Six-Month Follow-up Study

J Long Term Eff Med Implants. 2025;35(4):69-77. doi: 10.1615/JLongTermEffMedImplants.2025056315.

ABSTRACT

The current study sought to assess the impact on quality of life in individuals with a single posterior tooth loss, as well as its improvement after single-unit immediate implant loading at a 6-month follow-up. Forty patients with a single posterior tooth loss were evaluated for oral health-related quality of life using the OHIP-14 before and six months after rehabilitation, as well as for anxiety and depression symptoms using the Hospital Anxiety and Depression Scale (HADS). Two-way ANOVA was used to compare OHIP-14 scores by tooth loss site and before and after rehabilitation. Descriptive statistics were computed for the HADS and OHIP-14 domains. The thirty-three patients who remained in the sample presented significantly reduced OHIP-14 scores after 6 months of rehabilitation, regardless of the tooth loss site. Physical pain, psychological discomfort, psychological disability, and physical disability were the four most affected OHIP-14 domains and improved after rehabilitation. HADS scores for anxiety and depressive symptoms were mostly within the normal range. Posterior single tooth loss has a detrimental impact on oral health-related quality of life. Both functional and psychological domains appear to be impacted. The patients’ rehabilitation was found to have a significant impact on quality-of-life improvement.

PMID:41213053 | DOI:10.1615/JLongTermEffMedImplants.2025056315

Effect of Proximal Contact Type and Implant Site on Peri-Implant Papillary Architecture in the Maxillary Anterior Region: A Cross-Sectional Clinical Study

J Long Term Eff Med Implants. 2025;35(4):63-68. doi: 10.1615/JLongTermEffMedImplants.2025059317.

ABSTRACT

The interdental papilla plays a critical role in both the function and esthetics of the anterior maxilla. In implant dentistry, maintaining this delicate soft tissue is often challenging, particularly when implants are adjacent to each other, due to anatomical and vascular limitations. This study aimed to evaluate the mean height of the peri-implant papilla in the maxillary anterior region and to investigate its association with the type of adjacent structure (natural tooth or implant) and implant location (central incisor, lateral incisor, canine). This cross-sectional study included 298 patients with 342 implants and 684 contact areas in the anterior maxilla, with at least 1 year of functional loading. Sites were categorized into implant-tooth (n = 401) and implant-implant (n = 283) groups. Papilla height was measured, and statistical analysis was performed using independent t-tests and ANOVA. Mean papilla height was significantly greater in implant-tooth contacts than in implant-implant contacts (P < 0.05). At central incisors, papilla height was 3.4 ± 0.4 mm (implant-tooth) vs. 3.0 ± 0.3 mm (implant-implant); at lateral incisors, 2.8 ± 0.3 mm vs. 2.6 ± 0.1 mm; and at canines, 3.0 ± 0.4 mm vs. 2.8 ± 0.2 mm. Central incisors consistently showed the highest papilla values. Peri-implant papilla height is significantly greater at implant-tooth contact sites compared with implant-implant sites in the anterior maxilla. Among all locations, central incisor regions demonstrated the highest papilla levels.

PMID:41213052 | DOI:10.1615/JLongTermEffMedImplants.2025059317

Influence of Torque Type and Bone Loss on the Stability Quotient of Two Implants with Prostheses

J Long Term Eff Med Implants. 2025;35(4):51-61. doi: 10.1615/JLongTermEffMedImplants.2025060463.

ABSTRACT

Osseointegration is related to the stability of the screw and influences the success rate of implant-supported prosthetic rehabilitation, as it promotes natural healing and effective bone formation, facilitating the preservation of the implant in the recipient site. Factors such as surgical technique, insertion torque, the type of recipient bone, and the macro- and microstructure of the implant can affect screw stability. The objective of this study is to analyze in vivo the influence of insertion torque, recipient bone type, and peri-implant bone loss on the implant stability quotient (ISQ) values of cylindrical implants with external hexagon (EH) and Morse taper (MT) connections, featuring a new surface treatment called referenced acid etching (RAE). A total of 40 implants were placed in edentulous areas following predefined inclusion and exclusion criteria. Immediately after implant placement (t0), insertion torque, resonance frequency, digital periapical radiographs, and peri-implant evaluation were recorded. Resonance frequency analysis, periapical radiographs, and peri-implant evaluations were repeated after osseointegration (t1) and 180 days after rehabilitation (t2). The data obtained were statistically analyzed using specific tests for each type of analysis, with a significance level of 5%. ISQ values were high at t1 and decreased significantly at t2 for both connection types; there was bone resorption for the EH and bone gain for the MT. From implant placement through 180 days of prosthesis function, stability, bone gain or loss, and bone type showed clinically acceptable conditions for all connections studied.

PMID:41213051 | DOI:10.1615/JLongTermEffMedImplants.2025060463

Unlocking Healing Potential: Impact of Non-Surgical Therapy on Peri-Implant Crevicular Fluid Calprotectin Levels – A Clinical Insight

J Long Term Eff Med Implants. 2025;35(4):15-20. doi: 10.1615/JLongTermEffMedImplants.2025057382.

ABSTRACT

Biomarkers within peri-implant crevicular fluid (PICF) are emerging as pivotal diagnostic agents, promising heightened accuracy in identifying peri-implant diseases. The present study aimed to assess the impact of non-surgical therapy on PICF calprotectin levels in patients with peri-implantitis. A total of 40 individuals aged between 30 and 60 years were enrolled: Group I (n = 20 healthy peri-implant sites) and Group IIa (n = 20 peri-implantitis sites). Clinical parameters such as peri-implant probing depth (PPD) and clinical attachment level (CAL) were recorded. PICF samples were collected and assayed for calprotectin using enzyme-linked immunosorbent assay (ELISA). After clinical examination and PICF collection at baseline, mechanical debridement was performed for peri-implantitis patients, and clinical examination and PICF collection were repeated after 3 months (Group IIb). The results were statistically analyzed. The PICF calprotectin level was higher in Group IIa (43.76 ± 3.64 ng/mL) than in Group I (11.36 ± 2.53 ng/mL). Between Groups IIa and IIb, there was a significant reduction in PPD, CAL, and calprotectin from baseline to 3 months (P < 0.05). Pearson correlation in Groups IIa and IIb revealed that the correlation between calprotectin and clinical parameters was strongly positive and statistically significant. The present study suggests that there was a significant reduction in PICF calprotectin levels among peri-implantitis patients after mechanical debridement. In addition, there is a positive correlation between PICF calprotectin and peri-implant clinical parameters.
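
The Pearson correlation mentioned above can be computed as in the following minimal sketch. The abstract reports no correlation coefficients or patient-level values, so the paired numbers here are entirely hypothetical and serve only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical calprotectin (ng/mL) and probing depth (mm) pairs.
calprotectin = [38.0, 41.5, 44.0, 47.2]
probing_depth = [4.8, 5.5, 5.9, 6.6]
print(pearson_r(calprotectin, probing_depth))
```

A value near +1 indicates that higher calprotectin levels accompany deeper probing depths; it does not by itself establish which drives the other.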

PMID:41213047 | DOI:10.1615/JLongTermEffMedImplants.2025057382