Categories
Nevin Manimala Statistics

Evaluation of reliability, repeatability, and confidence of ChatGPT for screening, monitoring, and treatment of interstitial lung disease in patients with systemic autoimmune rheumatic diseases

Digit Health. 2025 Sep 29;11:20552076251384233. doi: 10.1177/20552076251384233. eCollection 2025 Jan-Dec.

ABSTRACT

BACKGROUND: In recent years, potential applications of ChatGPT in medical practice have drawn considerable attention owing to its intuitive user interface, chatbot format, and powerful analytical capabilities. However, whether ChatGPT can be broadly applied in clinical practice remains controversial. Because interstitial lung disease (ILD) in systemic autoimmune rheumatic diseases (SARDs) carries high morbidity and mortality, early screening, monitoring, and timely treatment are crucial for improving outcomes. This study aimed to evaluate the reliability, repeatability, and confidence of ChatGPT models (GPT-4, GPT-4o mini, and GPT-4o) in delivering guideline-based recommendations for the screening, monitoring, and treatment of ILD in SARD patients.

METHODS: Questions derived from the ACR/CHEST guideline for ILD in patients with SARDs were used to benchmark three versions of ChatGPT (GPT-4, GPT-4o mini, and GPT-4o) across three separate attempts. The responses were recorded, and their reliability, repeatability, and confidence were analyzed against the guideline recommendations.

RESULTS: GPT-4 demonstrated significant variability in reliability across the three attempts (P = .007), whereas the other versions showed no significant differences. GPT-4 and GPT-4o mini exhibited substantial interrater agreement (Kendall’s W = 0.747 and 0.765, respectively), whereas GPT-4o demonstrated almost perfect interrater agreement (Kendall’s W = 0.816). All three versions showed statistically significant differences in high confidence ratings (confidence score ≥ 8 on a 1-10 scale) across the three attempts (P < .01). Given the higher consistency of GPT-4o and GPT-4o mini, the two were further compared on the third attempt. Accuracy percentages on the third attempt did not differ significantly between GPT-4o and GPT-4o mini (P = .597). Similarly, interrater agreement across the three attempts did not differ significantly between GPT-4o and GPT-4o mini (P = .152). Furthermore, the overconfidence percentage (confidence score ≥ 8 assigned to incorrect answers) was 100% (22 of 22) for GPT-4o and 22.7% (10 of 44) for GPT-4o mini (P < .01).
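For readers unfamiliar with Kendall's W, a minimal sketch of the coefficient of concordance follows. It assumes each attempt is treated as a "rater" scoring the same set of guideline questions; the data layout and function names are illustrative, and the abstract does not say whether the authors applied a tie correction (this sketch uses the standard one).

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance with the standard tie correction.

    ratings[r][q] is the score that rater r (here, one attempt) gave question q.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[q] for row in rank_rows) for q in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of t tied scores per rater
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


print(kendalls_w([[1, 2, 3, 4]] * 3))  # three identical rankings
```

W ranges from 0 (no agreement) to 1 (identical rankings); the descriptive labels "substantial" and "almost perfect" in the abstract are conventional bands applied to that scale.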

CONCLUSIONS: GPT-4o mini and GPT-4o demonstrated stable reliability across all three attempts, whereas GPT-4 did not. GPT-4o tended to show better repeatability than GPT-4o mini, although the difference was not statistically significant. Additionally, GPT-4o exhibited a greater tendency toward overconfidence than GPT-4o mini. Overall, the GPT-4o models performed most effectively in managing SARD-ILD but may exhibit overconfidence in certain scenarios.

PMID:41036434 | PMC:PMC12480815 | DOI:10.1177/20552076251384233

The effects of mobile health on self-management of patients with diabetes: A systematic review

Digit Health. 2025 Sep 29;11:20552076251382049. doi: 10.1177/20552076251382049. eCollection 2025 Jan-Dec.

ABSTRACT

BACKGROUND: Diabetes is a chronic disease that affects many people worldwide and is a significant health concern. Health services are focusing on managing the rising incidence of diabetes and its complications. Novel mobile health (mHealth) interventions aim to help people with diabetes manage their blood glucose levels and improve self-care. This review aimed to evaluate the effect of short message service (SMS) and mobile app interventions on patients’ ability to improve their hemoglobin A1c (HbA1c) levels.

METHOD: A targeted search was performed in PubMed, MEDLINE, EMBASE, the Cochrane Library, ScienceDirect, and Scopus for randomized controlled trials (RCTs) published from 2014 to 2024 that evaluated the effects of mobile apps and SMS-based self-care interventions on individuals with poorly managed diabetes.

RESULT: A rigorous review identified nineteen studies for analysis. Fifteen of these studies showed a statistically significant reduction in HbA1c levels in the intervention group, compared to baseline measurements. In contrast, control groups did not exhibit the same level of reduction, resulting in a significant difference between intervention and control groups over time. This suggests that the interventions were effective in lowering HbA1c levels.

CONCLUSION: Improving glycemic control in inadequately managed diabetes is crucial. Better blood glucose management enhances patient well-being and lowers healthcare costs, making targeted interventions essential for improved outcomes and healthcare efficiency. Organizations, diabetes educators, legislators, and funders should carefully consider these solutions when developing support services and educational programs for diabetes self-management within value-based care and public health models.

PMID:41036432 | PMC:PMC12480833 | DOI:10.1177/20552076251382049

Vital Signs-Only Machine Learning Model for Acute Inpatient Deterioration: A Retrospective Multicenter Study

Mayo Clin Proc Innov Qual Outcomes. 2025 Sep 19;9(5):100663. doi: 10.1016/j.mayocpiqo.2025.100663. eCollection 2025 Oct.

ABSTRACT

OBJECTIVE: To develop predictive models that are compatible with vital signs monitoring devices to identify patients at risk of clinical deterioration, defined as requiring a rapid response team intervention or an unplanned intensive care unit transfer.

PATIENTS AND METHODS: Targeted vital signs from 227,858 inpatients admitted to general care or telemetry beds at a multihospital health care institution between January 1, 2019, and July 31, 2023, were selected. After filtering for high-quality data, 30,118 patients were used to train a Light Gradient Boosting Machine, and 30,095 were reserved for blind validation. We developed a machine learning model designed to minimize false positives while maintaining clinical relevance in identifying low-prevalence clinical deterioration events.

RESULTS: At a sensitivity of 73.4% (95% CI, 72.2%-74.4%), the model achieved a positive predictive value (PPV) of 30.4% (95% CI, 29.6%-31.3%), with a C-statistic of 0.874 (95% CI, 0.867-0.881), alert rate of 0.170 (95% CI, 0.167-0.173) per patient per day, and normalized alert rate of 2.41 (95% CI, 2.31-2.51). Stratified analysis by hospital revealed that PPV was highest at the Rochester site, reaching 54.9% (95% CI, 52.9%-57.0%) and outperforming the EPIC deterioration index by 46% or a factor of 6 (7.57%).
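The headline metrics follow from counting alerts against deterioration events. A minimal sketch, assuming each alert is scored as a true or false positive and that the alert rate is total alerts divided by patient-days; the abstract does not define its normalized alert rate, so that metric is not reproduced here, and the counts below are hypothetical.

```python
def alert_metrics(tp: int, fp: int, fn: int, patient_days: float) -> dict:
    """Sensitivity, PPV, and alert rate from alert counts (illustrative only)."""
    sensitivity = tp / (tp + fn)  # share of deterioration events preceded by an alert
    ppv = tp / (tp + fp)          # share of alerts that preceded a true event
    alerts_per_patient_day = (tp + fp) / patient_days
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "alerts_per_patient_day": alerts_per_patient_day,
    }


# Hypothetical counts, not taken from the study
print(alert_metrics(tp=30, fp=70, fn=10, patient_days=500))
```

Because PPV depends on event prevalence, the same sensitivity can yield quite different PPVs at different hospitals, which is one reason site-level stratification is informative.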

CONCLUSION: Achieving a high PPV is crucial because it ensures that a larger proportion of alerts are true positives, reducing the burden of false alarms. We attribute the considerable improvement in results to the novel 2-window feature extraction method, which enables the model to capture both long-term trends and recent changes in patient status, enhancing predictive performance.
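The abstract does not specify the window lengths or the statistics extracted, so the following is only a sketch of the general 2-window idea: summarize the same vital sign over a long and a short look-back window and contrast the two. The 24-hour and 4-hour windows, the summary statistics, and the feature names are all hypothetical choices.

```python
from statistics import fmean
from typing import Dict, List, Optional, Tuple

Reading = Tuple[float, float]  # (hours since admission, measured value)


def window_summary(readings: List[Reading], now: float, window_h: float) -> Dict[str, Optional[float]]:
    """Summary statistics of readings taken within [now - window_h, now]."""
    vals = [v for t, v in readings if now - window_h <= t <= now]
    if not vals:
        return {"mean": None, "min": None, "max": None, "n": 0}
    return {"mean": fmean(vals), "min": min(vals), "max": max(vals), "n": len(vals)}


def two_window_features(readings: List[Reading], now: float,
                        long_h: float = 24.0, short_h: float = 4.0) -> Dict[str, Optional[float]]:
    """Long-window (trend) and short-window (recent) features plus their contrast.

    Assumes short_h <= long_h, so any reading in the short window is also in the long one.
    """
    long_w = window_summary(readings, now, long_h)
    short_w = window_summary(readings, now, short_h)
    feats = {f"long_{k}": v for k, v in long_w.items()}
    feats.update({f"short_{k}": v for k, v in short_w.items()})
    # Recent change relative to the longer baseline
    feats["short_minus_long_mean"] = (
        short_w["mean"] - long_w["mean"] if short_w["n"] else None
    )
    return feats


heart_rate = [(0.0, 70.0), (10.0, 72.0), (20.0, 90.0), (23.0, 100.0)]
print(two_window_features(heart_rate, now=24.0))
```

Feature vectors like this, computed per vital sign at each scoring time, are the kind of input a gradient boosting model such as LightGBM could consume.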

PMID:41036430 | PMC:PMC12482306 | DOI:10.1016/j.mayocpiqo.2025.100663

SWCRTsimulator: A simulation-based platform for power estimation in stepped wedge cluster randomized trials with interval-censored outcomes

SoftwareX. 2025 Sep;31:102288. doi: 10.1016/j.softx.2025.102288. Epub 2025 Aug 7.

ABSTRACT

Stepped wedge cluster randomized trials (SWCRTs) have become increasingly popular across various disciplines, particularly in public health and clinical research, because they allow evaluation of interventions rolled out sequentially across clusters. SWCRTsimulator is a user-friendly, web-based RShiny application designed to facilitate sample size and statistical power estimation for an interval-censored time-to-event outcome in a SWCRT. Leveraging Monte Carlo simulations, the platform accommodates various study design features, including heterogeneity in intervention effect across clusters, and provides more accurate and reliable sample size and power estimates than approximate methods when a closed-form solution is not feasible. SWCRTsimulator also provides customizable visualizations of simulation results. We illustrate the practical application of this platform using the Sankofa 2 trial, an active multi-clinic SWCRT of a pediatric HIV disclosure intervention in Ghana, underscoring the importance of accounting for real-world complexities in the design and analysis of such trials.
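The abstract does not describe SWCRTsimulator's internals. As an illustration of the ingredients such a tool combines, the sketch below builds a standard stepped wedge crossover schedule, brackets event times between scheduled assessments (interval censoring), and estimates power as the rejection rate over simulated trials. All function names are hypothetical, and the per-trial simulation and analysis (for example, drawing cluster frailties and fitting an interval-censored survival model) must be supplied by the caller as a function that returns a p-value.

```python
import bisect
import math
import random
from typing import Callable, List, Tuple


def stepped_wedge_schedule(n_clusters: int, n_steps: int) -> List[List[int]]:
    """Clusters x periods matrix of 0 (control) / 1 (intervention).

    Assumption: clusters are split as evenly as possible into n_steps groups,
    period 0 is an all-control baseline, and group g crosses over at period g + 1.
    """
    schedule = []
    for c in range(n_clusters):
        group = c * n_steps // n_clusters
        schedule.append([1 if t > group else 0 for t in range(n_steps + 1)])
    return schedule


def interval_censor(event_time: float, visit_times: List[float]) -> Tuple[float, float]:
    """Return the interval (left, right] between sorted visits that contains the event.

    Time origin is assumed to be 0; right is math.inf when the event has not
    occurred by the last visit (a right-censored observation).
    """
    i = bisect.bisect_left(visit_times, event_time)
    left = visit_times[i - 1] if i > 0 else 0.0
    right = visit_times[i] if i < len(visit_times) else math.inf
    return left, right


def monte_carlo_power(simulate_p_value: Callable[[random.Random], float],
                      n_sims: int = 1000, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated trials whose analysis p-value falls below alpha."""
    rng = random.Random(seed)
    rejections = sum(simulate_p_value(rng) < alpha for _ in range(n_sims))
    return rejections / n_sims


for row in stepped_wedge_schedule(n_clusters=6, n_steps=3):
    print(row)
```

As a sanity check, a simulator that returns uniformly distributed p-values (a true null) should give an estimated "power" close to alpha; increasing the number of simulations narrows the Monte Carlo error of the estimate.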

PMID:41036414 | PMC:PMC12483528 | DOI:10.1016/j.softx.2025.102288

Evaluating motivational interview quality using large language models and hidden Markov models

BMC Psychiatry. 2025 Oct 1;25(1):908. doi: 10.1186/s12888-025-07391-1.

ABSTRACT

BACKGROUND: Motivational Interviewing (MI) is a counseling approach that promotes behavior change by eliciting “change talk” and minimizing “sustain talk.” Traditional methods for assessing MI quality, such as manual coding, are labor-intensive, subjective, and difficult to scale. This study introduces an automated framework integrating large language models (LLMs) and Hidden Markov Models (HMMs) for evaluation of MI session quality.

AIMS: This study evaluates the effectiveness of an LLM-HMM framework in predicting MI session quality and examines motivational state transitions in high- and low-quality sessions.

METHOD: A dataset of 40 MI sessions was analyzed. An LLM classified each client utterance and scored it numerically according to its orientation toward or away from change. We then applied HMMs to these scores to examine motivational state transitions across each session. Differences between high- and low-quality sessions were quantified by comparing transition matrices using Frobenius norms, and statistical significance was assessed via a permutation test. Predictive performance was evaluated using logistic regression with leave-one-out cross-validation (LOOCV), with transition matrix elements as independent variables and interview quality as the dependent variable.
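The group comparison step can be sketched as follows, assuming each session has already been decoded into a sequence of discrete motivational states (the HMM fitting and decoding are not shown). The sketch estimates an empirical transition matrix per session, takes the Frobenius norm of the difference between the mean high-quality and mean low-quality matrices, and obtains a p-value by shuffling the quality labels. Function names and the toy sequences are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transition_matrix(states: Sequence[int], n_states: int) -> Matrix:
    """Row-normalized empirical transition matrix of a decoded state sequence."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    for row in counts:
        total = sum(row)
        if total:  # rows for states that were never left stay all-zero
            for j in range(n_states):
                row[j] /= total
    return counts


def mean_matrix(mats: Sequence[Matrix]) -> Matrix:
    size = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(size)] for i in range(size)]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def group_distance(mats: Sequence[Matrix], labels: Sequence[int]) -> float:
    high = [m for m, lab in zip(mats, labels) if lab == 1]
    low = [m for m, lab in zip(mats, labels) if lab == 0]
    return frobenius_distance(mean_matrix(high), mean_matrix(low))


def permutation_test(mats: Sequence[Matrix], labels: Sequence[int],
                     n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Observed distance and permutation p-value (label shuffles keep group sizes)."""
    observed = group_distance(mats, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_distance(mats, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: state 0 = change-oriented, state 1 = resistance-oriented
fluid = [transition_matrix([0, 1, 0, 1, 0, 1], 2) for _ in range(10)]
stuck = [transition_matrix([1, 1, 1, 1], 2) for _ in range(10)]
print(permutation_test(fluid + stuck, [1] * 10 + [0] * 10))
```

Adding 1 to both the count and the number of permutations keeps the p-value strictly positive, a common convention for Monte Carlo permutation tests.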

RESULTS: High-quality MI sessions exhibited fluid transitions between motivational states, whereas low-quality sessions showed persistence in resistance-oriented states. A statistically significant difference in transition matrices was observed between session groups (p < 0.001). The framework achieved a mean LOOCV accuracy of 0.80, demonstrating strong predictive performance in identifying MI session quality.

CONCLUSIONS: This study presents a scalable, objective alternative to manual MI evaluation. Future applications may include real-time therapist support, training, and prognosis prediction, pending further validation on field-collected data.

PMID:41034852 | DOI:10.1186/s12888-025-07391-1

Exploring radiation-free scoliosis monitoring: systematic review and meta-analysis of non-ionizing methods

BMC Musculoskelet Disord. 2025 Oct 1;26(1):899. doi: 10.1186/s12891-025-09034-8.

ABSTRACT

BACKGROUND: Idiopathic scoliosis is a three-dimensional spinal deformity that typically develops during childhood or adolescence but may affect individuals across the lifespan. Regular monitoring is often necessary to detect progression and assess treatment effectiveness. Radiography remains the clinical gold standard; however, repeated ionizing radiation exposure is associated with increased cancer risks, highlighting the need for reliable, non-invasive, and radiation-free assessment methods. This systematic review and meta-analysis evaluated the diagnostic accuracy and criterion validity of emerging radiation-free scoliosis monitoring techniques compared to radiographic standards.

METHODS: A comprehensive literature search across six databases (Cochrane, EMBASE, IEEE Xplore, PubMed, Scopus, Web of Science) identified 56 eligible studies involving 4,774 patients diagnosed with idiopathic scoliosis (median number of patients per study: 38; range: 5 to 952; mean patient age: 15.2 years; female-to-male ratio: 3:1). Criterion validity was assessed by pooling Pearson correlation coefficients between radiographic and non-ionizing measurements. Measurement accuracy was assessed by pooling their mean absolute differences in Cobb angles. Additionally, sensitivity and specificity for detecting deformity progression were assessed. Statistical analyses employed multilevel linear mixed-effects models, with moderators introduced to explain study heterogeneity.
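Pooling correlation coefficients is usually done on Fisher's z scale, where the sampling variance is approximately 1/(n - 3). The sketch below shows that transform with simple fixed-effect inverse-variance weights. It is a deliberately simpler stand-in, not the multilevel linear mixed-effects models the review used: it ignores between-study heterogeneity and the nesting of multiple estimates within studies. The inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_correlation(rs: Sequence[float], ns: Sequence[int],
                       z_crit: float = 1.959963984540054) -> Tuple[float, Tuple[float, float]]:
    """Fixed-effect pooled Pearson r with a 95% CI via Fisher's z transform."""
    if any(n <= 3 for n in ns):
        raise ValueError("each study needs more than 3 participants")
    zs = [math.atanh(r) for r in rs]  # Fisher z; requires |r| < 1
    ws = [n - 3 for n in ns]          # inverse of Var(z) ~ 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform the pooled z and its interval to the correlation scale
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


# Hypothetical study-level inputs, not data from the review
r, ci = pooled_correlation([0.88, 0.92, 0.85], [38, 120, 25])
print(round(r, 3), tuple(round(x, 3) for x in ci))
```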

RESULTS: Ultrasonography demonstrated the highest overall validity, consistently correlating strongly (r ≈ 0.9) with radiographic Cobb angles. Surface topography also showed robust correlation (r > 0.8), although for both methods evidence remains insufficient for patients with higher body mass indices or more severe spinal curvatures. Magnetic resonance imaging (MRI) exhibited a very strong correlation (r = 0.93) with radiographic measurements; however, the correlation varied significantly with patient positioning, and upright MRI provided more consistent results than supine MRI.

CONCLUSIONS: Ultrasonography and surface topography represent promising radiation-free alternatives that could substantially reduce the frequency of radiographic assessment, and thus radiation exposure, particularly in suitable patient groups. While MRI also demonstrates excellent validity, its broader clinical applicability remains constrained by substantial costs, limited availability, and long examination durations. Although these non-ionizing modalities are not yet viable replacements for routine radiography, their demonstrated validity and accuracy support their potential as complementary technologies, particularly for screening or supplementary monitoring of scoliosis.

PMID:41034849 | DOI:10.1186/s12891-025-09034-8

The SUGAR handshake intervention to prevent hypoglycaemia in elderly people with type 2 diabetes: process evaluation within a pragmatic randomised controlled trial

BMC Geriatr. 2025 Oct 1;25(1):753. doi: 10.1186/s12877-025-06361-2.

ABSTRACT

BACKGROUND: The SUGAR Handshake is a pharmacist-led educational intervention to prevent hypoglycaemia in elderly people with type 2 diabetes mellitus (T2DM). A process evaluation was conducted alongside the ROSE-ADAM pragmatic randomized controlled trial (RCT) to assess the implementation of the intervention and study procedures, explore mechanisms of impact, and examine future scalability.

METHODS: This mixed-methods process evaluation was nested within a single-centre RCT conducted at outpatient clinics of a Jordanian hospital. Quantitative routine monitoring data were used to assess adherence to the intervention components and study activities and to estimate reach. Qualitative data, collected through semi-structured interviews with 12 purposively selected participants on Days 45 and 90 of enrolment, captured experiences with the intervention and usual care. Thematic analysis was applied to the qualitative data; descriptive statistics and inferential tests were applied to the quantitative data.

RESULTS: The intervention was well implemented: 104 of 106 participants (98.11%) completed the full intervention, and reach was 100% of those enrolled in the trial. Participants showed high adherence to study activities (mean ± SD: 88.07 ± 9.33 days documented in diaries; 77.97 ± 18.87 fasting blood glucose measurements). Participants described the intervention as informative, easy to follow, and helpful in avoiding hypoglycaemia and the side-effects of antidiabetic medications. Key facilitators included trust in pharmacists, altruism, and social support. Reported barriers were participants’ health status, age-related conditions, and stress.

CONCLUSIONS: This process evaluation highlights the SUGAR Handshake’s potential for broader implementation and scale-up. By addressing identified barriers, future educational interventions may enhance adherence, improve patient outcomes, and advance hypoglycaemia management in diabetes care.

TRIAL REGISTRATION: Clinicaltrials.gov (NCT04081766), registration date 4,920,219.

PMID:41034834 | DOI:10.1186/s12877-025-06361-2

Impact of sarcopenia on the clinical efficacy of delta large-channel endoscopic treatment of lumbar spinal stenosis in older adults: a retrospective cohort study

BMC Musculoskelet Disord. 2025 Oct 1;26(1):904. doi: 10.1186/s12891-025-09129-2.

ABSTRACT

BACKGROUND: This study explored the effect of sarcopenia on the clinical outcomes of delta large-channel endoscopic treatment in elderly patients with lumbar spinal stenosis.

METHODS: Data were collected from 87 patients who underwent delta large-channel endoscopy between January 2022 and June 2023 at the First Affiliated Hospital of Ningbo University. A skeletal muscle index (SMI) at the L3 level of < 36 cm²/m² in males and < 29 cm²/m² in females was used as the diagnostic threshold for sarcopenia. Patients who met the inclusion criteria were divided into a sarcopenia group (n = 41) and a non-sarcopenia group (n = 46). Patients’ age, gender, BMI, responsible segment, procedure-related parameters (intraoperative bleeding, operative time, hospitalization time, and complication occurrence), and clinical outcomes (Visual Analog Scale (VAS) pain scores, Japanese Orthopaedic Association (JOA) scores, Oswestry Disability Index (ODI) scores, and MacNab scores at final follow-up) were recorded and compared.
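The SMI used for grouping is conventionally the skeletal muscle cross-sectional area at L3 (cm²) divided by height squared (m²), which matches the cm²/m² units above; the abstract does not state the imaging modality. A minimal sketch applying the sex-specific cut-offs quoted in the abstract (function and argument names are hypothetical):

```python
# Sex-specific L3 SMI cut-offs quoted in the abstract (cm^2/m^2)
SMI_CUTOFFS = {"male": 36.0, "female": 29.0}


def skeletal_muscle_index(l3_muscle_area_cm2: float, height_m: float) -> float:
    """L3 skeletal muscle area normalized by height squared (cm^2/m^2)."""
    return l3_muscle_area_cm2 / height_m ** 2


def has_sarcopenia(l3_muscle_area_cm2: float, height_m: float, sex: str) -> bool:
    """True when SMI falls strictly below the sex-specific cut-off."""
    return skeletal_muscle_index(l3_muscle_area_cm2, height_m) < SMI_CUTOFFS[sex]


# Hypothetical patient: 150 cm^2 of L3 muscle, 1.75 m tall (SMI about 49.0)
print(round(skeletal_muscle_index(150.0, 1.75), 1), has_sarcopenia(150.0, 1.75, "male"))
```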

RESULTS: There were no significant differences in gender, age, BMI, intraoperative bleeding, operative time, hospitalization time, or complication occurrence between the two groups (P > 0.05). Surgery was successfully completed in both groups. Some clinical outcomes, such as lumbar VAS scores, did not differ significantly between the two groups (P > 0.05); however, when lumbar VAS scores were compared at 6 and 12 months postoperatively, the non-sarcopenia group had significantly lower scores than the sarcopenia group (P < 0.05). In addition, at the 3-, 6-, and 12-month postoperative follow-ups, ODI and JOA scores differed significantly between the sarcopenia and non-sarcopenia groups (P < 0.05): the non-sarcopenia group had significantly lower ODI scores and significantly higher JOA scores than the sarcopenia group.

CONCLUSION: Functional recovery after delta large-channel endoscopic decompression was better in patients without sarcopenia than in those with sarcopenia, and sarcopenia had a greater impact on long-term postoperative outcomes in older patients. Greater emphasis should be placed on diagnosing and treating sarcopenia to reduce its impact on postoperative clinical outcomes. Whether sarcopenia affects lumbar spine stability in patients undergoing endoscopic surgery requires longer follow-up and further studies.

PMID:41034829 | DOI:10.1186/s12891-025-09129-2

Associations between family functioning, psychological resilience, and emotional competence among primary and secondary school students in Chengdu, Sichuan Province: an exploratory study using structural equation modeling

BMC Public Health. 2025 Oct 1;25(1):3278. doi: 10.1186/s12889-025-24539-6.

ABSTRACT

BACKGROUND: In recent years, with rapid societal changes and increasing educational pressures, the mental health of primary and secondary school students has garnered significant attention. Psychological resilience, as a core capacity for coping with adversity, and emotional competence, as a foundation for emotional regulation in social adaptation, are crucial for student development, with family functioning being a primary environmental factor closely associated with them. Research suggests that healthy family functioning may be associated with higher psychological resilience and emotional competence, while family dysfunction may be linked to increased psychological distress. However, the interconnected mechanisms among family functioning, psychological resilience, and emotional competence, as well as the roles of factors such as gender, urban-rural differences, and grade level, still require further exploration.

OBJECTIVES: To explore the associations between family functioning, psychological resilience, and emotional competence among primary and secondary school students in Chengdu, Sichuan Province, and their underlying mechanisms. The study aims to provide a scientific basis for educators and parents to develop targeted mental health interventions.

DESIGN: Multicenter cross-sectional study.

METHODS: A cluster sampling method was employed to survey 7,937 students in grades 1 to 9 across five schools in Chengdu. Data were collected using the Chinese Family Assessment Instrument (C-FAI, assessing family mutual support, communication, and conflict harmony) and the Resilience Subscale and Emotional Competence Subscale of the Chinese Positive Youth Development Scale (CPYDS, measuring adaptation and recovery under stress and the ability to perceive, understand, and manage emotions, respectively). Data were double-entered and verified using EpiData 3.1. SPSS 26.0 was used for descriptive statistics, correlation analysis, and difference tests (independent-samples t-test, Welch t-test, one-way ANOVA, or Welch ANOVA depending on the data distribution, with Games-Howell post-hoc tests). Partial correlation analysis controlled for gender, urban/rural residence, and grade. Structural equation modeling was conducted using AMOS 26.0 to analyze the associations and mediating effects among family functioning, psychological resilience, and emotional competence and to evaluate model fit. Harman’s single-factor test was applied to detect common method bias. The significance level was set at α = 0.05.

RESULTS: Family functioning showed significant differences across gender, urban/rural location, and grade level (P < 0.05): male students (1.97 ± 0.74), rural students (1.97 ± 0.73), and students in grades 7-9 reported more severe family dysfunction. Psychological resilience was significantly positively correlated with emotional competence (r = 0.646, P < 0.001), and both were negatively correlated with family dysfunction (r = -0.394 and r = -0.376, respectively, P < 0.001). The structural equation model demonstrated a good fit (CMIN/DF = 6.988, RMSEA = 0.027). Path analysis from the model indicated that psychological resilience may be indirectly associated with emotional competence through family functioning. The mediating effect of this path was 0.089, accounting for 9.2% of the total effect (95% CI: 0.667-0.726, P < 0.001).
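As a rough illustration of how an indirect (mediating) effect and its share of the total effect are computed, the sketch below uses the product-of-coefficients approach on observed scores with ordinary least squares and a percentile bootstrap interval. This is a simplification, assuming single observed variables rather than the latent-variable SEM fitted in AMOS, and it does not reproduce the study's estimates. In the study's terms, x would be psychological resilience, m family functioning, and y emotional competence.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y) -> Tuple[float, float]:
    """Return (indirect effect a*b, total effect c) for the path x -> m -> y."""
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cxm / vx                                       # slope of m on x
    b = (vx * cmy - cxm * cxy) / (vx * vm - cxm ** 2)  # slope of y on m, adjusting for x
    c = cxy / vx                                       # slope of y on x alone (total effect)
    return a * b, c


def bootstrap_ci(x, m, y, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ms, ys = ([v[i] for i in idx] for v in (x, m, y))
        vx, vm, cxm = _cov(xs, xs), _cov(ms, ms), _cov(xs, ms)
        if vx < 1e-12 or vx * vm - cxm ** 2 < 1e-12:
            continue  # skip degenerate resamples (no spread, or collinear x and m)
        draws.append(indirect_effect(xs, ms, ys)[0])
    if not draws:
        raise ValueError("all bootstrap resamples were degenerate")
    draws.sort()
    k = len(draws)
    return draws[int(alpha / 2 * k)], draws[int((1 - alpha / 2) * k) - 1]
```

The proportion mediated is then the indirect effect divided by the total effect (a*b / c); with OLS on the same data, a*b also equals the total effect minus the direct effect of x on y.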

CONCLUSIONS: Family dysfunction is significantly negatively correlated with the psychological resilience and emotional competence of primary and secondary school students, with psychological resilience indirectly associated with emotional competence through family functioning. Boys, rural students, and those in grades 7-9 exhibit more severe family dysfunction, warranting focused attention. Interventions such as “Parent-Child Co-Creation Day,” communication training, and a “Father’s Role Workshop” are recommended to optimize family functioning, thereby supporting the mental health of primary and secondary school students.

PMID:41034824 | DOI:10.1186/s12889-025-24539-6

PAM clustering algorithm based on mutual information matrix for ATR-FTIR spectral feature selection and disease diagnosis

BMC Med Res Methodol. 2025 Oct 1;25(1):225. doi: 10.1186/s12874-025-02667-2.

ABSTRACT

ATR-FTIR spectral data are a valuable source of information in a wide range of pathologies, including neurological disorders, and can be used for disease discrimination. This requires identifying potential spectral biomarkers among all possible candidates, but the volume of information in a spectral dataset and the redundancy among its variables can make selecting the most informative features cumbersome. Here, a novel approach is proposed that performs feature selection based on redundant information among spectral data. Specifically, we apply the Partitioning Around Medoids algorithm to a dissimilarity matrix derived from a mutual information measure, in order to obtain groups of variables (wavenumbers) with similar patterns of pairwise dependence. An advantage of this algorithm over more widely used clustering methods is that it facilitates interpretation, because the centre of each cluster, the so-called medoid, corresponds to an observed data point. Consequently, each medoid can be considered representative of all wavenumbers in its cluster and retained in the subsequent statistical methods for disease prediction. Finally, an application to real data shows the ability of the proposed approach to discriminate between patients with multiple sclerosis and healthy subjects.
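A minimal sketch of the pipeline, under stated assumptions: the abstract does not say how mutual information is estimated or turned into a dissimilarity, so this sketch discretizes each wavenumber into equal-width bins, uses normalized mutual information NMI = MI / sqrt(H(X) H(Y)), and sets the dissimilarity to 1 - NMI. The PAM step is a plain BUILD/SWAP implementation written for illustration; real analyses would typically rely on an established implementation such as `pam` in R's cluster package. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels: Sequence) -> float:
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mi_dissimilarity(columns: Sequence[Sequence[float]], n_bins: int = 5) -> List[List[float]]:
    """1 - normalized mutual information between every pair of variables."""
    binned = [discretize(c, n_bins) for c in columns]
    h = [entropy(b) for b in binned]
    p = len(columns)
    d = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            mi = h[i] + h[j] - entropy(list(zip(binned[i], binned[j])))
            denom = math.sqrt(h[i] * h[j])
            d[i][j] = d[j][i] = 1.0 - (mi / denom if denom > 0 else 0.0)
    return d


def pam(d: List[List[float]], k: int) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per variable)."""
    n = len(d)

    def cost(meds: List[int]) -> float:
        # Total dissimilarity of every variable to its nearest medoid
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids: List[int] = []
    for _ in range(k):  # BUILD: greedily add the medoid that lowers cost most
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: cost(medoids + [c]))
        medoids.append(best)
    current = cost(medoids)
    improved = True
    while improved:  # SWAP: replace a medoid whenever that lowers total cost
        improved = False
        for slot, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:slot] + [cand] + medoids[slot + 1:]
            trial_cost = cost(trial)
            if trial_cost < current - 1e-12:
                medoids, current, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

Each returned medoid index points to an actual wavenumber, which is what makes it usable as the retained representative of its cluster in a downstream classifier.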

PMID:41034819 | DOI:10.1186/s12874-025-02667-2