Teaching Clinical Reasoning in Health Care Professions Learners Using AI-Generated Script Concordance Tests: Mixed Methods Formative Evaluation

JMIR Form Res. 2025 Nov 20;9:e76618. doi: 10.2196/76618.

ABSTRACT

BACKGROUND: The integration of artificial intelligence (AI) in medical education is evolving, offering new tools to enhance teaching and assessment. Among these, script concordance tests (SCTs) are well-suited to evaluate clinical reasoning in contexts of uncertainty. Traditionally, SCTs require expert panels for scoring and feedback, which can be resource-intensive. Recent advances in generative AI, particularly large language models (LLMs), suggest the possibility of replacing human experts with simulated ones, though this potential remains underexplored.

OBJECTIVE: This study aimed to evaluate whether LLMs can effectively simulate expert judgment in SCTs by using generative AI to author, score, and provide feedback for SCTs in cardiology and pneumology. A secondary objective was to assess students’ perceptions of the test’s difficulty and the pedagogical value of AI-generated feedback.

METHODS: A cross-sectional, mixed methods study was conducted with 25 second-year medical students who completed a 32-item SCT authored by ChatGPT-4o (OpenAI). Six LLMs (3 trained on the course material and 3 untrained) served as simulated experts to generate scoring keys and feedback. Students answered SCT questions, rated perceived difficulty, and selected the most helpful feedback explanation for each item. Quantitative analysis included scoring, difficulty ratings, and correlations between student and AI responses. Qualitative comments were thematically analyzed.

RESULTS: The average student score was 22.8 out of 32 (SD 1.6), with scores ranging from 19.75 to 26.75. Trained AI systems showed significantly higher concordance with student responses (ρ=0.64) than untrained models (ρ=0.41). AI-generated feedback was rated as most helpful in 62.5% of cases, especially when provided by trained models. The SCT demonstrated good internal consistency (Cronbach α=0.76), and students reported moderate perceived difficulty (mean 3.7, SD 1.1). Qualitative feedback highlighted appreciation for SCTs as reflective tools, while recommending clearer guidance on Likert-scale use and more contextual detail in vignettes.
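A minimal sketch of the two statistics reported above, Cronbach's alpha for the internal consistency of a multi-item SCT and Spearman's rho for concordance between student and AI-derived scores. This is an illustration only; the item scores and AI scores below are simulated, not the study's data.

```python
# Illustrative sketch (simulated data, not the authors' code).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# 25 students x 32 items: correlated binary item scores standing in for SCT credit
ability = rng.normal(size=(25, 1))
items = (ability + rng.normal(size=(25, 32)) > 0).astype(float)

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

student_scores = items.sum(axis=1)
ai_scores = student_scores + rng.normal(0, 2, size=25)  # stand-in for AI-derived scores

print(f"Cronbach alpha: {cronbach_alpha(items):.2f}")
rho, p = spearmanr(student_scores, ai_scores)
print(f"Spearman rho (student vs AI): {rho:.2f} (P={p:.3f})")
```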

CONCLUSIONS: This is among the first studies to demonstrate that trained generative AI models can reliably simulate expert clinical reasoning within a script-concordance framework. The findings suggest that AI can both streamline SCT design and offer educationally valuable feedback without compromising authenticity. Future studies should explore longitudinal effects on learning and assess how hybrid models (human and AI) can optimize reasoning instruction in medical education.

PMID:41264864 | DOI:10.2196/76618

Impact of Patient Suicide on Mental Health Professionals

Prim Care Companion CNS Disord. 2025 Nov 20;27(6):25m03995. doi: 10.4088/PCC.25m03995.

ABSTRACT

Objective: To explore the effect of a patient’s suicide on mental health professionals (MHPs), the perceived psychological and professional impacts, the support MHPs require versus actually receive, and their views on training that is provided to cope with such incidents.

Methods: A mixed-methods approach was used. An online survey was conducted from September to October 2023. The validated semistructured questionnaire was open for 8 weeks and covered demographics, details of incidents, emotional and professional impacts, and support systems. Responses were analyzed using descriptive statistics and thematic analysis to derive insights from qualitative data.

Results: Among 96 responses, 51% had treated patients who died by suicide. These patients were mostly males, primarily diagnosed with psychotic or affective disorders. Of the MHP respondents, 76.6% experienced the suicide of a patient after completing their training. Around one-third reported a moderate-to-extreme emotional impact from the incident, with sadness, regret, and guilt being common responses. Support-seeking behaviors were common, with 52.2% of respondents finding support from colleagues, family, or professional communities helpful, but formal training on managing patient suicide was found to be lacking.

Conclusion: Patient suicide can impact MHPs, affecting emotional well-being, professional identity, and personal life, emphasizing the importance of establishing a supportive environment, incorporating enhanced training into psychiatry programs, and encouraging open dialog.

PMID:41264862 | DOI:10.4088/PCC.25m03995

AI-Generated “Slop” in Online Biomedical Science Educational Videos: Mixed Methods Study of Prevalence, Characteristics, and Hazards to Learners and Teachers

JMIR Med Educ. 2025 Nov 20;11:e80084. doi: 10.2196/80084.

ABSTRACT

BACKGROUND: Video-sharing sites such as YouTube (Google) and TikTok (ByteDance) have become indispensable resources for learners and educators. The recent growth in generative artificial intelligence (AI) tools, however, has resulted in low-quality, AI-generated material (commonly called “slop”) cluttering these platforms and competing with authoritative educational materials. The extent to which slop has polluted science education video content is unknown, as are the specific hazards to learning from purportedly educational videos made by AI without the use of human discretion.

OBJECTIVE: This study aimed to advance a formal definition of slop (based on the recent theoretical construct of “careless speech”), to identify its qualitative characteristics that may be problematic for learners, and to gauge its prevalence among preclinical biomedical science (medical biochemistry and cell biology) videos on YouTube and TikTok. We also examined whether any quantitative features of video metadata correlate with the presence of slop.

METHODS: An automated search of publicly available YouTube and TikTok videos related to 10 search terms was conducted in February and March 2025. After exclusion of duplicate, off-topic, and non-English results, videos were screened, and those suggestive of AI generation were flagged. The flagged videos were subjected to a 2-stage qualitative content analysis to identify and code problematic features before an assignment of “slop” was made. Quantitative viewership data on all videos in the study were scraped using automated tools and compared between slop videos and the overall population.

RESULTS: We define “slop” according to the degree of human care in production. Of 1082 videos screened (814 YouTube, 268 TikTok), 57 (5.3%) were deemed probably AI-generated and low-quality. From qualitative analysis of these and 6 additional AI-generated videos, we identified 16 codes for problematic aspects of the videos related to their format or content. These codes were then mapped to the 7 characteristics of careless speech identified earlier. Analysis of view, like, and comment rates revealed no significant difference between slop videos and the overall population.
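A hedged sketch of the kind of calculations behind these figures: the prevalence estimate with a Wilson confidence interval, and one plausible way (a Mann-Whitney U test, which the abstract does not name) to compare viewership rates between slop videos and the rest of the sample. The counts mirror the abstract; the view-count arrays are synthetic placeholders.

```python
# Illustrative sketch, not the paper's analysis code.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.proportion import proportion_confint

n_slop, n_total = 57, 1082
lo, hi = proportion_confint(n_slop, n_total, alpha=0.05, method="wilson")
print(f"Prevalence: {n_slop / n_total:.1%} (95% CI {lo:.1%}-{hi:.1%})")

rng = np.random.default_rng(1)
views_slop = rng.lognormal(mean=6, sigma=1.5, size=n_slop)             # placeholder view counts
views_other = rng.lognormal(mean=6, sigma=1.5, size=n_total - n_slop)
stat, p = mannwhitneyu(views_slop, views_other, alternative="two-sided")
print(f"Mann-Whitney U={stat:.0f}, P={p:.3f}")
```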

CONCLUSIONS: We find slop to be not especially prevalent on YouTube and TikTok at this time. These videos have comparable viewership statistics to the overall population, although the small dataset suggests this finding should be interpreted with caution. From the slop videos that were identified, several features inconsistent with best practices in multimedia instruction were defined. Our findings should inform learners seeking to avoid low-quality material on video-sharing sites and suggest pitfalls for instructors to avoid when making high-quality educational materials with generative AI.

PMID:41264860 | DOI:10.2196/80084

Web-Based AI-Driven Virtual Patient Simulator Versus Actor-Based Simulation for Teaching Consultation Skills: Multicenter Randomized Crossover Study

JMIR Form Res. 2025 Nov 20;9:e71667. doi: 10.2196/71667.

ABSTRACT

BACKGROUND: There is a need to increase health care professional training capacity to meet global needs by 2030. Effective communication is essential for delivering safe and effective patient care. Artificial intelligence (AI) technologies may provide a solution. However, evidence for high-fidelity virtual patient simulators using unrestricted 2-way verbal conversation for communication skills training is lacking.

OBJECTIVE: This study aims to compare a fully automated, AI-driven, voice recognition-based virtual patient simulator with traditional actor-based simulated consultation skills training in undergraduate medical students, assessing differences in the development of self-rated communication skills, student satisfaction scores, and direct costs.

METHODS: Using an open-label randomized crossover design, a single web-based AI-driven communication skills training session (AI-CST) was compared with a single face-to-face actor-based consultation skills training session (AB-CST) in undergraduates at 2 UK medical schools. Offline total cohort recruitment was used, with an opt-out option. Pre-post intervention surveys using 10-point linear scales were used to derive outcomes. The primary outcome was the difference in self-reported attainment of communication skills between interventions. Secondary outcomes were differences in student satisfaction and the cost comparison of delivering both interventions.

RESULTS: Of 396 students, 378 (95%) completed at least 1 survey. Both modalities significantly increased self-reported communication skills attainment (AI-CST: mean difference 1.14, 95% CI 0.97-1.32 points; AB-CST: mean difference 1.50, 95% CI 1.35-1.66 points; both P<.001). The increase in attainment was lower for AI-CST than for AB-CST (mean difference -0.36, 95% CI -0.60 to -0.13 points; P=.04). Overall satisfaction was lower for AI-CST than AB-CST (8.09 vs 9.21; mean difference -1.13, 95% CI -1.33 to -0.92; P<.001). The estimated costs of AI-CST and AB-CST were £33.48 (US $42.22) and £61.75 (US $77.87) per student, respectively.
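A minimal sketch of a paired pre-post mean difference with a 95% CI on a 10-point scale, the type of estimate reported for each arm above. The CI method is assumed (paired t-based interval) and the survey responses are simulated placeholders.

```python
# Hedged sketch (assumed analysis, simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pre = rng.integers(4, 9, size=378).astype(float)              # self-rated skill before training
post = np.clip(pre + rng.normal(1.1, 1.5, size=378), 1, 10)   # self-rated skill after training

diff = post - pre
mean_diff = diff.mean()
sem = stats.sem(diff)
ci_lo, ci_hi = stats.t.interval(0.95, df=len(diff) - 1, loc=mean_diff, scale=sem)
t_stat, p = stats.ttest_rel(post, pre)
print(f"Mean difference {mean_diff:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f}), P={p:.3g}")
```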

CONCLUSIONS: AI-CST and AB-CST were both effective at improving self-reported communication skills attainment, but AI-CST was slightly inferior to AB-CST. Student satisfaction was significantly greater for AB-CST. Costs of AI-CST were substantially lower than AB-CST. AI-CST may provide a cost-effective opportunity to build training capacity for health care professionals.

PMID:41264856 | DOI:10.2196/71667

Additive Benefits of Control-IQ+ AID to GLP-1 Receptor Agonist Use in Adults With Type 2 Diabetes

Diabetes Care. 2025 Dec 1;48(12):2154-2159. doi: 10.2337/dc25-1753.

ABSTRACT

OBJECTIVE: To assess the effect of automated insulin delivery (AID) on glycemic and insulin outcomes in adults with insulin-treated type 2 diabetes using a glucagon-like peptide-1 receptor agonist (GLP-1 RA).

RESEARCH DESIGN AND METHODS: In a randomized trial comparing Control-IQ+ AID versus continuation of prestudy insulin delivery method plus continuous glucose monitoring (CGM group), 143 (45%) of the 319 participants were using a GLP-1 RA at baseline, which was continued during the trial.

RESULTS: Among GLP-1 RA users, mean HbA1c decreased by 0.8% from a baseline of 8.0 ± 1.2% with AID, representing a mean between-group difference of -0.5% (95% CI -0.8 to -0.3, P < 0.001) compared with the CGM group. Time in range (70-180 mg/dL) and other CGM metrics reflective of hyperglycemia also showed comparable statistically significant improvements with AID when added to GLP-1 RA use. For GLP-1 RA users, there was no significant difference in weight after 13 weeks with AID compared with the CGM group (0.9 kg, 95% CI -0.2 to 2.1, P = 0.10), whereas, in GLP-1 RA nonusers, there was a mean weight gain of 1.9 kg with AID compared with CGM (95% CI 0.5 to 3.2, P = 0.007).
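For readers unfamiliar with the CGM metrics cited here, the sketch below shows how time in range (70-180 mg/dL) can be derived from raw sensor readings. The glucose values are synthetic; this is not the trial's analysis code.

```python
# Illustrative sketch with synthetic CGM data.
import numpy as np

rng = np.random.default_rng(3)
# Simulated 13 weeks of CGM readings at 5-minute intervals (mg/dL)
glucose = rng.normal(loc=165, scale=45, size=13 * 7 * 24 * 12)

in_range = (glucose >= 70) & (glucose <= 180)
tir = in_range.mean() * 100
time_above = (glucose > 180).mean() * 100
time_below = (glucose < 70).mean() * 100
print(f"Time in range 70-180 mg/dL: {tir:.1f}%")
print(f"Time above 180 mg/dL: {time_above:.1f}%  |  Time below 70 mg/dL: {time_below:.1f}%")
```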

CONCLUSIONS: The benefits of AID appear to be substantial for a broad spectrum of insulin-treated patients with type 2 diabetes, including those already receiving contemporary, guideline-directed therapy such as a GLP-1 RA. These additive benefits of AID in GLP-1 RA users included significant reductions in HbA1c with a simultaneous reduction in insulin use, and no statistically significant increase in weight despite marked improvements in glycemic control.

PMID:41264828 | DOI:10.2337/dc25-1753

Misclassified latent autoimmune diabetes in adults within Māori and Pacific adults with type 2 diabetes in Aotearoa New Zealand

N Z Med J. 2025 Nov 21;138(1626):49-61. doi: 10.26635/6965.6989.

ABSTRACT

AIM: We investigated Māori and Pacific adults with type 2 diabetes (T2D) to determine the prevalence of latent autoimmune diabetes in adults (LADA), to assess the distribution of the type 1 diabetes (T1D) genetic risk score (GRS) in those with and without autoantibodies, and to investigate differences in clinical diabetes characteristics based on autoantibody presence or a high T1D GRS.

METHOD: A total of 2,538 Māori and Pacific participants from the Genetics of Gout, Diabetes, and Kidney Disease study in Aotearoa New Zealand were included (830 with T2D, 1,708 without). LADA was defined as age of diabetes onset >30 years, presence of autoantibodies and no insulin treatment within the first 6 months. Clinical characteristics were extracted from medical records. T1D-associated autoantibodies (glutamic acid decarboxylase, islet antigen 2, zinc transporter 8) were measured from stored blood samples from 293 participants (262 T2D, 31 without). A T1D GRS consisting of 30 single-nucleotide polymorphisms was calculated for all participants.
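A brief sketch of how a SNP-based genetic risk score of this kind is typically computed: a weighted sum of risk-allele dosages across the variants. The genotypes and per-SNP weights below are synthetic; the study's actual 30-SNP panel and weights are not reproduced here.

```python
# Illustrative GRS calculation with synthetic data.
import numpy as np

rng = np.random.default_rng(4)
n_participants, n_snps = 2538, 30
dosages = rng.integers(0, 3, size=(n_participants, n_snps)).astype(float)  # 0, 1, or 2 risk alleles per SNP
weights = rng.uniform(0.05, 0.5, size=n_snps)                              # per-SNP effect sizes (e.g., log odds ratios)

grs = dosages @ weights  # one score per participant
print(f"Mean T1D GRS: {grs.mean():.2f} (SD {grs.std(ddof=1):.2f})")
```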

RESULTS: Autoantibodies were detected in 8.8% (23/262) of individuals with T2D, with 5.3% (14/262) meeting the criteria for LADA. No significant difference in T1D GRS or clinical characteristics was observed between T2D cases with and without autoantibodies. Autoantibodies were also detected in 3.2% (1/31) of participants without diabetes.

CONCLUSION: LADA is present in a subset of Māori and Pacific individuals with T2D. Autoantibody presence was not associated with differences in T1D GRS or clinical features. Further research is needed to assess whether C-peptide monitoring could guide treatment decisions in those with LADA.

PMID:41264820 | DOI:10.26635/6965.6989

B4 School Check hearing screening and middle ear disease: a five-year analysis of prevalence and inequity

N Z Med J. 2025 Nov 21;138(1626):26-34. doi: 10.26635/6965.7137.

ABSTRACT

AIM: The B4 School Check includes hearing screening of four-year-old children in Aotearoa New Zealand. This study describes the prevalence and distribution of hearing loss, likely due to otitis media with effusion (OME), to determine if there is inequity in access to screening and primary healthcare, and to inform programme design and delivery.

METHOD: Hearing data over a five-year period were linked with demographic data and interrogated using regression analyses for differences in disease burden, access to screening and to primary healthcare.

RESULTS: Māori and Pacific children, and those living with higher deprivation, were less likely to be screened. When screened, these children had higher rates of disease, were less likely to be referred immediately, and had poorer access to the primary healthcare needed to enable appropriate management.
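A hedged sketch of the kind of regression analysis described in the methods: a logistic model of whether a child was screened, with ethnicity and deprivation as predictors. The abstract does not specify the model form, and the variable names and data below are hypothetical placeholders.

```python
# Illustrative sketch (assumed logistic regression, simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 5000
df = pd.DataFrame({
    "screened": rng.integers(0, 2, size=n),                            # 1 = received hearing screening
    "ethnicity": rng.choice(["Maori", "Pacific", "Other"], size=n),
    "deprivation_quintile": rng.integers(1, 6, size=n),
})

model = smf.logit("screened ~ C(ethnicity) + deprivation_quintile", data=df).fit(disp=False)
print(np.exp(model.params))            # odds ratios for being screened
print(model.conf_int().apply(np.exp))  # 95% CIs on the odds-ratio scale
```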

CONCLUSION: The current delivery of hearing screening is inequitable, missing those who need it most and exacerbating an uneven distribution of disease burden. A redeveloped programme is required to enable identification and screening of all eligible children, differential delivery according to need, and more holistic provision of care. This includes support for speech and language concerns, ear health promotion, and linkage with primary care and healthy housing programmes.

PMID:41264818 | DOI:10.26635/6965.7137

Smartphones and Mental Health Awareness and Utilization in a Low-Income Urban Community: Focus Group Study

JMIR Form Res. 2025 Nov 20;9:e65650. doi: 10.2196/65650.

ABSTRACT

BACKGROUND: Mental health disorders pose a significant challenge in low- and middle-income countries (LMICs), contributing substantially to the global disease burden. Despite the high prevalence of these disorders, LMICs allocate less than 1% of health budgets to mental health, resulting in inadequate care and a severe shortage of professionals. Stigma and cultural misconceptions further hinder access to mental health services. These challenges are present in Bangladesh, with high prevalence rates of depression and anxiety, as well as a centralized and underresourced mental health care system. Digital tools, such as smartphone apps and online platforms, offer innovative solutions to these challenges by increasing accessibility, cost-effectiveness, and scalability of mental health interventions.

OBJECTIVE: This study aims to characterize the views around digital tools for mental health among residents of Korail (a major slum in Dhaka, Bangladesh), including the use of smartphones, and investigate acceptable digital tools and barriers and facilitators for digital mental health tools.

METHODS: A total of 8 focus group discussions were conducted with 38 participants, including individuals with serious mental disorders and their caregivers. The discussions followed a semistructured topic guide, which included broad questions on smartphone usage to contextualize digital access and focused primarily on perceptions of using mobile technology for mental health care. Focus groups were held in Bangla, audio recorded, transcribed, and translated into English. Data were analyzed using thematic analysis in NVivo 14.

RESULTS: Participants (mean age 37 y, SD 13.7) were mostly female (30/38, 79%), and 45% (17/38) personally owned smartphones, although 92% (35/38) reported smartphone access within the household. The findings revealed a general lack of awareness and understanding of digital mental health tools among slum residents. However, there was a notable appetite for these tools; participants recognized their potential to provide timely and cost-effective support, reduce hospital visits, and make health care more accessible. Participants highlighted the convenience and communication benefits of smartphones but expressed concerns about misuse such as excessive use, particularly among adolescents. Barriers to the utilization of digital mental health tools included limited technological literacy and accessibility issues. Despite these challenges, participants acknowledged the potential of these tools to bridge the gap in mental health services, especially for those unable to travel. The importance of providing proper guidance and education to maximize the effectiveness of digital tools was emphasized.

CONCLUSIONS: Digital mental health tools hold promise for improving mental health care in underserved slum communities. This study underscores the need for further research and investment in tailored digital mental health solutions to address the unique needs of slum populations in LMICs.

PMID:41264814 | DOI:10.2196/65650

Test-Retest Reliability of the Blast Exposure Threshold Survey in United States Service Members and Veterans

J Neurotrauma. 2025 Nov 13. doi: 10.1177/08977151251394741. Online ahead of print.

ABSTRACT

The Blast Exposure Threshold Survey (BETS) is a recently developed measure of lifetime blast exposure. Although promising, the BETS (like other measures of its kind) must be shown to have good psychometric properties before it can be recommended for clinical use. The purpose of this study was to examine the test-retest reliability of the BETS in a military sample. Participants were 83 United States service members and veterans prospectively recruited from three military medical treatment facilities and from the community. Participants were classified into two broad groups as part of a larger study: traumatic brain injury (TBI; n = 41; mild-severe TBI) and controls (n = 42; injured and non-injured controls). Participants completed the BETS, the Neurobehavioral Symptom Inventory, and a brief structured interview to gather basic demographic, military, and injury-related information (e.g., age, education, deployments). In addition, participants completed the BETS on a second occasion (T2) 3 weeks after the first administration (T1). Using Spearman rho correlation analyses, the test-retest reliability of the BETS Generalized Blast Exposure Value (GBEV) was classified as “acceptable” (r = 0.76). However, when comparing individual responses across T1 and T2, 33% of the sample reported significant inconsistencies in the endorsement of the five weapons categories. The most problematic inconsistency (∼10% of the sample) was the failure of some participants to consistently endorse, or not endorse, exposure to a weapons category at T1 and T2 (e.g., T1 = exposure present; T2 = exposure absent). Less problematic, but also of concern, was the failure of some participants (∼23%) to consistently report the same number of years of exposure to a weapons category at T1 and T2 (e.g., T1 = 10 years; T2 = 5 years). Factors associated with inconsistent reporting from T1 to T2 included higher GBEV scores, older age, more years in the military, more deployments, and higher blast exposure. This is one of the first studies to comprehensively examine the test-retest reliability of the BETS GBEV. Overall, the test-retest reliability of the GBEV was statistically acceptable, supporting the use of the GBEV in both clinical and research settings. Concerningly, however, substantial inconsistencies were found in the basic reporting of weapons exposure for 33% of the sample, and these need to be addressed. Future researchers should identify ways to improve the BETS to increase response consistency over time.
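A minimal sketch of the two checks described above: a test-retest Spearman rho for a total score such as the GBEV, and a simple flag for inconsistent endorsement of the five weapons categories between T1 and T2. The data are simulated, and the specific inconsistency rule is an assumption for illustration.

```python
# Hedged sketch (simulated data, assumed consistency rule).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
n = 83
gbev_t1 = rng.lognormal(mean=10, sigma=1.0, size=n)
gbev_t2 = gbev_t1 * rng.lognormal(mean=0, sigma=0.3, size=n)  # correlated retest scores

rho, p = spearmanr(gbev_t1, gbev_t2)
print(f"Test-retest Spearman rho: {rho:.2f} (P={p:.3g})")

# Endorsement (1) or non-endorsement (0) of 5 weapons categories at each time point
endorse_t1 = rng.integers(0, 2, size=(n, 5))
endorse_t2 = endorse_t1.copy()
flip = rng.random((n, 5)) < 0.05           # a few simulated response flips between T1 and T2
endorse_t2[flip] = 1 - endorse_t2[flip]

inconsistent = (endorse_t1 != endorse_t2).any(axis=1)
print(f"Participants with at least one inconsistent endorsement: {inconsistent.mean():.0%}")
```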

PMID:41264354 | DOI:10.1177/08977151251394741

Three Body-Worn Accelerometers in the French NutriNet-Santé Cohort: Feasibility and Acceptability Study

JMIR Form Res. 2025 Nov 20;9:e76167. doi: 10.2196/76167.

ABSTRACT

BACKGROUND: Accurate assessment of physical activity (PA) in large population-based cohorts remains a major methodological challenge. Self-reported questionnaires, although commonly used due to low cost and simplicity, are prone to recall and social desirability biases, causing misclassification and weakened associations with health outcomes. Body-worn accelerometers provide more objective and reliable measurements, but their acceptability and feasibility in large-scale epidemiological studies must be carefully evaluated to ensure compliance, data quality, and scalability.

OBJECTIVE: The primary objective was to assess the acceptability of using 3 body-worn accelerometer devices (Fitbit, ActivPAL, and ActiGraph) among healthy middle-aged adults participating in the NutriNet-Santé cohort. The secondary objective was to assess the feasibility of these devices in terms of wear-time compliance under free-living conditions.

METHODS: This is an ancillary study of the European WEALTH (WEarable sensor Assessment of physicaL and eaTing beHaviors) project, conducted between 2023 and 2024 in a subsample of participants of the NutriNet-Santé cohort in France. This sample included 126 healthy participants (62 men), with a mean age of 46.3 (SD 11.3) years. Participants simultaneously wore 3 body-worn accelerometer devices (Fitbit [wrist], ActivPAL [thigh], and ActiGraph [waist]) for 7 consecutive days. After the wear period, participants completed a 22-item web-based questionnaire on the acceptability of each device. This questionnaire was based on the Technology Acceptance Model, which identifies perceived usefulness and ease of use as key determinants of technology acceptance. Items were rated on a 5-point Likert scale (1=strongly disagree to 5=strongly agree). Feasibility was assessed based on the accelerometer wear-time data reported by participants in a log diary. A valid day was defined as ≥600 minutes of wear time per day, and a valid week as at least 4 such days. Acceptability scores were compared between devices using ANOVA, and feasibility outcomes were compared using Kruskal-Wallis tests.
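The sketch below illustrates the wear-time rules stated above (valid day ≥600 minutes; valid week ≥4 valid days) and a Kruskal-Wallis comparison of daily wear time across the 3 devices. The wear-time values are simulated placeholders, not the study's data.

```python
# Illustrative sketch of the feasibility rules and between-device comparison.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(7)
devices = ["Fitbit", "ActivPAL", "ActiGraph"]
# Simulated wear time in minutes: 126 participants x 7 days per device
wear = {d: rng.normal(loc=mu, scale=120, size=(126, 7)).clip(0, 1440)
        for d, mu in zip(devices, (820, 900, 760))}

for d, minutes in wear.items():
    valid_days = (minutes >= 600).sum(axis=1)          # valid days per participant
    valid_week_rate = (valid_days >= 4).mean()         # share of participants with a valid week
    print(f"{d}: mean daily wear {minutes.mean():.0f} min, valid weeks {valid_week_rate:.0%}")

stat, p = kruskal(*[m.mean(axis=1) for m in wear.values()])
print(f"Kruskal-Wallis H={stat:.1f}, P={p:.3g}")
```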

RESULTS: The acceptability assessment based on the questionnaire revealed significant differences among the 3 devices (P<.001). The Fitbit achieved the highest acceptability score (mean 80.5/100, SD 8.13) across most criteria such as comfort, ease of use, and social acceptability, while the ActiGraph received the lowest score (mean 71.7, SD 8.68), mainly due to challenges with stability and interference during PA. In terms of feasibility, the 3 accelerometers demonstrated high compliance, with the ActivPAL recording the highest daily wear time, followed by the Fitbit and the ActiGraph (P<.001).

CONCLUSIONS: Results from our study showed that the Fitbit watch appears to be the most accepted device for measuring PA in free-living conditions in the NutriNet-Santé study. The large-scale use of such a device must now be evaluated in terms of logistics, cost, and data privacy.

PMID:41264348 | DOI:10.2196/76167