Categories
Nevin Manimala Statistics

Comparative Evaluation of Advanced Reasoning Models for Clinical Decision Support in Urology

Urol Int. 2026 Mar 23:1-13. doi: 10.1159/000551610. Online ahead of print.

ABSTRACT

OBJECTIVE: To compare the performance of five advanced reasoning models on urology-related clinical multiple-choice questions from the MedQA dataset, and to benchmark AI performance against medical students and experienced urologists in terms of accuracy, response efficiency, and agreement patterns.

METHODS: We extracted 434 urology-relevant items and evaluated five models (DeepSeek-R1, ChatGPT O4-mini, Gemini 2.5 Pro, Claude 3.7 Sonnet, and Grok 3) using a standardized prompt. Accuracy was computed against reference answers; API response times and connection failures were recorded. In addition, 20 senior medical students and 20 experienced urologists answered subsets of the same item bank using a balanced block design; group-level majority-vote answers were used as human baselines. Statistical analyses included Cochran’s Q and McNemar tests (AI-only accuracy), a logistic generalized linear mixed-effects model (GLMM) with urologists as the reference (model-adjusted accuracy), Fleiss’ κ and Cohen’s κ (agreement), and Friedman and Wilcoxon signed-rank tests (response time).
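
As an illustration of the pairwise accuracy comparison named above, the following is a minimal sketch of an exact two-sided McNemar test for two models answering the same items. It is not the authors' code: the function name `mcnemar_exact` and the example counts are hypothetical, and the abstract does not say whether an exact or a chi-square version of the test was used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: items model A answered correctly and model B answered incorrectly
    c: items model A answered incorrectly and model B answered correctly
    Items on which both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study.
p_value = mcnemar_exact(b=30, c=12)
print(f"exact McNemar p-value: {p_value:.4f}")
```

Only the discordant items carry information here, which is why a paired test of this kind suits a design in which every model answers the same item bank.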

RESULTS: Across the AI-only comparison, all models achieved high accuracy (86.9-93.3%), with DeepSeek-R1, ChatGPT O4-mini, and Gemini 2.5 Pro outperforming Claude 3.7 Sonnet and Grok 3. In the model-adjusted analysis, all five AI models showed significantly higher odds of correct answers than experienced urologists (all p < 0.001, Dunnett-adjusted), while medical students did not differ significantly from urologists. ChatGPT O4-mini had the shortest median API response time (5.03 s), whereas group-level median task completion times were 15.87 s for students and 17.57 s for urologists; Grok 3 was slowest among AI models (27.62 s). Connection failure rates were 0% for ChatGPT O4-mini, Gemini 2.5 Pro, and Claude 3.7 Sonnet; 1.6% for DeepSeek-R1; and 2.8% for Grok 3. Agreement across the five AI models and the two human majority-vote baselines was moderate-to-substantial (Fleiss’ κ = 0.685, p < 0.001).

CONCLUSION: Modern reasoning models achieve strong accuracy and efficiency on urology-focused benchmark questions, supporting their potential role as useful clinical assistants when implemented with appropriate human oversight. ChatGPT O4-mini’s low latency further underscores its suitability for time-sensitive workflows, while model-adjusted analyses indicate its consistently superior accuracy relative to experienced urologists within this standardized assessment format.

PMID:41871224 | DOI:10.1159/000551610

Clinical judgment: an essential method in medicine

Postgrad Med J. 2026 Mar 23:qgag030. doi: 10.1093/postmj/qgag030. Online ahead of print.

ABSTRACT

Physicians rely on clinical judgment and patients look for it. However, clinical judgment is infrequently discussed in the literature and is often perceived as an intuitive art that is likely to be replaced by technology and artificial intelligence. This review offers a reconceptualization of the role of clinical judgment in current medical practice and research, informed by the extensive knowledge that has accumulated in psychosomatic medicine. Clinical judgment consists of three phases: collecting clinical information; interpretation and clinical reasoning; decision making. Interviewing is the primary method for gathering data. Clinical reasoning involves bringing together relevant information and formulating hypotheses, which result in decisions and therapeutic acts. Clinimetrics, the science of clinical measurements, facilitates physicians’ reasoning and organization of data. Improving the features of clinical judgment is likely to yield a highly effective precision medicine.

PMID:41871207 | DOI:10.1093/postmj/qgag030

Visual Implicit Learning and Speech Recognition in Adult Post-Lingual Cochlear Implant Users

Trends Hear. 2026 Jan-Dec;30:23312165261434604. doi: 10.1177/23312165261434604. Epub 2026 Mar 23.

ABSTRACT

Implicit learning is thought to play an important role in speech recognition under challenging conditions. However, auditory deprivation has been proposed to influence implicit learning, including in the visual modality, although evidence in adults with post-lingual deafness is limited. Therefore, we investigated implicit visual learning and its associations with speech recognition in adults with post-lingual deafness who use cochlear implants (CIs). Thus, this study focuses on the effects of late auditory deprivation rather than on the effects of early deprivation associated with congenital deafness. Adult CI users (n = 30) and a group of individuals with normal hearing (NH, n = 36) completed two implicit visual learning tasks (statistical and perceptual), a battery of challenging speech recognition tests and cognitive measures (vocabulary, working memory, attention, and verbal processing speed). NH listeners demonstrated significant visual statistical learning, whereas CI users showed a similar but nonsignificant pattern. In the visual perceptual learning task, both groups exhibited comparable learning effects. In CI users, visual statistical learning contributed to the recognition of speech in noise (words and sentences). Visual perceptual learning only contributed to the recognition of words in noise. The current findings are inconsistent with the idea that auditory deprivation beyond the sensitive period interferes with visual learning. Rather, in CI users, visual implicit learning contributes to the recognition of challenging speech. Therefore, future work might investigate whether visual learning in CI candidates is predictive of postimplantation milestones.

PMID:41870495 | DOI:10.1177/23312165261434604

Rat strain differences in bronchoalveolar lavage fluid and minimal association with histopathology findings

Inhal Toxicol. 2026 Mar 23:1-14. doi: 10.1080/08958378.2026.2644247. Online ahead of print.

ABSTRACT

BACKGROUND: Six years after the revised Test Guidelines 412 and 413 (TG412 and TG413) were issued, there are sufficient data to evaluate the relationship of bronchoalveolar lavage (BAL) cytology and biomarkers to histopathology.

OBJECTIVE: This retrospective study evaluates the correlation between mandatory endpoints in the BAL fluid (LDH activity, concentration of total protein, inflammatory cell counts) and histopathological changes in the lungs following sub-chronic inhalation exposure in rats.

MATERIALS AND METHODS: Twenty-eight studies conducted across two Test Facilities from 2018 to 2023 were reviewed to identify trends.

RESULTS: At baseline, there were no strain differences in BAL fluid total protein, but LDH activity was statistically different between sexes and ages. LDH activity and total protein in BALF at the lowest observed adverse effect concentration showed no pathological pattern following inhalation exposure to the tested chemicals, while immune cell counts shifted in Wistar Han rats. Specifically in studies with adverse lung histopathology, total protein and LDH activity were generally elevated, along with a shift in immune cells toward neutrophils and eosinophils, without correlation to the severity score of adverse microscopic findings.

CONCLUSION: These results suggest that BALF parameters are insufficient to independently characterize adversity but may be used in other ways to progress new approach methods.

PMID:41870483 | DOI:10.1080/08958378.2026.2644247

Transesophageal Echocardiography During CPR in Patients With Out-of-Hospital Cardiac Arrest: The EXECT-CPR Randomized Clinical Trial

JAMA Intern Med. 2026 Mar 23. doi: 10.1001/jamainternmed.2026.0102. Online ahead of print.

ABSTRACT

IMPORTANCE: Cardiopulmonary resuscitation (CPR) guidelines recommend chest compressions at the lower half of the sternum. This may lead to aortic valve compression, which is associated with poor outcomes, while compressions over the left ventricle are seldom achieved.

OBJECTIVE: To test the hypothesis that transesophageal echocardiography (TEE) guidance during CPR to avoid aortic valve compression and target the left ventricle would improve outcomes in patients with nontraumatic out-of-hospital cardiac arrest compared with conventional CPR.

DESIGN, SETTING, AND PARTICIPANTS: This cluster-randomized clinical trial (the EXECT-CPR study) was conducted from June 26 to November 19, 2023, at 1 tertiary medical center in Taiwan. Participants were adults who consecutively presented to the emergency department (ED) with nontraumatic out-of-hospital cardiac arrest. Exclusion criteria were prehospital return of spontaneous circulation, extracorporeal CPR, contraindications to TEE, prior do-not-resuscitate orders, and obvious signs of death. Complete blinding was not feasible; the allocation schedule was disclosed only to the principal investigator.

INTERVENTION: Post-ED arrival CPR at TEE-guided (avoid aortic-valve compression and target the left ventricle) or guideline-recommended (the lower half of the sternum) site.

MAIN OUTCOMES AND MEASURES: The primary outcome was a sustained return of spontaneous circulation (≥20 minutes). Secondary outcomes were any return of spontaneous circulation, survival to intensive care unit admission, survival to hospital discharge, cerebral performance category of 2 or lower at discharge, and intra-CPR end-tidal carbon dioxide levels.

RESULTS: A total of 132 patients underwent randomization (66 in each group; median [IQR] age, 68 [55-74] years; 87 [66%] male). The primary outcome was similar between groups (TEE-guided group, 29 [44%]; conventional group, 26 [39%]; cluster-adjusted odds ratio, 1.21; 95% CI, 0.64-2.29). The secondary outcomes also did not significantly differ, except for higher intra-CPR end-tidal carbon dioxide levels in the TEE-guided group during the 11th to 20th minutes after arrival. Adverse event rates related to TEE and CPR were comparable.
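
The counts reported above allow a quick unadjusted check of the primary outcome. The sketch below computes a crude odds ratio with a Wald 95% confidence interval from the 2x2 counts (29 of 66 vs 26 of 66). It rests on a loud assumption: it ignores the cluster randomization, so it is not the trial's cluster-adjusted analysis. The function name `crude_odds_ratio` is hypothetical.

```python
from math import exp, log, sqrt


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio (group A vs group B) with a Wald interval.

    Assumes independent observations and no empty cell in the 2x2 table;
    clustering is deliberately ignored in this sketch.
    """
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Sustained ROSC counts as reported: TEE-guided 29/66, conventional 26/66.
estimate, lower, upper = crude_odds_ratio(29, 66, 26, 66)
print(f"crude OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The crude estimate lands close to the reported cluster-adjusted odds ratio, but the interval need not match, because the reported analysis accounts for clustering and this sketch does not.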

CONCLUSIONS AND RELEVANCE: In this randomized clinical trial among adults transported to the emergency department with ongoing CPR for nontraumatic out-of-hospital cardiac arrest, TEE-guided CPR with an adjusted compression site after arrival did not significantly improve clinical outcomes compared with conventional CPR, although it produced potential hemodynamic benefits without increasing adverse events. Given that the trial was underpowered due to optimistic effect size assumptions, these neutral findings should be interpreted with caution.

TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT05907460.

PMID:41870444 | DOI:10.1001/jamainternmed.2026.0102

Sleep Health Dimensions From Wearables and Transdiagnostic Mental Health in Young Adolescents

JAMA Pediatr. 2026 Mar 23. doi: 10.1001/jamapediatrics.2026.0335. Online ahead of print.

ABSTRACT

IMPORTANCE: Sleep behavior markedly shifts in adolescence, increasing vulnerability to mental health disorders. Although sleep health is understood to be multidimensional, adolescent-specific sleep health dimensions have not been empirically validated and their relevance to transdiagnostic mental health outcomes is unknown.

OBJECTIVE: To identify sleep health dimensions using Fitbit devices in a large sample of young adolescents and assess concurrent and prospective associations between sleep health dimensions and transdiagnostic mental health outcomes.

DESIGN, SETTING, AND PARTICIPANTS: Multicenter longitudinal cohort study using data from 3393 participants in the Adolescent Brain Cognitive Development (ABCD) Study (Data Release 5.1, collected 2018-2020), including early adolescents (ages 11-13 years) within the US. Exploratory factor analysis (EFA) was used to identify sleep health dimensions and confirmatory factor analysis (CFA) to confirm the factor structure in an independent subsample. Linear mixed-effects models were used to test concurrent and prospective associations between sleep dimensions and mental health outcomes at 1-year follow-up. Statistical analysis was conducted from January to November 2025.

EXPOSURES: Objective sleep data collected for up to 21 (range, 7-21) days, using wearable Fitbit devices.

MAIN OUTCOMES AND MEASURES: Transdiagnostic mental health outcomes assessed via the Child Behavior Checklist and Brief Problem Monitor (internalizing and externalizing symptoms), Prodromal Questionnaire-Brief Child Version (psychoticlike symptoms), and 10-item Mania Scale (mania symptoms).

RESULTS: The 3393 participants (49% female; median age, 12 years) were split into EFA and CFA subsamples. Six sleep factors were identified using EFA: irregularity, timing, social jetlag, duration, weekend oversleep, and continuity. CFA confirmed this factor structure. All variables loaded strongly (≥0.64) onto at least 1 factor (factor 1 loadings, 0.64-0.98; factor 2, 0.96-0.98; factor 3, 0.95-0.97; factor 4, -0.86 to 1.01; factor 5, 0.68-0.93; factor 6, 0.82-0.94). Greater sleep irregularity was associated with transdiagnostic mental health symptoms cross-sectionally, but not prospectively (β, 0.06 [95% CI, 0.02-0.10] to 0.12 [95% CI, 0.08-0.16]). Shorter duration was associated with total, internalizing, externalizing, and attention symptoms cross-sectionally (β, -0.06 [95% CI, -0.10 to -0.01] to -0.11 [95% CI, -0.15 to -0.06]) and total, attention, and psychotic symptoms 1 year later.

CONCLUSIONS AND RELEVANCE: In this study, wearable Fitbit data provide empirical support for multidimensional frameworks of sleep health in adolescence. Although effect sizes were small, sleep irregularity and duration emerged as key dimensions with relevance to mental health. These findings establish a foundation for future investigations, including examining within-person patterns of the 6 dimensions, extending to older adolescence, investigating associations with other health outcomes, and replicating with research-grade actigraphy devices, and they suggest potential targets for pediatric sleep interventions.

PMID:41870441 | DOI:10.1001/jamapediatrics.2026.0335

Childhood Mortality by Parental Cause of Death

JAMA Netw Open. 2026 Mar 2;9(3):e262790. doi: 10.1001/jamanetworkopen.2026.2790.

NO ABSTRACT

PMID:41870432 | DOI:10.1001/jamanetworkopen.2026.2790

Workplace Accommodations and Attrition Among Physicians With Disabilities

JAMA Netw Open. 2026 Mar 2;9(3):e261922. doi: 10.1001/jamanetworkopen.2026.1922.

ABSTRACT

IMPORTANCE: Physicians with disabilities face bias and barriers in the workplace, including stigma, lack of accommodations, and mistreatment, which may contribute to workforce attrition. Given the projected physician shortage and the importance of physicians with disabilities in providing informed and empathetic care, understanding attrition within this group is critical.

OBJECTIVE: To examine the associations among disability, workplace accommodations, and physician workforce attrition, including consideration of leaving medical practice and reductions in clinical hours.

DESIGN, SETTING, AND PARTICIPANTS: This survey study used a cross-sectional design to analyze data from the 2022 National Sample Survey of Physicians. Logistic regression models assessed associations between disability and attrition outcomes, adjusting for demographic and workplace factors. Participants included 5917 active physicians who self-reported personal (eg, disability status) and professional (eg, accommodations) data. Data were collected from May 10 to November 9, 2022, and analyzed from October 1, 2023, to May 1, 2025.

MAIN OUTCOMES AND MEASURES: The primary outcomes were (1) having considered leaving medical practice within the past 12 months, including reasons why, and (2) having ever reduced clinical hours for 6 months or longer. The core independent variable was accommodation status.

RESULTS: Among the 5917 physicians surveyed, 154 (2.6%) reported having a disability. A total of 3707 respondents (62.6%) were men or transgender men and 5620 (95.0%) identified as heterosexual; the mean (SD) age was 53.9 (10.8) years. Fifty-six physicians with disabilities (36.4%) considered leaving the practice of medicine, compared with 1316 of 5600 physicians (23.5%) without disabilities. Sixty-seven physicians with disabilities (43.5%) reported transitioning to part time or pausing their practice at some point, compared with 1327 (23.7%) without disabilities. Multivariate regression analysis found physicians with disabilities were more likely than their peers without disabilities to consider leaving medical practice (odds ratio [OR], 2.22; 95% CI, 1.24-3.96; P = .01) and to have reduced clinical hours or paused practice during their careers (OR, 1.94; 95% CI, 1.09-3.43; P = .02). Burnout was the most common reason among both groups, and physicians with disabilities more frequently cited underlying health conditions (self or family) (32 [52.7%] vs 122 [8.5%]). Among physicians with disabilities, those who received accommodations were significantly less likely than those without accommodations to report an intent to leave (42 of 123 [34.3%] and 13 of 24 [54.2%], respectively).

CONCLUSIONS AND RELEVANCE: In this survey study, physicians with disabilities were significantly more likely to consider leaving the workforce and to reduce clinical hours than their peers without disabilities. Clear, stigma-free disclosure and accommodation processes, along with inclusive workplace cultures, are essential to retaining this vital segment of the physician workforce.

PMID:41870431 | DOI:10.1001/jamanetworkopen.2026.1922

Psychiatric Disorders Among Fathers in Sweden Before, During, and After Partner Pregnancy

JAMA Netw Open. 2026 Mar 2;9(3):e262725. doi: 10.1001/jamanetworkopen.2026.2725.

ABSTRACT

IMPORTANCE: Paternal psychiatric disorders during the perinatal period can affect the health of the entire family; however, these conditions have often been underrecognized, and little is known about their incidence and timing of onset.

OBJECTIVE: To investigate incidence patterns of new-onset diagnosed psychiatric disorders among men in Sweden before, during, and after a partner’s pregnancy.

DESIGN, SETTING, AND PARTICIPANTS: This prospective cohort study used linked national register data for all fathers of children born in Sweden between January 1, 2003, and December 31, 2021, with follow-up from 1 year before to 1 year after pregnancy. Data were analyzed from October 1, 2024, to March 31, 2025.

EXPOSURES: The time during pregnancy and 1 year after childbirth (post partum) were considered the risk periods, while 1 year before pregnancy (before conception) was used as the reference period.

MAIN OUTCOMES AND MEASURES: Annual and weekly incidence rates (IRs) of clinical diagnoses of any psychiatric disorder and 9 type-specific disorders were calculated and standardized by age and calendar year. Adjusted Poisson regression analysis was used to further estimate incidence rate ratios (IRRs) of psychiatric disorders during and after pregnancy compared with before conception.

RESULTS: This study included 1 915 722 births from 1 096 198 fathers (mean [SD] age at childbirth, 33.8 [6.2] years) in Sweden. IRs of any diagnosed psychiatric disorder were lower during pregnancy (eg, pregnancy week 1: IR, 5.50 [95% CI, 4.69-6.31] per 1000 person-years) and the early postpartum period (eg, postpartum week 1: IR, 5.19 [95% CI, 4.41-5.97] per 1000 person-years) than in the corresponding preconception weeks (eg, preconception week 1: IR, 7.00 [95% CI, 5.97-8.04] per 1000 person-years); they returned to comparable rates later post partum. This pattern was also observed for IRRs of anxiety, alcohol use, and drug use (ie, the use of nonalcohol, nontobacco psychoactive drugs) disorders. IRRs of depression (eg, postpartum weeks 45-49: IRR, 1.30 [95% CI, 1.12-1.52]) and stress-related disorders (eg, postpartum weeks 45-49: IRR, 1.36 [95% CI, 1.15-1.61]), however, showed a notable 30% increase toward the end of the first postpartum year. In contrast, IRRs of diagnosis of tobacco use disorder, attention-deficit/hyperactivity disorder, bipolar disorder, or psychosis remained relatively stable before, during, and after pregnancy.

CONCLUSIONS AND RELEVANCE: In this nationwide cohort study, fathers in Sweden were less likely to be diagnosed with a psychiatric disorder during a partner’s pregnancy and early post partum than before conception, but IRs returned to comparable levels thereafter. These incidence patterns may reflect transient protection and delayed detection during the transition to fatherhood and support the need for paternal mental health surveillance, particularly for increased depression and stress-related disorders in the late postpartum period.

PMID:41870430 | DOI:10.1001/jamanetworkopen.2026.2725

Alignment of Large Language Model Responses With Human Therapists in Motivational Interviewing

JAMA Netw Open. 2026 Mar 2;9(3):e262750. doi: 10.1001/jamanetworkopen.2026.2750.

ABSTRACT

IMPORTANCE: Large language models (LLMs) are increasingly applied to mental health contexts, yet their capacity to generate responses that align with evidence-based psychotherapy remains uncertain. Motivational interviewing (MI), a structured counseling approach, provides an empirically grounded setting for evaluating alignment between LLM-generated and human therapist responses.

OBJECTIVE: To evaluate how closely an LLM’s responses align with therapist responses in MI sessions, using automated similarity metrics.

DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study used high-fidelity therapist-client transcripts annotated with the Motivational Interviewing Treatment Integrity system. Transcripts were sourced from publicly available counseling videos. For each therapist turn, the GPT-4o LLM generated a response using a standardized, MI-informed prompt based on the preceding conversation context. Analyses were conducted between March and May 2025.

MAIN OUTCOMES AND MEASURES: Alignment between LLM-generated and therapist responses was assessed using (1) cosine similarity based on sentence embeddings to capture semantic overlap and (2) DeepEval, a contextual deep-learning-based metric assessing coherence and contextual appropriateness. A therapist topic-consistency index quantified within-session thematic coherence and was examined as a moderator of alignment.
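
To make the first metric concrete, here is a minimal sketch of cosine similarity between two embedding vectors. The vectors are toy values invented for illustration; the abstract does not name the sentence-embedding model, so no embedding step is shown, and the function name `cosine_similarity` is a choice made for this sketch rather than the study's code.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 4-dimensional "embeddings" (hypothetical, not real model output).
therapist_vec = [0.2, 0.7, 0.1, 0.4]
llm_vec = [0.3, 0.5, 0.0, 0.6]
print(f"cosine similarity: {cosine_similarity(therapist_vec, llm_vec):.3f}")
```

Because the score depends only on the direction of the two vectors, it measures overlap in embedded meaning rather than response length, which fits its use here as a measure of semantic overlap.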

RESULTS: A total of 3706 therapist turns from 154 MI sessions were evaluated. Mean (SD) DeepEval scores were higher than mean (SD) cosine similarity scores (0.72 [0.31] vs 0.29 [0.20]; P < .001), suggesting limited semantic overlap despite greater contextual appropriateness. Therapist topic consistency significantly moderated similarity, where cosine similarity was higher in high-consistency than low-consistency sessions (mean [SD] difference, 0.027 [0.007]; t(3706) = 3.987; P < .001), as was DeepEval score (mean [SD] difference, 0.038 [0.010]; t(3706) = 3.747; P < .001). Correlation between metrics was negligible (Spearman ρ, -0.01), indicating that they captured distinct aspects of response alignment. LLM performance declined slightly across longer conversations (mean [SD] slope reduction for cosine similarity, -0.0005 [0.0016], and for DeepEval, -0.0005 [0.0022]), with increased verbosity and signs of reduced contextual grounding.

CONCLUSIONS AND RELEVANCE: In this cross-sectional study of 154 MI sessions, prompted LLMs showed general alignment with therapist responses in MI-oriented conversations, as judged by automated similarity metrics. However, limitations in long-range coherence, stylistic alignment, and the use of indirect proxies for therapeutic quality highlight the need for improved prompt design, MI-specific evaluation methods, and clinical validation before integration into mental health care.

PMID:41870428 | DOI:10.1001/jamanetworkopen.2026.2750