BMJ Health Care Inform. 2026 Mar 23;33(1):e101877. doi: 10.1136/bmjhci-2025-101877.
ABSTRACT
We present BODHI (Balanced, Open-minded, Diagnostic, Humble, and Inquisitive), an engineering framework for curiosity-driven and humble clinical decision support artificial intelligence (AI) systems. Despite growing capabilities, large language models (LLMs) often express inappropriate confidence, conflating statistical pattern recognition with genuine medical understanding. BODHI addresses this through a dual reflective architecture that: (1) decomposes epistemic uncertainty into task-specific dimensions, and (2) constrains model responses using virtue-based stance rules derived from a Virtue Activation Matrix. We validate the framework through controlled evaluation on 200 clinical vignettes from HealthBench Hard, assessing GPT-4o-mini and GPT-4.1-mini across 5 random seeds (2000 total observations). Statistical analysis included bootstrap resampling, paired t-tests, and effect size computation. BODHI improved overall clinical response quality (GPT-4.1-mini: +16.6 pp, p<0.0001, Cohen's d=11.56; GPT-4o-mini: +2.2 pp, p<0.0001, Cohen's d=1.56) and achieved very large effect sizes on curiosity (context-seeking rate: Cohen's d=16.38 and 19.54) and humility (hedging: d=5.80 for GPT-4.1-mini) metrics. Crucially, 97.3% of GPT-4.1-mini responses and 73.5% of GPT-4o-mini responses included appropriate clarifying questions, compared with 7.8% and 0.0% at baseline, demonstrating the framework's effectiveness in eliciting information-gathering behaviour. These findings suggest that LLMs can be reliably constrained to operate within epistemic boundaries when provided with structured uncertainty decomposition and virtue-aligned response rules, offering a pathway towards safer clinical AI deployment.
PMID:41871866 | DOI:10.1136/bmjhci-2025-101877
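The effect sizes reported in the abstract are Cohen's d values over paired, per-seed scores, which for paired data reduces to the mean of the pairwise differences divided by their standard deviation. A minimal sketch of that computation, using hypothetical scores (not the paper's data; the function name and values are illustrative only):

```python
import statistics


def paired_cohens_d(scores_a, scores_b):
    """Cohen's d for paired samples: mean of pairwise
    differences divided by their sample standard deviation."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n-1 denominator
    return mean_diff / sd_diff


# Hypothetical per-seed quality scores for a framework vs. baseline run:
framework = [0.81, 0.83, 0.80, 0.82, 0.84]
baseline = [0.65, 0.66, 0.64, 0.66, 0.65]
print(paired_cohens_d(framework, baseline))
```

Because the denominator is the spread of the *differences*, near-constant per-seed improvements yield very large d values, which is consistent with the magnitudes reported above.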