The current state of demographic subgroup reporting for commercially available AI for radiology: a scoping review

Eur Radiol. 2026 Jun 12. doi: 10.1007/s00330-026-12652-y. Online ahead of print.

ABSTRACT

OBJECTIVE: Though subgroup performance reporting helps ensure the safety of artificial intelligence (AI) products, the extent of this reporting remains unclear. This scoping review identifies studies validating commercially available AI-based products and reports the trends in performance reporting across sex, age, and race/ethnicity demographic subgroups.

MATERIALS AND METHODS: Peer-reviewed validation studies of commercially available products published after 2010 were collected from the Health AI Register and PubMed on 29 November 2024. Trends in the reporting of sex, age, and race/ethnicity were analyzed with regression analysis. We applied the Wilson score confidence interval to estimate which tuberculosis detection studies are underpowered for subgroup meta-analysis.
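The Wilson score interval mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the abstract does not state the power criterion they used, so the rule below (flag a subgroup as underpowered when the 95% Wilson interval half-width for a proportion such as sensitivity exceeds a hypothetical margin of 0.05) and the names `wilson_interval` and `is_underpowered` are assumptions for illustration only.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre is shrunk toward 0.5; half-width stays positive even when p_hat is 0 or 1.
    center = (p_hat + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return center - half, center + half


def is_underpowered(successes: int, n: int, margin: float = 0.05, z: float = Z_95) -> bool:
    """Hypothetical criterion: the interval half-width exceeds the target margin."""
    lo, hi = wilson_interval(successes, n, z)
    return (hi - lo) / 2 > margin
```

For a hypothetical subgroup in which 36 of 40 tuberculosis-positive cases are detected, `wilson_interval(36, 40)` gives roughly 0.77 to 0.96, so `is_underpowered(36, 40)` is `True` under the assumed 0.05 margin. The Wilson form is used rather than the simple Wald interval because it stays inside [0, 1] and behaves better for the small subgroup sizes and extreme sensitivities typical of these datasets.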

RESULTS: Three hundred ninety-two of 545 studies validating 252 products reported demographic data for at least one of the three subgroup categories. Only 77 of these presented subgroup performance results. Skeletal (20/88) and lung (30/139) studies, and those using chest (24/79) or bone (19/63) radiographs, most often presented subgroup performance data. We found no evidence that more recent studies (OR: 1.039 [95% CI: 0.959-1.127]) or company sponsorship (OR: 1.010 [95% CI: 0.492-1.920]) led to increased subgroup reporting. We show that 14/21 tuberculosis datasets may be underpowered for post-hoc subgroup meta-analysis.
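The "no evidence" statement above follows from the reported intervals: a 95% confidence interval for an odds ratio that contains 1 is consistent with no association. The short sketch below checks this for the two reported odds ratios and compares interval widths on the log scale as a rough gauge of precision. The helper names `ci_excludes_null` and `log_ci_width` are hypothetical, introduced only for illustration.

```python
import math


def ci_excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True if a confidence interval for a ratio measure excludes the null value."""
    return not (lower <= null <= upper)


def log_ci_width(lower: float, upper: float) -> float:
    """Interval width on the log-odds scale; wider means less precise."""
    return math.log(upper) - math.log(lower)


# Odds ratios and 95% CIs as reported in the abstract.
reported = {
    "study recency": (1.039, 0.959, 1.127),
    "company sponsorship": (1.010, 0.492, 1.920),
}

for name, (odds_ratio, lo, hi) in reported.items():
    print(f"{name}: OR={odds_ratio}, excludes 1: {ci_excludes_null(lo, hi)}, "
          f"log-scale width: {log_ci_width(lo, hi):.2f}")
```

Neither interval excludes 1, and the sponsorship interval is much wider than the recency interval, so the sponsorship estimate in particular is too imprecise to rule out either a modest increase or a modest decrease in subgroup reporting.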

CONCLUSION: This scoping review quantifies how fragmented the commercial validation landscape is, showing that reporting of both demographics and per-subgroup performance is inadequate for estimating subgroup bias. Addressing this systemic problem requires effort from all stakeholders, from researchers to regulatory agencies, to encourage thorough reporting and validation of commercial products and thereby support physician and patient trust in medical AI products.

KEY POINTS:
Question: The number of studies validating the performance of each commercially available radiology AI product for minority subgroup bias is unclear.
Findings: Currently available commercial AI validation studies often neglect to describe demographic subgroup data, and fewer still provide performance results per subgroup, preventing meta-analysis of algorithmic bias.
Clinical relevance: Physician and patient trust in the medical AI already used clinically must be built on peer-reviewed literature and meta-analysis. The current literature is insufficient for determining the safety and performance of these products for demographic minorities.

PMID:42286177 | DOI:10.1007/s00330-026-12652-y

By Nevin Manimala