How is Bias Learned in Medical Image Analysis Models? An Exploration of the Encoding of Demographic Information in Deep Learning Models Trained to Detect Abnormalities on Chest X-Rays

J Imaging Inform Med. 2026 Jun 24. doi: 10.1007/s10278-026-02073-0. Online ahead of print.

ABSTRACT

Deep learning models achieve strong diagnostic performance in medical imaging, yet often exhibit systematic performance disparities across demographic subgroups. Although prior work has shown that attributes such as age, sex and race are encoded within internal representations, it remains unclear how the structure of these representations contributes to subgroup-level differences in prediction behaviour. This study aims to examine how demographic information is embedded in chest X-ray classifiers and how latent-space structure relates to observed sensitivity disparities.

We analysed two large-scale chest X-ray datasets, CheXpert and MIMIC-CXR, using DenseNet-121 models trained for multi-label disease classification. In addition to standard output-level evaluation, we conducted representation-level analyses using linear probes, embedding statistics and geometric measures to characterise subgroup differences in activation strength, latent-space proximity and model confidence. Disparities were assessed across age, race and sex by jointly examining feature encodings, logits, energy scores and true-positive rates.

Demographic attributes showed limited direct association with disease labels and low standalone predictive utility, yet were strongly encoded within internal features. Younger and Black/African American patients consistently exhibited higher feature norms, greater separation in latent space and lower joint logit energy, despite comparable overall discrimination performance. These representational patterns persisted after accounting for label configuration and were associated with larger sensitivity gaps, consistent with structural suppression in which certain subgroups occupy sparser, lower-activation regions of the representation space. Sex-based differences were comparatively modest across representational and performance metrics.
Subgroup disparities in chest X-ray classification are closely linked to how demographic groups are positioned and activated within latent space, rather than to directional misalignment alone. Representation-level diagnostics based on activation magnitude, density and energy provide mechanistic insight into model behaviour and highlight limitations of mitigation strategies that focus solely on feature removal or post hoc thresholding. These findings support the use of representation-level analysis as a principled component of fairness evaluation and mitigation design in clinical AI systems.
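
As an illustration only, the representation-level diagnostics named in the abstract (activation magnitude, energy and true-positive rate per subgroup) could be computed along the following lines. This is a minimal sketch and not the authors' implementation: the function names, the input layout (one feature vector, one logit vector and one 0/1 label vector per image) and the decision threshold are hypothetical. The abstract does not define "joint logit energy"; the sketch assumes one common multi-label formulation, the sum of softplus over per-label logits, and the paper's exact definition and sign convention may differ.

```python
import math
from collections import defaultdict


def l2_norm(vec):
    """Euclidean norm of a feature vector (a proxy for activation strength)."""
    return math.sqrt(sum(x * x for x in vec))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def joint_energy(logits):
    """ASSUMPTION: joint energy taken as the sum of softplus over label logits.

    The abstract does not give a formula; this is one common multi-label choice.
    """
    return sum(softplus(z) for z in logits)


def subgroup_diagnostics(features, logits, labels, groups, threshold=0.0):
    """Per-subgroup mean feature norm, mean joint energy and pooled TPR.

    A logit above `threshold` counts as a positive prediction; 0.0 corresponds
    to a sigmoid probability of 0.5 (a hypothetical operating point).
    """
    acc = defaultdict(lambda: {"n": 0, "norm": 0.0, "energy": 0.0, "tp": 0, "pos": 0})
    for f, z, y, g in zip(features, logits, labels, groups):
        a = acc[g]
        a["n"] += 1
        a["norm"] += l2_norm(f)
        a["energy"] += joint_energy(z)
        # Pool true positives over all labels that are actually present.
        for z_k, y_k in zip(z, y):
            if y_k == 1:
                a["pos"] += 1
                if z_k > threshold:
                    a["tp"] += 1
    return {
        g: {
            "mean_norm": a["norm"] / a["n"],
            "mean_energy": a["energy"] / a["n"],
            # TPR is undefined for a subgroup with no positive labels.
            "tpr": a["tp"] / a["pos"] if a["pos"] else None,
        }
        for g, a in acc.items()
    }


if __name__ == "__main__":
    # Toy inputs, invented for illustration; not data from the study.
    feats = [[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
    lgts = [[2.0, -1.0], [-2.0, 1.0], [3.0, 3.0]]
    lbls = [[1, 0], [1, 1], [1, 0]]
    grps = ["a", "a", "b"]
    for group, stats in subgroup_diagnostics(feats, lgts, lbls, grps).items():
        print(group, stats)
```

Comparing these three quantities side by side for each subgroup mirrors the abstract's joint reading of feature encodings, energy scores and true-positive rates; a sensitivity gap would then be the difference in `tpr` between two subgroups.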

PMID:42343005 | DOI:10.1007/s10278-026-02073-0

By Nevin Manimala