Can Information Representations Inspired by the Human Auditory Perception Benefit Computer Audition-based Disease Detection? An Interpretable Comparative Study

IEEE J Biomed Health Inform. 2025 Nov 28;PP. doi: 10.1109/JBHI.2025.3638846. Online ahead of print.

ABSTRACT

Computer audition-based methods have attracted a great deal of attention in the field of disease detection due to their significant advantages, e.g., non-invasive and convenient operation. Among them, the introduction of information representations inspired by human auditory perception, e.g., Mel-frequency transformation, gives these methods great potential to approach and even exceed the limits of the human auditory system. However, according to previous research, it remains challenging to fairly assess whether information representations inspired by human auditory perception have a significant positive effect on disease detection. Moreover, performance differences among various information representations and their underlying causes are yet to be thoroughly investigated and analyzed. To this end, we propose an interpretable comparative study on information representations inspired by human auditory perception for disease detection. First, the detection accuracy of different information representations is investigated on two sound datasets (one for a psychological disease and one for a physiological disease) based on the classical model and the proposed Temporal-Spatial Multi-Scale Perception Network. Then, the noise robustness of these information representations is compared by introducing Gaussian noise with varying signal-to-noise ratios (SNRs). Finally, by combining the human auditory perception mechanism and explainable AI techniques, we analyze the reasons for performance differences among various information representations from qualitative and quantitative perspectives. Experimental results demonstrate that information representations inspired by human auditory perception can improve the performance of disease detection with statistical significance. Furthermore, Gammatone Frequency Cepstral Coefficients (GFCCs) outperform other information representations by achieving the highest accuracy, particularly under noisy conditions.
The interpretability results further reveal the underlying reasons for the superior performance of GFCCs, highlighting their ability to capture critical auditory features robustly across varying noise levels. These findings emphasize the potential of auditory perception-inspired representations in advancing computer audition-based disease detection systems and provide a solid foundation for future research in this domain.
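
The noise-robustness step mentioned in the abstract (adding Gaussian noise at varying SNRs) can be illustrated with a minimal sketch. The abstract does not describe the authors' exact procedure, so everything below is an assumption for illustration: the function names `add_gaussian_noise` and `measured_snr_db`, the 440 Hz test tone, the 16 kHz sample rate, and the SNR values are all hypothetical. The sketch relies only on the standard definition SNR_dB = 10 * log10(P_signal / P_noise) and uses the Python standard library.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Hypothetical helper: the noise variance is chosen so that
    10 * log10(P_signal / P_noise) equals the requested SNR in expectation.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # Invert the SNR definition to get the required noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` relative to the `clean` reference."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Illustrative input: one second of a 440 Hz tone sampled at 16 kHz (assumed values).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

for target in (20, 10, 0):
    noisy = add_gaussian_noise(tone, target, seed=0)
    print(f"target {target:>2} dB, measured {measured_snr_db(tone, noisy):.2f} dB")
```

In a comparison like the one described, each representation (for example Mel-based features or GFCCs) would be extracted from the noisy signal at each SNR and fed to the same classifier, so that any change in accuracy can be attributed to the representation rather than to the noise procedure.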

PMID:41313695 | DOI:10.1109/JBHI.2025.3638846

By Nevin Manimala