Current progress and obstacles for automated classification of causes of death based on death certificates: A systematic review

Int J Med Inform. 2026 Jun 19;219:106549. doi: 10.1016/j.ijmedinf.2026.106549. Online ahead of print.

ABSTRACT

OBJECTIVE: Most countries classify causes of death through manual coding or rule-based software, but the application of advanced deep learning tools is likely to enhance the efficiency and accuracy of national mortality statistics. This study systematically reviews the current implementation of automated coding or categorising tools for cause of death classification, summarising the methodologies applied, the performance achieved, and the progress and obstacles for application.

METHODS: PubMed and Scopus were systematically searched from 2018 to 2024 to identify studies that used automated tools to code or categorise causes of death. Two researchers independently selected the papers, with disagreements adjudicated by a supervisor. For each study, the general profile, detailed methodology, and performance of the tools were extracted, and progress and potential obstacles for implementation were assessed qualitatively. An adapted version of QUADAS-2 was used to assess the risk of bias.

RESULTS: Among the 46 included studies, the training sample size ranged from 165 to 10,519,268 people. The most common approaches were deep learning (n = 22, of which 7 used recurrent neural networks and 6 used transformers) and rule-based automation (n = 15). Performance varied widely, with recall (sensitivity) ranging from 0.253 to 1.000 and precision (positive predictive value) ranging from 0.396 to 1.000. Precision was often higher than recall and could vary substantially across settings within the same study. Text quality was the major obstacle to implementing older automated tools, whereas deep learning models required a target task and materials for pre-training. The performance of deep learning was unsatisfactory for infrequent causes of death, and head-to-head comparisons with rule-based tools were limited.
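
To make the two reported metrics concrete, the following is a minimal sketch of how recall (sensitivity) and precision (positive predictive value) could be computed per cause-of-death code when each certificate receives a single code. It is not taken from any reviewed study: the function name `per_class_precision_recall` and the eight-certificate data set are hypothetical, and the ICD-10 codes serve only as example labels.

```python
from collections import Counter


def per_class_precision_recall(true_codes, predicted_codes):
    """Return {code: (precision, recall)} for every code in either list.

    Hypothetical helper for illustration; assumes one code per certificate.
    A metric is None when its denominator is zero.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_codes, predicted_codes):
        if truth == pred:
            tp[truth] += 1   # correctly assigned code
        else:
            fp[pred] += 1    # code assigned where it does not apply
            fn[truth] += 1   # true code that was missed

    results = {}
    for code in set(true_codes) | set(predicted_codes):
        predicted_total = tp[code] + fp[code]
        actual_total = tp[code] + fn[code]
        precision = tp[code] / predicted_total if predicted_total else None
        recall = tp[code] / actual_total if actual_total else None
        results[code] = (precision, recall)
    return results


# Made-up data: one frequent code (I21) and two less frequent ones.
true_codes = ["I21", "I21", "I21", "I21", "C34", "C34", "A41", "A41"]
pred_codes = ["I21", "I21", "I21", "I21", "C34", "I21", "A41", "I21"]

results = per_class_precision_recall(true_codes, pred_codes)
for code in sorted(results):
    precision, recall = results[code]
    print(f"{code}: precision={precision:.3f}, recall={recall:.3f}")
```

In this toy data the classifier defaults to the frequent code, so the two rarer codes keep perfect precision while their recall drops by half. This is one way precision can exceed recall for infrequent causes, although the reviewed studies may differ in both mechanism and evaluation protocol.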

CONCLUSION: Although deep learning applications are gaining popularity over rule-based tools, their performance is inconsistent and evidence from head-to-head comparisons is insufficient. All approaches are influenced by the quality of the training data.

PMID:42320083 | DOI:10.1016/j.ijmedinf.2026.106549

By Nevin Manimala