Detection of Non-diabetic Kidney Disease in Patients with Diabetes Using Machine Learning and Electronic Medical Record Data

Kidney Blood Press Res. 2026 Mar 18:1-21. doi: 10.1159/000551589. Online ahead of print.

ABSTRACT

INTRODUCTION: The identification of non-diabetic kidney disease (NDKD) in diabetic patients is critically important. Unlike diabetic nephropathy, NDKD often requires additional therapeutic interventions beyond standard diabetes care. There is a need to develop computational methods using electronic medical record data to identify NDKD in diabetic patients for whom kidney biopsy is not an option.

METHODS: The study included 1136 diabetic patients who underwent kidney biopsy at a tertiary teaching hospital. We collected 103 parameters from electronic medical records, including demographic characteristics, physical examination results, laboratory tests, and the status of diabetic retinopathy. We developed seven models to detect NDKD in the training set (n=908): k-nearest neighbors, random forest, extreme gradient boosting (XGB), lasso logistic regression, support vector machine, naïve Bayes, and multilayer perceptron (MLP). We then compared their performance in the testing set (n=228). The SHapley Additive exPlanations (SHAP) approach was used to analyze the importance of features.
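The split sizes and the normalization step described above can be illustrated with a minimal sketch. It uses synthetic data and only the Python standard library. The function names (`split_train_test`, `fit_scaler`, `apply_scaler`), the random shuffle, and the choice of z-score scaling are assumptions made for illustration; the abstract does not state how the split was drawn or which normalization was applied.

```python
import random
from statistics import mean, pstdev

def split_train_test(rows, n_test, seed=0):
    """Shuffle a copy of the rows and hold out n_test of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation from the training set only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_scaler(rows, means, stds):
    """Z-score each feature using statistics learned on the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Synthetic stand-in for the cohort: 1136 patients, 2 made-up numeric features.
rng = random.Random(42)
rows = [[rng.gauss(50, 10), rng.gauss(1.2, 0.3)] for _ in range(1136)]

train_rows, test_rows = split_train_test(rows, n_test=228)
means, stds = fit_scaler(train_rows)
train_scaled = apply_scaler(train_rows, means, stds)
test_scaled = apply_scaler(test_rows, means, stds)  # reuse training statistics
```

Fitting the scaler on the training rows and reusing it for the testing rows keeps information from the held-out set out of the preprocessing step, which matters when models are compared with and without normalization.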

RESULTS: Biopsy-confirmed NDKD was present in 53% of the 1136 participants. In the testing set, the area under the receiver operating characteristic curve (AUC) for NDKD detection using XGB, lasso regression, and MLP reached 0.8, and their performance was stable regardless of whether variable normalization was performed. Without feature normalization, XGB achieved the highest AUC (0.833; 95% CI: 0.800 to 0.864), which was statistically superior to the other models according to DeLong’s tests. After feature normalization, the support vector machine achieved the highest AUC among all models (0.841; 95% CI: 0.817 to 0.861). In addition to established predictive factors for NDKD (e.g., hematuria and absence of diabetic retinopathy), SHAP analysis identified several features, such as low IgG levels, that contributed significantly to the differentiation models.
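For readers unfamiliar with how an AUC and its confidence interval can be obtained, the following is a minimal sketch on synthetic scores, using only the Python standard library. The percentile bootstrap shown here is an assumption chosen for simplicity; the abstract does not state how its confidence intervals were computed, and this sketch does not implement DeLong's test. The function names and the simulated data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative choice)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic testing set: 228 cases, about 53% positive, with noisy scores.
rng = random.Random(1)
labels = [1 if rng.random() < 0.53 else 0 for _ in range(228)]
scores = [y + rng.gauss(0, 1) for y in labels]

point_estimate = auc(labels, scores)
ci_lower, ci_upper = bootstrap_auc_ci(labels, scores)
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a testing set of this size; a rank-based formulation would be preferable for larger cohorts.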

CONCLUSION: Despite performance variations across modeling techniques, machine learning models may have the potential to facilitate the detection of NDKD in patients with contraindications to kidney biopsy. Further efforts are warranted to improve accuracy and facilitate their translation into clinical practice.

PMID:41849636 | DOI:10.1159/000551589

By Nevin Manimala