J Assist Reprod Genet. 2026 Feb 21. doi: 10.1007/s10815-026-03833-1. Online ahead of print.
ABSTRACT
PURPOSE: This study addressed the practical challenge of missing data in assisted reproductive technology by evaluating the reliability of predicting oocyte yield when anti-Müllerian hormone (AMH) values are unavailable. We examined the ability of AI-based models to recover missing biomarker data and maintain predictive accuracy despite data limitations.
METHOD: We conducted a retrospective analysis of 27,435 IVF cycles from multiple centers between 2018 and 2023. Several machine learning models were compared as internal imputation models, used to fill missing AMH values and to predict oocyte retrieval. The models were validated across missingness rates from 0% to 90%, with bootstrapping used to assess statistical robustness and generalizability across clinical settings.
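The evaluation loop described in METHOD (mask AMH at a given rate, impute it, score the outcome, bootstrap the AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Values are masked completely at random. Missing AMH is imputed with a one-variable least-squares regression on age, which stands in for the machine learning imputers the study compared. The partly imputed AMH value is used directly as the risk score. AUC is computed as the Mann-Whitney statistic. The cohort is synthetic, and all function names and data-generating numbers are hypothetical.

```python
import random
from statistics import mean

# Hypothetical helpers: a stdlib-only sketch, not the study's imputation models.

def mask_values(values, rate, rng):
    """Copy `values`, setting a fraction `rate` of entries to None completely at random."""
    n_missing = round(rate * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]

def fit_ols(xs, ys):
    """One-predictor least squares; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def impute_from_age(ages, amh):
    """Fill None entries of `amh` with a regression on age fit to the observed rows."""
    observed = [(a, v) for a, v in zip(ages, amh) if v is not None]
    b0, b1 = fit_ols([a for a, _ in observed], [v for _, v in observed])
    return [b0 + b1 * a if v is None else v for a, v in zip(ages, amh)]

def auc(scores, labels):
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(scores, labels, n_boot, rng):
    """Mean AUC over bootstrap resamples; resamples lacking one class are skipped."""
    n = len(scores)
    results = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            results.append(auc([scores[i] for i in idx], ys))
    return mean(results)

# Synthetic cohort (hypothetical numbers): AMH declines with age, and the binary
# outcome stands in for "adequate oocyte yield", becoming more likely as AMH rises.
rng = random.Random(0)
ages = [rng.uniform(25, 44) for _ in range(300)]
amh = [max(0.1, 6.0 - 0.12 * a + rng.gauss(0, 1.0)) for a in ages]
labels = [1 if v + rng.gauss(0, 0.8) > 1.5 else 0 for v in amh]

for rate in (0.0, 0.3, 0.6, 0.9):
    filled = impute_from_age(ages, mask_values(amh, rate, rng))
    # The (partly imputed) AMH value is used directly as the risk score here.
    print(f"missing {rate:.0%}: bootstrap AUC {bootstrap_auc(filled, labels, 30, rng):.3f}")
```

A real pipeline would feed the imputed AMH, together with age and the other clinical variables, into a trained classifier rather than ranking on AMH alone. The structure of the sweep stays the same.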
RESULTS: The best-performing model using measured AMH values achieved an AUC of 0.838. Although the imputation explained relatively little of the variance in AMH (R² ≈ 0.2), the imputed values retained enough clinical information to serve as reliable predictive proxies. Model performance remained above an illustrative benchmark AUC of 0.80 until the missingness rate reached 35.5%. SHAP analysis confirmed that the AI model effectively used age and other clinical variables to compensate for missing AMH data.
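The two summary figures in RESULTS can be made concrete with a short sketch. R² is the coefficient of determination, 1 - SS_res/SS_tot, so R² ≈ 0.2 means the imputed values account for roughly a fifth of the variance in measured AMH. One plausible way to locate the missingness rate at which AUC falls below 0.80 is linear interpolation over an AUC-versus-missingness grid. The abstract does not say how 35.5% was obtained, so the interpolation approach, the function names, and the grid values below are hypothetical and are not the study's results.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - m) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def first_crossing(curve, threshold):
    """Linearly interpolate the first x at which y drops below `threshold`.

    `curve` is a list of (missing_rate, auc) pairs sorted by missing_rate.
    Returns None if the curve never falls below the threshold.
    """
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 >= threshold > y1:
            return x0 + (y0 - threshold) * (x1 - x0) / (y0 - y1)
    return None

# Hypothetical AUC-vs-missingness grid, NOT the study's results.
curve = [(0.0, 0.85), (0.2, 0.82), (0.4, 0.78), (0.6, 0.74)]
print(first_crossing(curve, 0.80))  # roughly 0.3, i.e. about 30% missing
print(r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))  # 0.8
```

A finer missingness grid makes the interpolated crossing point less sensitive to the straight-line assumption between grid points.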
CONCLUSIONS: AI-based imputation offers a practical solution for clinical infertility care, where missing data arise from documentation issues or repeat-cycle workflows. This approach bridges the gap between ideal laboratory records and realistic data limitations, so that data-driven decision support remains available even when records are incomplete.
PMID:41723323 | DOI:10.1007/s10815-026-03833-1