
Large language models in systematic review and meta-analysis of surgical treatments for vaginal vault prolapse

NPJ Digit Med. 2026 Feb 19. doi: 10.1038/s41746-026-02431-w. Online ahead of print.

ABSTRACT

Systematic reviews provide the highest level of evidence but remain resource-intensive. We evaluated the performance of a large language model (LLM; ChatGPT, OpenAI) in a PRISMA-guided review of randomized controlled trials on vaginal vault prolapse surgery. Prompts were carefully designed to minimize errors, and outputs were verified. Each task was completed within minutes. For title/abstract screening, recall was 69.8% and precision 85.7% (κ = 0.77); full-text agreement was 94.1-100% (κ = 0.82-1); data extraction accuracy was 87.5-99.7%. From 18 RCTs (1668 women), sacrocolpopexy (SC) showed numerically higher anatomic success than sacrospinous fixation (SSF), although the confidence interval included 1 (OR 1.42, 95% CI 0.71-2.84). Transvaginal mesh improved 3-year objective success compared with SSF (OR 1.84, 95% CI 1.13-2.99) but had higher reoperation rates (5-16% vs 2-4%) than SC. We did not find conclusive evidence that any single technique is superior; most comparisons were underpowered, with wide confidence intervals and substantial heterogeneity. All LLM-derived statistical results were identical to those from conventional R analyses, confirming robustness. Validated LLM workflows can enable more efficient and scalable evidence synthesis.
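
The abstract reports screening performance as recall, precision, and Cohen's κ, and treatment effects as odds ratios with 95% confidence intervals. The following is a minimal sketch of how such quantities are conventionally computed from 2x2 counts. The function names and all counts are hypothetical illustrations, not data from the study, and the Wald interval on the log odds ratio is an assumed method; the abstract does not state how the authors computed their intervals or pooled estimates.

```python
import math


def screening_metrics(tp, fp, fn, tn):
    """Precision, recall, and Cohen's kappa for LLM-vs-human screening.

    tp/fp/fn/tn are counts of include/exclude decisions, treating the
    human reviewer as the reference standard (an assumption).
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Observed agreement: both raters include, or both exclude.
    p_observed = (tp + tn) / n
    # Chance agreement from each rater's marginal include/exclude rates.
    p_include = ((tp + fp) / n) * ((tp + fn) / n)
    p_exclude = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_include + p_exclude
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return precision, recall, kappa


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale.

    a, b: successes and failures in arm 1; c, d: the same in arm 2.
    z=1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    precision, recall, kappa = screening_metrics(tp=40, fp=10, fn=10, tn=140)
    print(f"precision={precision:.3f} recall={recall:.3f} kappa={kappa:.3f}")

    odds_ratio, lower, upper = odds_ratio_ci(a=30, b=10, c=20, d=20)
    print(f"OR={odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that contains 1, as in the SC versus SSF comparison above, means the data are compatible with no difference between the arms; an interval entirely above 1, as in the mesh versus SSF comparison, is not.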

PMID:41714807 | DOI:10.1038/s41746-026-02431-w

By Nevin Manimala
