
Using LLM-generated tools to extract information about reporting statistical software in biomedical and health science research articles

BMC Res Notes. 2026 Jun 27. doi: 10.1186/s13104-026-07908-1. Online ahead of print.

ABSTRACT

OBJECTIVE: A major obstacle to reviewing the statistical methodology in published medical articles is that extracting the necessary details from large samples of articles is time-consuming. This paper demonstrates how a novel automated procedure can extract information about statistical reporting from the literature. Because reporting the software used for statistical analysis is a key element of transparency and reproducibility, we illustrate the procedure by searching the PubMed Central database for original research articles published in 2021 and 2023 and identifying the statistical software packages used for data analysis.

RESULTS: A freely available Shiny App was created with the help of generative artificial intelligence and used to automatically retrieve information from randomly selected samples of articles indexed in PubMed Central. We analyzed a large sample of articles (n = 1740) to determine how statistical software was reported across nine study designs. We found that, across study types, proprietary software such as IBM SPSS Statistics still dominates. Despite multiple calls for greater use of open-source research software, open-source programs are used less frequently. In addition, a surprising number of articles did not report the software used. Furthermore, this is the first application of the recent Vibe Coding concept to statistical research methods.
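
The kind of extraction described above can be illustrated with a minimal Python sketch. This is not the authors' Shiny App, and the abstract does not give their search terms: the pattern table `SOFTWARE_PATTERNS` and the function `detect_software` are hypothetical names introduced here, and the short list of packages is an assumption chosen only for illustration.

```python
import re

# Hypothetical, non-exhaustive patterns (an assumption for illustration;
# the abstract does not list the terms the authors actually searched for).
SOFTWARE_PATTERNS = {
    "SPSS": r"\bSPSS\b",
    # "R" alone is too ambiguous, so require a typical following word.
    "R": r"\bR\s+(?:software|version|Core Team|Foundation)\b",
    "Stata": r"(?i:\bstata\b)",  # matches both "Stata" and "STATA"
    "SAS": r"\bSAS\b",
    "GraphPad Prism": r"\bGraphPad\b",
}


def detect_software(text):
    """Return the names of software packages mentioned in the text."""
    return [
        name
        for name, pattern in SOFTWARE_PATTERNS.items()
        if re.search(pattern, text)
    ]


methods = (
    "Analyses were performed using IBM SPSS Statistics version 27 "
    "and R version 4.2.1."
)
print(detect_software(methods))
```

An article whose text yields an empty list would be counted as not reporting its software under this simplified scheme; a real tool would need a much broader pattern list and manual validation.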

PMID:42365353 | DOI:10.1186/s13104-026-07908-1

By Nevin Manimala
