Large Language Models Using Clinical Text in Pediatrics: A Scoping Review

JAMA Netw Open. 2026 Mar 2;9(3):e262443. doi: 10.1001/jamanetworkopen.2026.2443.

ABSTRACT

IMPORTANCE: Large language models (LLMs) are increasingly applied to clinical data, primarily clinical text, with growing emphasis on their integration into health care. However, the use of LLMs in pediatric care remains underexplored.

OBJECTIVE: To map the emerging literature on LLM use in pediatrics involving clinical text and identify evidence gaps and future directions for implementation and evaluation.

EVIDENCE REVIEW: PubMed/MEDLINE, Embase, Web of Science, Scopus, and preprint servers were searched for English-language original research published from January 1, 2020, to July 1, 2025. Included studies used modern transformer-based LLMs with pediatric clinical text as input. Two reviewers independently screened studies using predefined criteria. Data were extracted by one reviewer and verified by another. Findings were descriptively synthesized, and adherence to the Minimum Information for Medical AI Reporting (MINIMAR) standards was assessed.

FINDINGS: The review included 40 studies published between 2023 and 2025. Twenty-three studies were conducted in the US, and all were retrospective observational studies using clinical data from sources such as electronic health records. Participant sample sizes ranged from 10 to 172 683. Although all pediatric age subgroups were represented, early childhood populations (aged 0-5 years) were underrepresented. The most common LLM clinical applications were diagnostic decision support in 24 studies (60.0%) and treatment planning in 7 studies (17.5%). Although all 40 studies conducted clinical evaluation of LLMs and 30 included discussions of ethics or data privacy, 39 studies (97.5%) did not meet full MINIMAR standards, 34 (85.0%) did not report use of Health Insurance Portability and Accountability Act-compliant models, and 30 (75.0%) lacked fine-tuning for pediatric-specific data. Among 33 studies assessing model performance against human annotations, 10 (30.3%) did not include clinicians as annotators; among 26 studies with multiple annotators, only 9 (34.6%) reported interannotator agreement statistics.
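The percentages reported in the findings follow directly from the stated counts and denominators. As a minimal sketch of that arithmetic, the snippet below recomputes each proportion to one decimal place. The counts and percentages are copied from the findings above; the names `REPORTED` and `percent` are illustrative assumptions and are not part of the review.

```python
# Recompute the proportions reported in FINDINGS from their counts.
# Each tuple: (description, numerator, denominator, reported percentage).
# The numbers come from the abstract; the names here are hypothetical.
REPORTED = [
    ("diagnostic decision support", 24, 40, 60.0),
    ("treatment planning", 7, 40, 17.5),
    ("did not meet full MINIMAR standards", 39, 40, 97.5),
    ("did not report HIPAA-compliant models", 34, 40, 85.0),
    ("lacked pediatric-specific fine-tuning", 30, 40, 75.0),
    ("no clinician annotators", 10, 33, 30.3),
    ("reported interannotator agreement", 9, 26, 34.6),
]


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to 1 decimal."""
    return round(100 * numerator / denominator, 1)


if __name__ == "__main__":
    for label, n, d, reported in REPORTED:
        computed = percent(n, d)
        status = "matches" if computed == reported else "differs"
        print(f"{label}: {n}/{d} = {computed}% ({status} reported {reported}%)")
```

Note that the last two rows use smaller denominators (33 studies with human annotations, 26 with multiple annotators) rather than all 40 included studies.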

CONCLUSIONS AND RELEVANCE: This scoping review found that diagnostic decision support and treatment planning were the most commonly proposed applications of LLMs in pediatrics. However, gaps in scientific rigor and limited use of pediatric-specific data may hinder their safe and effective implementation. Future studies should use standardized evaluation and reporting methods, increase clinician involvement, and expand research to underrepresented ages and clinical applications.

PMID:41879783 | DOI:10.1001/jamanetworkopen.2026.2443

By Nevin Manimala