Med Phys. 2026 Feb;53(2):e70258. doi: 10.1002/mp.70258.
ABSTRACT
BACKGROUND: Radiotherapy treatment planning (TP) aims to maximize radiation dose delivered to tumors while minimizing exposure to surrounding healthy tissues. Beam angle optimization (BAO) is a crucial component of TP, characterized by high dimensionality and non-convexity, and is traditionally solved via heuristic or manual iterative approaches. These conventional methods are time-consuming and often yield suboptimal solutions due to incomplete exploration of the vast solution space.
PURPOSE: This study introduces a novel framework integrating a general-purpose large language model (LLM) within a reinforcement learning (RL)-inspired iterative strategy to automate BAO in radiotherapy planning. Taking advantage of the inherent knowledge embedded in LLMs, the method uses visual and scalar feedback to produce clinically meaningful treatment plans without requiring any domain-specific fine-tuning or additional training.
METHODS: The proposed framework employs an off-the-shelf Generative Pre-trained Transformer model, GPT-4o, in an inference-only setting. At each iteration, GPT-4o suggests a set of gantry angles, which are then passed to the matRad treatment-planning software to generate a dose distribution. A scalar reward is computed from this distribution using a custom reward function designed to balance target dose conformity and sparing of organs-at-risk (OARs). This reward, along with the corresponding dose maps, serves as feedback for the LLM to iteratively refine its suggestions. The refinement process consists of distinct exploration and exploitation phases inspired by classical RL paradigms. We evaluated six configurations that varied in exploration duration and in the Computed Tomography (CT) slice inputs provided to the LLM (Single-View vs. Multi-View). Performance was benchmarked against a random-angle selection baseline across three anatomical sites: prostate, head-and-neck, and liver.
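The sketch below illustrates the iterative loop described in METHODS, under stated assumptions: the functions suggest_angles, simulate_dose, and reward are hypothetical placeholders, not the authors' code or the actual GPT-4o/matRad interfaces, and the reward weights are illustrative only.

```python
# Minimal sketch of the RL-inspired beam-angle optimization loop.
# All names and numeric values here are illustrative assumptions; the
# real pipeline queries GPT-4o for angle proposals and computes dose
# distributions with matRad.
import random

def suggest_angles(history, explore, n_beams=5):
    """Stand-in for the GPT-4o call: propose gantry angles (degrees).

    In the paper's framework the prompt would include the reward history
    and CT/dose imagery; here exploration is mimicked by random proposals
    and exploitation by perturbing the best plan found so far.
    """
    if explore or not history:
        return sorted(random.sample(range(0, 360, 5), n_beams))
    best_angles, _ = max(history, key=lambda h: h[1])
    return sorted((a + random.choice([-10, -5, 0, 5, 10])) % 360
                  for a in best_angles)

def simulate_dose(angles):
    """Stand-in for the matRad dose calculation: returns surrogate metrics."""
    spread = max(angles) - min(angles)            # crude geometry proxy
    target_coverage = min(1.0, 0.5 + spread / 720)
    oar_dose = random.uniform(0.2, 0.6)
    return target_coverage, oar_dose

def reward(target_coverage, oar_dose, w_target=1.0, w_oar=0.5):
    """Hypothetical scalar reward: favor coverage, penalize OAR dose."""
    return w_target * target_coverage - w_oar * oar_dose

history = []
N_EXPLORE, N_TOTAL = 5, 20                        # exploration, then exploitation
for it in range(N_TOTAL):
    angles = suggest_angles(history, explore=(it < N_EXPLORE))
    cov, oar = simulate_dose(angles)
    history.append((angles, reward(cov, oar)))

best_angles, best_reward = max(history, key=lambda h: h[1])
print(f"Best beam angles: {best_angles}, reward: {best_reward:.3f}")
```

The two-phase schedule (random-like proposals early, refinement of the current best plan later) mirrors the exploration/exploitation split described above; the 20-iteration budget matches the proof-of-concept setting reported in RESULTS.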
RESULTS: Across the liver and head-and-neck cases, all LLM-based configurations significantly outperformed the random baseline. In the prostate scenario, most strategies demonstrated statistically significant improvements, except for the Multi-View configurations with extended exploration phases (10 and 15 iterations). Rewards consistently increased during the exploitation phase, and the resulting dose-volume histograms and dose distributions exhibited improved conformity to target volumes with enhanced sparing of OARs. Notably, plans of clinically plausible quality were obtained within 20 iterative refinement steps in this proof-of-concept setting.
CONCLUSIONS: This study demonstrates that general-purpose LLMs, operating without specialized model training or fine-tuning, can effectively serve as intelligent agents for automated radiotherapy TP, specifically addressing the BAO problem. This flexible and scalable framework has the potential to enhance clinical decision-making workflows in radiotherapy. Future research directions include exploring more comprehensive and clinically nuanced reward functions and extending the methodology to other components of radiotherapy TP.
PMID:41612144 | DOI:10.1002/mp.70258