Authors: Guannan Gong, PhD; Satrajit Roychoudhury, PhD; Allison Meisner, MD; Lajos Pusztai, MD, DPhil; Sarah B. Goldberg, MD, MPH; Wei Wei, PhD
Designing modern oncology clinical trials requires synthesizing evidence from prior studies to inform hypothesis generation, effect-size estimation, and sample size determination. In practice, this process relies heavily on qualitative summaries or aggregate statistics that incompletely capture heterogeneity across patient populations and study designs. As a result, trials may be based on misspecified assumptions, leading to underpowered studies or misleading conclusions.
To address this challenge, we introduce LEAD-ONC (Literature to Evidence for Analytics and Design in Oncology), an AI-assisted framework that transforms published oncology trial reports into quantitative, design-relevant evidence. LEAD-ONC integrates advances in large language models (LLMs), survival data reconstruction, and Bayesian hierarchical modeling to enable principled learning from prior trials and prospective trial design under uncertainty.
Given a set of expert-curated clinical trial publications meeting predefined eligibility criteria, LEAD-ONC operates in three stages:
This framework enables probabilistic projections of treatment effects, explicitly accounting for between-trial heterogeneity and uncertainty, two elements often overlooked in traditional trial planning.
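To make the idea of a probabilistic projection concrete, the sketch below fits a simple normal-normal hierarchical model to trial-level log hazard ratios and samples the predictive distribution for a new trial. It is a minimal illustration, not the LEAD-ONC implementation: the function name `predictive_log_hr`, the flat prior on the mean, the half-normal(0.5) prior on the between-trial standard deviation, and the input numbers are all assumptions made for this example, and the inputs are invented rather than taken from any published trial.

```python
import numpy as np


def predictive_log_hr(y, se, n_draws=20000, seed=0):
    """Sample the predictive log hazard ratio for a new trial.

    Assumed model (illustrative only):
        y_i     ~ Normal(theta_i, se_i^2)   observed log HR in trial i
        theta_i ~ Normal(mu, tau^2)         between-trial heterogeneity
        mu      ~ flat prior
        tau     ~ half-normal(scale=0.5)
    """
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    rng = np.random.default_rng(seed)

    # Grid over tau; mu is integrated out analytically under the flat prior.
    tau_grid = np.linspace(1e-3, 1.5, 600)
    var = se[None, :] ** 2 + tau_grid[:, None] ** 2      # shape (T, k)
    w = 1.0 / var
    mu_hat = (w * y).sum(axis=1) / w.sum(axis=1)         # conditional mean of mu
    v_mu = 1.0 / w.sum(axis=1)                           # conditional variance of mu

    # Log marginal posterior of tau, up to an additive constant.
    log_post = (
        -0.5 * (tau_grid / 0.5) ** 2                     # half-normal prior on tau
        + 0.5 * np.log(v_mu)
        - 0.5 * np.log(var).sum(axis=1)
        - 0.5 * (w * (y[None, :] - mu_hat[:, None]) ** 2).sum(axis=1)
    )
    p = np.exp(log_post - log_post.max())
    p /= p.sum()

    # Draw tau, then mu given tau, then the effect in a new trial.
    idx = rng.choice(len(tau_grid), size=n_draws, p=p)
    mu = rng.normal(mu_hat[idx], np.sqrt(v_mu[idx]))
    return rng.normal(mu, tau_grid[idx])


# Hypothetical inputs: invented hazard ratios and standard errors.
log_hr = np.log([0.80, 0.75, 0.90, 0.70, 0.85])
std_err = np.array([0.10, 0.12, 0.11, 0.15, 0.09])

draws = predictive_log_hr(log_hr, std_err)
hr_interval = np.exp(np.percentile(draws, [2.5, 50.0, 97.5]))
prob_benefit = float(np.mean(draws < 0.0))
print("Predictive HR (2.5%, 50%, 97.5%):", hr_interval)
print("Predictive probability that HR < 1:", prob_benefit)
```

The predictive draws are wider than the posterior for the pooled mean because they add the between-trial standard deviation on top of the uncertainty in the mean, which is the sense in which heterogeneity carries through to the projection.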
We demonstrate LEAD-ONC using five phase III trials in first-line non–small-cell lung cancer evaluating PD-1 or PD-L1 inhibitors with or without CTLA-4 blockade. Clustering based on extracted baseline characteristics identified three clinically interpretable populations defined by histology.
For a hypothetical prospective randomized trial in a mixed-histology population comparing mono versus dual immune checkpoint inhibition, LEAD-ONC projected:
Because LEAD-ONC remains under active development, these findings are intended as methodological demonstrations rather than definitive clinical guidance.
LEAD-ONC illustrates how AI-driven extraction from the biomedical literature, combined with principled statistical modeling, can support evidence-driven oncology trial design. The framework provides a foundation for improving hypothesis formulation, power calculations, and decision-making under uncertainty, with potential applications across disease areas and therapeutic modalities.
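One way to read "power calculations under uncertainty" is to average the conventional power over a distribution of plausible treatment effects, a quantity often called assurance. The sketch below does this with Schoenfeld's approximation for a log-rank test with 1:1 allocation. It is offered as an assumption-laden illustration rather than the method used in LEAD-ONC: the helper names, the 300-event target, and the normal predictive distribution centered at a hazard ratio of 0.75 are hypothetical choices.

```python
import math
from statistics import NormalDist

import numpy as np


def schoenfeld_power(log_hr, n_events, alpha=0.05):
    """Approximate probability of a significant result favoring the
    experimental arm (two-sided alpha, 1:1 allocation), using
    Schoenfeld's normal approximation for the log-rank test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # A negative log HR (benefit) gives a positive standardized effect.
    z = -np.asarray(log_hr, dtype=float) * math.sqrt(n_events) / 2.0 - z_alpha
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(z / math.sqrt(2.0)))


def assurance(log_hr_draws, n_events, alpha=0.05):
    """Average the power over draws of the log HR (expected power)."""
    return float(np.mean(schoenfeld_power(log_hr_draws, n_events, alpha)))


# Hypothetical design inputs, not taken from any trial.
n_events = 300
point_log_hr = math.log(0.75)

# Hypothetical predictive distribution for the new trial's log HR.
rng = np.random.default_rng(1)
draws = rng.normal(loc=point_log_hr, scale=0.15, size=50000)

fixed_power = float(schoenfeld_power(point_log_hr, n_events))
expected_power = assurance(draws, n_events)
print("Power at the point estimate:", fixed_power)
print("Assurance over the predictive distribution:", expected_power)
```

With these inputs the assurance comes out below the power at the point estimate, because draws with weaker or harmful effects pull the average down. This is the gap between a power calculation that treats the effect size as known and one that reflects uncertainty about it.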
Gong G, Roychoudhury S, Meisner A, Pusztai L, Goldberg SB, Wei W.
Learning from Literature: Integrating LLMs and Bayesian Hierarchical Modeling for Oncology Trial Design.
arXiv preprint arXiv:2602.08172, 2026.
https://doi.org/10.48550/arXiv.2602.08172
@article{gong2026leadonc,
title={Learning from Literature: Integrating LLMs and Bayesian Hierarchical Modeling for Oncology Trial Design},
author={Gong, Guannan and Roychoudhury, Satrajit and Meisner, Allison and Pusztai, Lajos and Goldberg, Sarah B and Wei, Wei},
journal={arXiv preprint arXiv:2602.08172},
year={2026},
doi={10.48550/arXiv.2602.08172}
}
LEAD-ONC is an AI-assisted framework that transforms published oncology trial reports into quantitative, design-ready evidence for survival analysis and trial planning.
Current App Version: v1.2.1