Poster Sessions


Title: Communicating Complex Considerations in Dual Endpoint Trial Design – An Oncology Case Study
Authors: Boaz Adler, MPA, Valeria Mazzanti, MPH, and Pantelis Vlachos, PhD
Institution: Cytel Inc.
Presenter and email: Boaz Adler boaz.adler@cytel.com
Abstract:

In this case study, we describe the challenges faced in the design and selection of a dual-endpoint clinical trial in an oncology indication. We highlight the benefits and limitations of selecting a dual- versus single-endpoint design and how the tradeoffs were discussed with a cross-functional study team. The case study also highlights the value of adding an efficacy stopping boundary at an interim analysis, as well as a second, later interim analysis, both of which yield savings in average sample size and average study duration. Finally, it shows how extensive simulation work using advanced study design software supports more realistic expectations for study power by incorporating discrete prior probabilities for a range of treatment effects.
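
As an illustration of that last point, the minimal sketch below averages a standard log-rank power approximation over a discrete prior on the true hazard ratio. The number of events, the one-sided alpha, and the prior weights are hypothetical placeholders, not values from the case study or its software.

    # Hypothetical sketch: prior-weighted ("expected") power for a two-arm
    # time-to-event trial, averaging the Schoenfeld power approximation over a
    # discrete prior on the true hazard ratio.  All inputs are illustrative.
    import numpy as np
    from scipy.stats import norm

    events = 300                                   # planned events at the final analysis
    alpha = 0.025                                  # one-sided type I error
    hazard_ratios = np.array([0.65, 0.75, 0.85, 1.00])
    prior_weights = np.array([0.30, 0.40, 0.20, 0.10])

    z_alpha = norm.ppf(1 - alpha)
    # Schoenfeld approximation with 1:1 allocation: Z ~ N(-log(HR) * sqrt(d) / 2, 1)
    power_per_hr = norm.cdf(-np.log(hazard_ratios) * np.sqrt(events) / 2 - z_alpha)
    expected_power = float(np.dot(prior_weights, power_per_hr))

    print("power at each hazard ratio:", np.round(power_per_hr, 3))
    print(f"prior-weighted expected power: {expected_power:.3f}")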


Title: A Bayesian Estimator of Sample Size
Authors: Dehua Bi, Yuan Ji
Institution: Stanford University
Presenter and email: Dehua Bi dehuabi@stanford.edu
Abstract:

We consider a Bayesian framework for estimating the sample size of a clinical trial. The new approach, called BESS, is built upon three pillars: the Sample size of the experiment, the Evidence from the observed data, and the Confidence of the final decision. It uses the simple logic of "given the evidence from the data, a specific sample size can achieve a degree of confidence in making a trial decision." The key distinction between BESS and standard sample size estimation (SSE) is that SSE, typically based on frequentist inference, specifies the true parameter values in its calculation to achieve properties under repeated sampling, whereas BESS assumes a possible outcome of the observed data to achieve high posterior probabilities for decision making. As a result, the calibration of the sample size is based directly on the probability of making a correct decision rather than on type I or type II error rates. We demonstrate that BESS leads to more interpretable statements for investigators and easily accommodates prior information as well as sample size re-estimation. We explore its performance in comparison to standard SSE and demonstrate its usage through a case study of an oncology optimization trial. An R tool is available at https://ccte.uchicago.edu/BESS.
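
The toy sketch below illustrates the general "given the evidence, what sample size achieves the stated confidence" logic for a single-arm binary endpoint under a Beta-Binomial model. The null rate, anticipated response rate, prior, and confidence level are invented for illustration; this is not the BESS R tool referenced above.

    # Illustrative sketch (not the authors' BESS tool): for a single-arm binary
    # endpoint, find the smallest n such that, if the observed response rate is
    # assumed to be p_obs, the posterior probability that the true rate exceeds
    # a null value p0 reaches the desired confidence.  Beta(1, 1) prior assumed.
    from scipy.stats import beta

    p0 = 0.20          # null / reference response rate (assumed)
    p_obs = 0.35       # anticipated observed response rate (the "evidence")
    confidence = 0.90  # desired posterior probability for a "go" decision
    a0, b0 = 1.0, 1.0  # Beta prior parameters

    for n in range(5, 501):
        x = round(n * p_obs)                          # assumed number of responders
        post_prob = 1 - beta.cdf(p0, a0 + x, b0 + n - x)
        if post_prob >= confidence:
            print(f"n = {n}: Pr(p > {p0} | data) = {post_prob:.3f}")
            break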


Title: Balancing the effective sample size in prior across different doses in the curve-free Bayesian decision-theoretic design for dose-finding trials
Authors: Jiapeng Xu, Dehua Bi, Shenghua Kelly Fan, Bee Leng Lee, Ying Lu
Institution: Stanford University
Presenter and email: Dehua Bi dehuabi@stanford.edu
Abstract:

The primary goal of dose allocation in phase I trials is to minimize patient exposure to subtherapeutic or excessively toxic doses, while accurately recommending a phase II dose that is as close as possible to the maximum tolerated dose (MTD). Fan et al. (2012) introduced a curve-free Bayesian decision-theoretic design (CFBD), which leverages the assumption of a monotonic dose-toxicity relationship without directly modeling dose-toxicity curves. This approach has also been extended to drug combinations for determining the MTD (Lee et al., 2017). Although CFBD has demonstrated improved trial efficiency by using fewer patients while maintaining high accuracy in identifying the MTD, it may artificially inflate the effective sample sizes of the updated prior distributions, particularly at the lowest and highest dose levels. This can lead to either overshooting or undershooting the target dose. In this paper, we propose a modification to CFBD's prior distribution updates that balances effective sample sizes across different doses. Simulation results show that with the modified prior specification, CFBD achieves a more focused dose allocation at the MTD and offers more precise dose recommendations with fewer patients on average. It also demonstrates robust performance relative to other well-known dose-finding designs in the literature.
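
As background for the effective-sample-size idea, the toy sketch below treats each dose's toxicity prior as a Beta(a, b) distribution, whose ESS is a + b, and rescales the pseudo-counts so every dose carries the same prior ESS while keeping its prior mean. The numbers are invented, and this is not the authors' proposed update scheme.

    # Minimal illustration (not the authors' exact scheme): the effective sample
    # size of a Beta(a, b) prior on a dose's toxicity probability is a + b.
    # Rescale each dose's pseudo-counts so all doses carry the same prior ESS
    # while keeping their prior means (and monotone ordering) unchanged.
    import numpy as np

    # Illustrative prior pseudo-counts (toxicities a, non-toxicities b) per dose
    a = np.array([0.5, 0.4, 0.6, 0.9, 2.0])
    b = np.array([4.5, 1.6, 1.4, 1.1, 2.0])

    ess = a + b                      # unbalanced ESS: [5, 2, 2, 2, 4]
    prior_mean = a / ess             # monotone increasing prior toxicity means

    target_ess = 2.0                 # common ESS chosen for every dose
    a_bal = prior_mean * target_ess
    b_bal = (1 - prior_mean) * target_ess

    print("prior means :", np.round(prior_mean, 3))
    print("balanced a  :", np.round(a_bal, 3))
    print("balanced b  :", np.round(b_bal, 3))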


Title: Tools for Randomized Clinical Trials Using Restricted Mean Survival Time and Average Hazard
Authors: Miki Horiguchi, Hajime Uno
Institution: Dana-Farber Cancer Institute
Presenter and email: Miki Horiguchi Miki_Horiguchi@dfci.harvard.edu
Abstract:

In randomized clinical trials with time-to-event outcomes, the log-rank test based on Cox's proportional hazards model is commonly used for statistical comparisons, with the hazard ratio reported as the summary measure of treatment effect. However, the limitations of this traditional approach have been widely discussed. Alternative methods, such as the restricted mean survival time (RMST) and the average hazard with survival weight (AH), are gaining attention because they address these limitations and provide more robust and interpretable quantitative information on treatment effects. Nevertheless, practical considerations for trial design using RMST or AH, particularly in determining the timing of analyses, remain understudied. We aim to fill these gaps by presenting methodological considerations and tools for identifying analysis timing, with the goal of facilitating broader adoption of these alternative methods in practice.
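
For readers unfamiliar with the estimand, the sketch below computes the RMST as the area under the Kaplan-Meier curve up to a truncation time tau on simulated data. It is not the authors' design tool, and the truncation time and distributions are arbitrary.

    # Minimal sketch (simulated data): RMST up to tau is the area under the
    # Kaplan-Meier curve on [0, tau].  Comparing arms by the difference in RMST
    # avoids the proportional-hazards assumption behind the hazard ratio.
    import numpy as np
    from lifelines import KaplanMeierFitter

    def km_rmst(durations, events, tau):
        """Area under the Kaplan-Meier step function on [0, tau]."""
        sf = KaplanMeierFitter().fit(durations, event_observed=events).survival_function_
        times = sf.index.values.astype(float)     # starts at 0 with S(0) = 1
        surv = sf.iloc[:, 0].values
        area = 0.0
        for i, t in enumerate(times):
            if t >= tau:
                break
            t_next = times[i + 1] if i + 1 < len(times) else tau
            area += surv[i] * (min(t_next, tau) - t)
        return area

    rng = np.random.default_rng(1)
    n, tau = 200, 24.0                            # per-arm size, truncation (months)
    for arm, scale in [("control", 18.0), ("treatment", 24.0)]:
        event_t = rng.exponential(scale=scale, size=n)
        censor_t = rng.uniform(12.0, 36.0, size=n)  # administrative censoring
        obs = np.minimum(event_t, censor_t)
        evt = event_t <= censor_t
        print(f"{arm}: RMST(0, {tau}) = {km_rmst(obs, evt, tau):.2f} months")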


Title: Pharmacometrics-Enhanced Causal Inference: Accounting for dose modifications in exposure-response analyses in oncology for brigimadlin development
Authors: Matthew Wiens, Jia Kang, Kyle Baron, James Rogers, Steve Choy, Girish Jayadeva, Alejandro Perez Pitarch, David Busse
Institution: Metrum Research Group
Presenter and email: Matthew Wiens mattheww@metrumrg.com
Abstract:
Background and Objectives:

Model-based exposure-response (ER) analyses are a cornerstone of dose optimization in the Project Optimus era of oncology drug development, yet they often do not directly address the causal questions of clinical interest. Dose modifications due to safety and tolerability lead to feedback in the dose-exposure-safety relationship, where safety outcomes and doses are subject to time-varying confounding. Failure to account for this feedback in standard model-based ER analyses may lead to unrealistic simulations (i.e., implausibly high exposures and safety risks), reducing the credibility of model-based inferences. Semi-mechanistic pharmacometric models are an important tool for model-informed drug development and Project Optimus but typically have not been evaluated from the perspective of formal causal analyses. Based on the case example of safety-based dose modifications of brigimadlin, a potent, oral murine double minute 2 homolog-tumor protein 53 antagonist, we aimed to:

  • Characterize the relationship between safety endpoints and dose modifications
  • Perform dynamic simulations of exposure and safety that account for dose modifications
  • Support causal inferences for hypothetical dosing regimens
Methods:

A Bayesian model of the probability of dose modification as a function of platelet and neutrophil counts was developed to enable dynamic and probabilistic dosing decisions. This model was a composite of a categorical model for the dosing decision and a time-to-event model for the length of the dose delay. The dose decision model used four categories: (i) no dose change and no delay, (ii) a delay with no change, (iii) a dose reduction without delay, and (iv) a dose reduction with delay. Subsequently, a dynamic simulation was conducted using mrgsolve [1], which simulated the loop from dose to exposure with a PK model, from exposure to platelet and neutrophil counts with PKPD models, and from platelet and neutrophil counts to the dose decision using the dose modification model. The g-formula was applied with simulations for time-varying treatments to estimate the relationship between initial dose and safety in the presence of intercurrent events [2].
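
To make the structure of such a feedback loop concrete, here is a deliberately toy, self-contained sketch of a dose -> exposure -> platelet -> dose-decision simulation combined with a g-formula-style comparison of hypothetical starting doses. Every model and parameter value is invented; it stands in for, rather than reproduces, the mrgsolve-based brigimadlin simulation.

    # Toy illustration of the feedback loop described above, with invented
    # one-compartment-style PK, a toy PD effect on platelets, and a logistic
    # dose-modification rule; not the brigimadlin models and not mrgsolve.
    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_patient(start_dose, n_cycles=8):
        dose, platelets = start_dose, 250.0              # platelets in 10^9/L
        history = []
        for cycle in range(n_cycles):
            auc = dose / rng.lognormal(mean=np.log(5.0), sigma=0.3)  # toy PK: CL ~ lognormal
            nadir = platelets * np.exp(-0.15 * auc)                  # toy PKPD effect
            # Toy decision model: lower nadir -> higher probability of a reduction
            p_reduce = 1.0 / (1.0 + np.exp((nadir - 75.0) / 15.0))
            if rng.random() < p_reduce:
                dose = max(dose * 0.5, 10.0)                         # 50% dose reduction
            history.append((cycle, dose, auc, nadir))
            platelets = 250.0 - 0.5 * (250.0 - nadir)                # partial recovery
        return history

    # g-formula-style use: simulate many patients under each hypothetical starting
    # dose and summarize the implied risk of a nadir below 50 x 10^9/L (grade 3+).
    for start_dose in (30.0, 45.0, 60.0):
        sims = [simulate_patient(start_dose) for _ in range(2000)]
        risk = np.mean([any(nadir < 50.0 for *_, nadir in s) for s in sims])
        print(f"start dose {start_dose:>4} mg: P(nadir < 50) = {risk:.2f}")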

Results:

Dose delays and reductions were estimated to occur more frequently with lower platelet and neutrophil counts; additionally, the delays were longer with lower counts. Simulated patient profiles with dynamic dosing regimens adequately captured the qualitative trajectories of dose decisions, exposure, platelet counts, and neutrophil counts. Under the observed initial dose, the predicted rate of grade 3+ thrombocytopenia was 21.6%, compared to the observed rate of 24.6%, while for grade 3+ neutropenia the predicted rate was 13.8% compared to the observed rate of 17.5%. When not accounting for dose modifications, the risk of grade 3+ thrombocytopenia was overpredicted (38.5%).

Conclusions:

A dose modification model was successfully integrated into a dynamic simulation framework accounting for the impact of safety signals on dose. This framework was able to adequately predict the observed safety outcomes and may serve as a basis to support realistic simulations in other oncology drug development programs.

Citations:

[1] Baron KT et al. mrgsolve: Simulate from ODE-Based Models [Internet]. Metrum Research Group; 2021. Available from: https://cran.r-project.org/package=mrgsolve

[2] Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.


Title: The impact of within-cluster correlation in clinical trials
Authors: Shangyuan Ye, Byung Park
Institution: Knight Cancer Institute, OHSU
Presenter and email: Shangyuan Ye yesh@ohsu.edu
Abstract:

Outcomes in clinical trials are often correlated. For example, in vaccine studies, this correlation can arise when a virus spreads among individuals within the same group. Similarly, in multicenter clinical trials, patients treated at the same center may have similar results. Although the impact of clustering has been thoroughly studied in cluster-randomized trials, its influence on other designs, such as individual randomization and within-cluster randomization, is rarely addressed. In this study, motivated by vaccine trial designs, we evaluate the performance of generalized estimating equation (GEE) estimators and their robust sandwich variance estimators under different trial designs in the presence of within-cluster correlation. Specifically, we focus on testing the vaccine's effectiveness in reducing an individual's infection probability. Our findings indicate that clustering effects should always be accounted for, regardless of the type of study design.
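
A minimal sketch of the kind of analysis described, on simulated data: a GEE with an exchangeable working correlation and robust sandwich standard errors for a binary infection outcome under individual randomization within clusters. The cluster sizes, effect sizes, and correlation strength are illustrative, not the authors' simulation settings.

    # Simulated clustered binary outcomes; GEE with exchangeable working
    # correlation and robust (sandwich) standard errors via statsmodels.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_clusters, cluster_size = 50, 20
    rows = []
    for c in range(n_clusters):
        u = rng.normal(0.0, 0.8)                      # shared cluster effect
        for _ in range(cluster_size):
            vaccine = int(rng.integers(0, 2))         # individual randomization
            logit = -1.0 - 0.7 * vaccine + u
            infected = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
            rows.append({"cluster": c, "vaccine": vaccine, "infected": infected})
    df = pd.DataFrame(rows)

    fit = smf.gee(
        "infected ~ vaccine", groups="cluster", data=df,
        family=sm.families.Binomial(), cov_struct=sm.cov_struct.Exchangeable(),
    ).fit()                                           # robust (sandwich) SEs by default
    print(fit.summary())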


Title: Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale
Authors: Cliff Wong, Sam Preston, Qianchu Liu, Hoifung Poon
Institution: Microsoft
Presenter and email: Qianchu (Flora) Liu qianchuliu@microsoft.com
Abstract:

The vast majority of real-world patient information resides in unstructured clinical text, and the process of medical abstraction seeks to extract and normalize structured information from this unstructured input. However, traditional medical abstraction methods can require significant manual effort, such as crafting rules or annotating training labels, which limits scalability. In this paper, we propose UniMedAbstractor (UMA), a zero-shot medical abstraction framework leveraging Large Language Models (LLMs) through a modular and customizable prompt template. We refer to our approach as universal abstraction because it can quickly scale to new attributes through its universal prompt template without curating attribute-specific training labels or rules. We evaluate UMA for oncology applications, focusing on fifteen key attributes representing the cancer patient journey, from short-context attributes (e.g., performance status, treatment) to complex long-context attributes requiring longitudinal reasoning (e.g., tumor site, histology, TNM staging). Experiments on real-world data show UMA's strong performance and generalizability. Compared to supervised and heuristic baselines, UMA with GPT-4o achieves on average an absolute 2-point F1/accuracy improvement for both short-context and long-context attribute abstraction. For pathologic T staging, UMA even outperforms the supervised model by 20 points in accuracy.
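
To convey what a modular, attribute-agnostic prompt might look like, here is a schematic sketch; the wording, fields, example attribute, and the placeholder call_llm function are invented for illustration and are not UMA's actual template or code.

    # Schematic sketch of a modular, attribute-agnostic prompt template in the
    # spirit described above.  `call_llm` is a hypothetical stand-in for
    # whatever chat-completion client is available.
    TEMPLATE = """You are a medical abstraction assistant.
    Attribute: {name}
    Definition: {definition}
    Allowed values: {allowed_values}
    Instructions: Read the clinical note below and return only the normalized
    value of the attribute, or "Not documented" if it is absent.

    Clinical note:
    {note}
    """

    def build_prompt(name, definition, allowed_values, note):
        return TEMPLATE.format(
            name=name,
            definition=definition,
            allowed_values=", ".join(allowed_values),
            note=note,
        )

    prompt = build_prompt(
        name="ECOG performance status",
        definition="Most recent ECOG performance status documented for the patient.",
        allowed_values=["0", "1", "2", "3", "4", "Not documented"],
        note="... unstructured note text ...",
    )
    # answer = call_llm(prompt)   # hypothetical zero-shot call to an LLM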


Title: Utilizing Machine Learning Models to Guide and Optimize Clinical Management of Cancer-Associated Compression Fractures
Authors: Sadegh Marzban; Alejandro Carrasquilla; Vishnu Venkitasubramony; Nam D Tran; Jeffrey West
Institution: Moffitt Cancer Center
Presenter and email: Sadegh Marzban Sadegh.Marzban@moffitt.org
Abstract:

Introduction: Spinal metastases (SM) occur in approximately 60-70% of patients with systemic cancer and can cause significant pain and spinal instability. Minimally invasive procedures, such as percutaneous kyphoplasty, are increasingly used to manage SM, enabling faster initiation of chemoradiation. However, current clinical tools, like the Spinal Instability Neoplastic Score (SINS), are limited in their ability to integrate complex patient-specific factors that guide optimal treatment decisions.

Methods: We retrospectively analyzed a prospective database of 768 SM patients treated with kyphoplasty at a single Comprehensive Cancer Center between 2009 and 2020. Collected variables included patient demographics, tumor type, pain scores, Karnofsky Performance Status (KPS), and radiographic parameters. Kyphoplasty failure was defined as the presence of any of the following adverse outcomes: unchanged or worsened pain at latest follow-up, decline in KPS, requirement for multiple kyphoplasties, or subsequent major spinal surgery at the treated site. We developed a deep learning-based classification model to predict kyphoplasty failure using patient demographics, baseline health metrics (e.g., age, KPS, BMI, history of osteoporosis, number of vertebral levels treated), and cancer type as input features. A three-layer feedforward neural network with dropout and batch normalization was trained using 5-fold cross-validation. Feature importance was evaluated using permutation analysis, and the model was retrained using only significant predictors.

Results: Multiple myeloma (n=254) and breast cancer (n=103) were the most common primary tumor types in the cohort. Preliminary results showed a trend toward better kyphoplasty outcomes among the 23 prostate cancer patients, with only 13% experiencing persistent pain and none requiring open surgery. The most important predictive features identified by the deep learning model were the number of vertebral levels treated, pre-procedural KPS, age, and history of osteoporosis. Among cancer types, multiple myeloma (MM) and lung cancer (LC) were the most predictive of kyphoplasty failure. The preliminary modeling achieved an average accuracy of 66% in predicting kyphoplasty failure.

Conclusion: Developing improved decision-making tools for patients with SM requires the ability to capture subtle, nonlinear relationships between diverse clinical and radiographic variables. This study demonstrates the utility of machine learning approaches in identifying high-risk patients and guiding treatment strategies. Future work will incorporate detailed vertebral involvement and spinal alignment parameters to further enhance model performance.
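
A minimal PyTorch sketch of the type of architecture described (a three-layer feedforward network with batch normalization and dropout for binary failure prediction); the layer widths, dropout rate, feature count, and dummy batch are placeholders, not the study's model or data.

    # Three-layer feedforward classifier with batch normalization and dropout;
    # illustrative layer sizes and a dummy mini-batch for a single training step.
    import torch
    import torch.nn as nn

    class KyphoplastyFailureNet(nn.Module):
        def __init__(self, n_features, hidden=(64, 32, 16), p_drop=0.3):
            super().__init__()
            layers, d_in = [], n_features
            for d_out in hidden:
                layers += [
                    nn.Linear(d_in, d_out),
                    nn.BatchNorm1d(d_out),
                    nn.ReLU(),
                    nn.Dropout(p_drop),
                ]
                d_in = d_out
            layers.append(nn.Linear(d_in, 1))        # logit of P(failure)
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x).squeeze(-1)

    model = KyphoplastyFailureNet(n_features=8)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(32, 8)                           # dummy mini-batch of features
    y = torch.randint(0, 2, (32,)).float()           # dummy failure labels
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"dummy-batch loss: {loss.item():.3f}")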


Title: Statistical Learning for Heterogeneous Treatment Effects: Pretraining, Prognosis, and Prediction
Authors: Maximilian Schuessler, Erik Sverdrup, Robert Tibshirani
Institution: Stanford University
Presenter and email: Maximilian Schuessler maxsc@stanford.edu
Abstract:

Robust estimation of heterogeneous treatment effects is a fundamental challenge for optimal decision-making in domains ranging from personalized medicine to educational policy. In recent years, predictive machine learning has emerged as a valuable toolbox for causal estimation, enabling more flexible and rigorous effect estimation. However, accurately estimating conditional average treatment effects (CATE) remains a major challenge, particularly in the presence of many covariates. In this article, we propose pretraining strategies that leverage a phenomenon common in real-world applications: factors that are prognostic of the outcome are frequently also predictive of treatment effect heterogeneity. In oncology, for example, components of the same biological signaling pathways frequently influence both baseline risk and treatment response. Building on the statistical properties of established CATE estimators, including (Uni)lasso, boosting, and random forests, we introduce a suite of enhanced models that exploit synergies between risk prediction and causal effect estimation. This cross-task learning enables more accurate signal detection, yielding lower estimation error and reduced false discovery rates. Our approach also demonstrates increased power to detect treatment effect heterogeneity in settings where the outcome and treatment effect functions exhibit joint support. These results present a promising strategy for advancing heterogeneous treatment effect estimation across diverse scientific, clinical, and social science settings.
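
A simplified sketch, on simulated randomized data, of one way to share strength between prognosis and treatment-effect estimation: a prognostic lasso on the outcome sets covariate-specific penalty weights (via adaptive-lasso-style feature rescaling) for a second lasso fit to an unbiased CATE pseudo-outcome. This only mimics the spirit of the pretraining idea and is not the authors' estimator.

    # Stage 1: prognostic lasso on the pooled outcome.
    # Stage 2: lasso on an IPW pseudo-outcome (unbiased for the CATE under
    # randomization) with penalties down-weighted for prognostic features.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p = 2000, 50
    X = rng.normal(size=(n, p))
    w = rng.integers(0, 2, size=n)                 # randomized treatment, Pr = 0.5
    tau = 1.0 * X[:, 0] + 0.5 * X[:, 1]            # heterogeneity shares support...
    mu = 2.0 * X[:, 0] - 1.0 * X[:, 2]             # ...with the prognostic signal
    y = mu + w * tau + rng.normal(size=n)

    prog = LassoCV(cv=5).fit(X, y)                 # prognostic fit
    weights = np.abs(prog.coef_) + 0.1             # floor keeps all features eligible

    pseudo = y * (w - 0.5) / 0.25                  # E[pseudo | X] = CATE when Pr(w=1)=0.5
    cate_fit = LassoCV(cv=5).fit(X * weights, pseudo)
    cate_hat = cate_fit.predict(X * weights)

    print("prognostic support:", np.flatnonzero(prog.coef_)[:10])
    print("CATE support      :", np.flatnonzero(cate_fit.coef_)[:10])
    print("RMSE of CATE      :", float(np.sqrt(np.mean((cate_hat - tau) ** 2))))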


Title: Identifying Tissue-Level Markers of Thin Melanoma Survival via Model-Based Clustering of Spatial Omics Data
Authors: Min Zhang, Dr. Vivi Arief, Associate Prof. Quan Nguyen, Prof. Geoffrey McLachlan, Prof. Kaye Basford
Institution: The University of Queensland
Presenter and email: Min Zhang min.zhang@uq.edu.au
Abstract:

Melanoma is a malignancy of melanocytes, the pigment-producing cells in the skin. Queensland (Australia), where this study was conducted, has one of the highest melanoma incidence rates globally. Thin melanomas are early-stage melanomas that have not deeply penetrated the skin. Although they generally have a favourable prognosis, thin melanomas account for the majority of new melanoma diagnoses, and some patients experience poor outcomes despite excision. Notably, melanomas initially diagnosed as thin contribute to one-quarter of melanoma-related deaths in countries like Australia and the United States. This highlights the need to identify survival-associated markers for thin melanoma to assist in identifying high-risk patients and to help suggest potential therapeutic targets. The tumour microenvironment (TME) is a complex and dynamic system comprising tumour cells, immune cells, and structural components, and can have tumour-suppressive or tumour-promotive effects. Recent advances in spatial omics technologies have enabled more quantitative analyses of the TME. These include spatially resolved transcriptomics, which measures gene expression activity within intact tissue samples while preserving spatial context. A previous study (Schürch et al., 2020) applied k-means clustering to colorectal cancer (CRC) spatial omics data and identified high-order CRC tissue structures, known as cellular communities, defined as a spatial collection of various cell types at a specific local density. Subsequent analyses indicated that community-specific cell type frequencies can serve as high-order markers for CRC patient survival. However, the use of k-means clustering had several limitations in this context, including its limited ability to identify clusters with anisotropic shapes or unequal variances, and its failure to account for the compositional structure of the data. This motivated the current study to use an alternative approach to cluster spatial omics data and identify tissue-level markers associated with thin melanoma survival. Specifically, we applied a model-based clustering approach using a multinomial mixture model, which assumes the data consist of a finite mixture of components, each corresponding to a cluster (i.e., a cellular community) with its own multinomial distribution. This approach is theoretically appropriate for compositional spatial omics data and provides a flexible, probabilistic framework that is robust to noise and outliers. Our results indicate that the model-based clustering approach produced cellular communities with homogeneous compositions and effectively reflected individual cell identities while incorporating spatial information. We subsequently identified several tissue-level markers, defined as community-specific cell type frequencies, that were significantly associated with thin melanoma patient survival. Our study contributed to the understanding of thin melanoma patient variation by applying a statistical approach to spatial omics data, enabling the identification of higher-order tissue structures and survival-associated markers. Our findings also assist the identification of high-risk patients and inform potential therapeutic targets for thin melanoma, offering a basis for future biological studies and clinical investigations. Furthermore, our approach can be applied to other cancer types as well as other spatial data types.
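
A compact sketch of EM for a multinomial mixture over cell-type count vectors (one vector per spatial neighbourhood), using simulated counts; the number of communities, cell types, and counts are arbitrary, and this is not the authors' fitting code.

    # EM for a multinomial mixture: each "neighbourhood" contributes a vector of
    # cell-type counts, and each mixture component is a cellular community with
    # its own composition (multinomial probability vector).
    import numpy as np

    def multinomial_mixture_em(X, n_components, n_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        n, k = X.shape
        pi = np.full(n_components, 1.0 / n_components)
        theta = rng.dirichlet(np.ones(k), size=n_components)       # C x K compositions
        for _ in range(n_iter):
            # E-step: responsibilities in the log domain (multinomial coefficient cancels)
            log_r = np.log(pi) + X @ np.log(theta).T               # N x C
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: update mixing proportions and component compositions
            pi = r.mean(axis=0)
            theta = (r.T @ X) + 1e-8                               # pseudo-count for stability
            theta /= theta.sum(axis=1, keepdims=True)
        return pi, theta, r.argmax(axis=1)

    # Simulated neighbourhoods: 3 communities with distinct cell-type compositions
    rng = np.random.default_rng(1)
    true_theta = rng.dirichlet(np.ones(8) * 0.5, size=3)
    labels = rng.integers(0, 3, size=600)
    X = np.vstack([rng.multinomial(100, true_theta[z]) for z in labels])

    pi, theta, assignments = multinomial_mixture_em(X, n_components=3)
    print("estimated mixing proportions:", np.round(pi, 3))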


Title: Value of Information Analysis for External Validation of Survival Risk Prediction Models
Authors: Tae Yoon Lee, Summer S Han
Institution: Stanford University School of Medicine
Presenter and email: Tae Yoon Lee harrytyl@stanford.edu
Abstract:

Purpose: A clinical risk prediction model needs to be validated before deployment in a new target population. As such, external validation is often carried out with a representative sample of the target population to assess the model performance and subsequently decide whether to use the model in this population. However, there is inherent uncertainty surrounding the model performance due to the finite size of the validation sample, potentially leading to an incorrect decision. Value-of-information (VoI) analysis is a decision-theoretic approach to quantify the potential value of reducing this uncertainty, informing investigators whether further sample procurement is worthwhile during model validation. Unlike classical inferential measures (e.g., confidence intervals around a c-statistic), VoI analysis evaluates the direct impact on clinical utility in terms of net benefit (NB), incorporating the consequences (i.e., true positives and false positives) of the decisions made by the model. Building on recent developments for binary risk prediction models, we extended the VoI methodology to the validation phase of risk prediction models for time-to-event outcomes.

Methods: The expected value of perfect information (EVPI) refers to the expected gain in NB from completely removing uncertainty during the validation phase. We propose a bootstrap-based algorithm for NB calculation with survival data. To demonstrate its utility, we conducted a case study with an existing risk prediction model for lung cancer incidence, LCRAT. LCRAT, which is recommended for use by the American College of Chest Physicians, was developed using data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, a randomized clinical trial in the US. LCRAT has been previously validated for 6-year lung cancer incidence against the Multiethnic Cohort Study (MEC), a representative cohort of five racial and ethnic groups from California and Hawaii (n=105,261). Treating the MEC as a target population, we calculated the validation EVPI across various sample sizes (100 to 1,000) for a plausible range of risk thresholds from 1% to 2% (1.241% was used as the threshold in the previous study).

Results: We found there was no value in collecting more validation samples for the MEC cohort. As expected, the EVPI decreased as the validation sample size increased (Figure), ranging from 0.0026 (n=100) to 0.0000 (n=1,000) at the threshold of 1.24%. This implies that for n=100, one could gain at most an expected net benefit of 0.0026 through further procurement of validation samples. In other words, completely removing uncertainty would result in an expected increase of 260 true positives (correctly identifying lung cancer cases with screening) or, equivalently, an expected reduction of 20,691 false positives (unnecessary screening) for every 100,000 decisions guided by the model.

Conclusions: Many risk prediction models in medicine are developed using time-to-event data. Extending the VoI methodology to survival data enables investigators to assess the impact of uncertainty on clinical utility alongside classical inferential measures and informs whether collecting more samples is worthwhile during model validation.
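
A schematic sketch of the bootstrap EVPI logic for a net-benefit comparison at a fixed risk threshold and time horizon. For clarity, this toy version assumes complete follow-up to the horizon (no censoring), whereas the authors' algorithm estimates NB from censored survival data; all data and parameters are simulated.

    # EVPI = E[max of per-bootstrap net benefits] - max of the mean net benefits,
    # comparing "use the model", "treat all", and "treat none" at one threshold.
    import numpy as np

    rng = np.random.default_rng(0)
    n, threshold, horizon = 500, 0.0124, 6.0
    risk = rng.beta(1.0, 60.0, size=n)                     # model-predicted 6-year risks
    event_time = rng.exponential(scale=1.0 / np.maximum(risk / horizon, 1e-6))
    event_by_horizon = event_time <= horizon

    def net_benefits(risk, event, pt):
        odds = pt / (1 - pt)
        use_model = risk >= pt
        nb_model = np.mean(event & use_model) - odds * np.mean(~event & use_model)
        nb_all = np.mean(event) - odds * np.mean(~event)
        return nb_model, nb_all, 0.0                       # model / treat-all / treat-none

    B = 2000
    boot = np.empty((B, 3))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        boot[b] = net_benefits(risk[idx], event_by_horizon[idx], threshold)

    evpi = np.mean(boot.max(axis=1)) - boot.mean(axis=0).max()
    print(f"validation EVPI at threshold {threshold:.4f}: {evpi:.5f}")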


Title: Socioeconomic Inequalities and Lung Cancer Outcomes: Evidence from an Integrated EHR Database and State Cancer Registry Data
Authors: Tae Yoon Lee, Chloe C. Su, Eunji Choi, Victoria Y. Ding, Mina Satoyoshi, Archana Bhat, Tony Chen, Ingrid Luo, Yuhan Liu, Annabel X. Tan, Solomon Henry, Leah M. Backhus, Timothy J. Ellis-Caleo, Scarlett Lin Gomez, Natalie S. Lui, Ann Leung, Curtis Langlotz, Joel W. Neal, Allison W. Kurian, Heather A. Wakelee, Su-Ying Liang, Summer S. Han
Institution: Stanford University School of Medicine
Presenter and email: Tae Yoon Lee harrytyl@stanford.edu
Abstract:

Introduction: With advances in therapy and early detection of lung cancer (LC), the number of LC survivors is rapidly increasing, underscoring the need to understand factors influencing their long-term outcomes. Neighborhood-level social drivers of health (nSDOH) are crucial in cancer research, as they relate to access to healthcare and quality of care. However, previous studies on developing second primary LC (SPLC) and LC-mortality among LC survivors have primarily focused on individual-level SDOH, often overlooking neighborhood influences. Further, many studies using electronic health records (EHRs) rely on data from single healthcare systems, which may have limited diversity of patient populations and uniform clinical practice patterns. This study aimed to evaluate the associations of multiple nSDOH with SPLC and LC-mortality among LC survivors, leveraging an integrated database to address these gaps.

Methods: We utilized an integrated EHR database called Oncoshare-Lung, linking two independent healthcare systems—Stanford Health Care (academic) and Sutter Health (community)—and the California Cancer Registry. We used patients' mailing addresses at initial primary lung cancer (IPLC) diagnosis to calculate well-validated nSDOH indices at the census-tract level, including indices of concentration at the extremes (ICEs) that measure polarization between privileged and deprived populations on a continuous scale from -1 (most deprived) to +1 (most privileged). Main exposures included the ICE indices for income, education, and race/ethnicity (Black/White). For the SPLC and LC-mortality analyses, we used cause-specific Cox regression and standard Cox regression with SPLC diagnosis as a time-varying variable, respectively, adjusting for demographic (e.g., smoking) and clinical factors (e.g., stage, histology). Further, we quantified the relative contributions of the direct effect of nSDOH on LC-mortality and its indirect effect mediated by increased SPLC diagnoses due to nSDOH.

Results: Among 35,499 patients diagnosed with IPLC between 2009 and 2022 and followed up to 10 years for SPLC and mortality, there were 825 SPLC cases and 24,536 deaths (70% LC deaths) over 96,420 person-years (average follow-up: 2.7 years). The cohort included 52.8% (N=18,744) adenocarcinoma IPLC, 52.4% (N=18,584) female, 14.7% (N=5,209) Asian American/Pacific Islander, 7.6% (N=2,701) Hispanic, and 7.2% (N=2,538) non-Hispanic Black/African American. Our analysis revealed that lower ICE-Education was significantly associated with an increase in SPLC risk (adjusted hazard ratio [aHR] per 1-unit decrease: 1.28, 95% CI: 1.22-1.35). Its association with LC-mortality was more pronounced (aHR: 1.49, CI: 1.43-1.56). Consistent with previous studies [PMID:34893871], we observed a detrimental effect of SPLC diagnosis on LC-mortality (aHR: 1.84, CI: 1.74-1.93). The mediation analysis indicated that the increase in SPLC diagnoses associated with ICE-Education contributed to 28% of the overall effect of ICE-Education on LC-mortality (aHR: 1.75, CI: 1.49-2.08).

Conclusion: This study underscores the importance of understanding socioeconomic disparities within communities and their impact on LC survivors' health outcomes. Survivors in segregated neighborhoods with a greater concentration of low educational attainment face increased risks of SPLC and LC-mortality, independent of sex, race/ethnicity, and smoking history. Our mediation analysis highlighted that LC-mortality was significantly influenced by both the direct effect of educational segregation and its indirect effect mediated through SPLC diagnoses. Thus, addressing these nSDOH disparities is crucial to improve healthcare access, enhance treatment, and implement targeted early detection of SPLC in this vulnerable population.
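
A minimal sketch (on fully simulated data, not the Oncoshare-Lung cohort) of how a Cox model with SPLC diagnosis as a time-varying covariate can be set up in the counting-process (start/stop) format; the variable names and the lifelines call illustrate the general approach rather than the authors' analysis code.

    # Each patient contributes one row before SPLC and, if an SPLC occurs during
    # follow-up, a second row afterwards with the time-varying indicator set to 1.
    import numpy as np
    import pandas as pd
    from lifelines import CoxTimeVaryingFitter

    rng = np.random.default_rng(0)
    rows = []
    for pid in range(2000):
        ice_edu = rng.uniform(-1, 1)                   # neighbourhood ICE-Education
        follow_up = rng.uniform(1, 10)
        splc_time = rng.exponential(15.0)              # time of SPLC, if any
        death = int(rng.random() < 0.6)
        if splc_time < follow_up:                      # two rows: before / after SPLC
            rows.append([pid, 0.0, splc_time, 0, ice_edu, 0])
            rows.append([pid, splc_time, follow_up, death, ice_edu, 1])
        else:
            rows.append([pid, 0.0, follow_up, death, ice_edu, 0])

    df = pd.DataFrame(rows, columns=["id", "start", "stop", "event", "ice_edu", "splc"])
    ctv = CoxTimeVaryingFitter().fit(df, id_col="id", start_col="start",
                                     stop_col="stop", event_col="event")
    ctv.print_summary()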


Title: TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models
Authors: J Gonzalez, C Wong, Z Gero, J Bagga, R Ueno, I Chien, E Oravkin, E Kiciman, A Nori, R Weerasinghe, R Leidner, B Piening, T Naumann, C Bifulco, H Poon
Institution: Microsoft Research
Presenter and email: Juan Manuel Zambrano Chaves juanza@microsoft.com
Abstract:

The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery. In practice, however, such data is most abundantly available in unstructured forms, such as clinical notes in electronic medical records (EMRs), and it is generally plagued by confounders. In this paper, we present TRIALSCOPE, a unifying framework for distilling real-world evidence from population-level observational data. TRIALSCOPE leverages biomedical language models to structure clinical text at scale, employs advanced probabilistic modeling for denoising and imputation, and incorporates state-of-the-art causal inference techniques to combat common confounders. Using clinical trial specifications as a generic representation, TRIALSCOPE provides a turn-key solution to generate and reason with clinical hypotheses using observational data. In extensive experiments and analyses on a large-scale real-world dataset with over one million cancer patients from a large US healthcare network, we show that TRIALSCOPE can produce high-quality structuring of real-world data and generate results comparable to those of marquee cancer trials. In addition to facilitating in silico clinical trial design and optimization, TRIALSCOPE may be used to empower synthetic controls, pragmatic trials, and post-market surveillance, as well as to support fine-grained patient-like-me reasoning in precision diagnosis and treatment.


Title: BIG-SSD: Baseline Design-Initiated and Prior-Guided Sample Size Determination with Historical Data
Authors: Min Lin, Eric Baron, Jian Zhu, Estelle Lambert, Ming-Hui Chen
Institution: University of Connecticut
Presenter and email: Min Lin min.2.lin@uconn.edu
Abstract:

We propose a general Bayesian sample size determination (SSD) framework, named BIG-SSD (Baseline design-Initiated and prior-Guided Sample Size Determination), for designing superiority clinical trials that systematically incorporate historical data. The BIG-SSD methodology involves three steps: (i) establishing a baseline design with a noninformative prior to meet prespecified design operating characteristics; (ii) eliciting an informative prior based on high-quality historical data; and (iii) formally incorporating this elicited prior through active borrowing to determine the required sample size calibrated from the baseline design. For scenarios lacking closed forms, a computationally efficient progressive Monte Carlo binary search algorithm is proposed for SSD. We demonstrate the application of BIG-SSD using historical control data from the Parkinson’s Progression Markers Initiative for designing a prospective clinical trial in early-stage Parkinson’s disease. The BIG-SSD framework provides a structured approach to effectively quantify the impact of historical data borrowing on trial design.
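
A generic sketch of the Monte Carlo binary search idea for sample size determination: for each candidate n, the probability of trial success is estimated by simulation, and a binary search finds the smallest n reaching a target. The two-arm normal model, the flat-prior success criterion, and all numeric inputs are illustrative and are not the BIG-SSD specification.

    # Binary search over n, with success probability at each candidate n
    # estimated by Monte Carlo under an assumed treatment effect.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    sigma, delta_true = 1.0, 0.30          # common SD and assumed treatment effect
    target_power, z = 0.80, norm.ppf(0.975)

    def success_probability(n_per_arm, n_sims=20000):
        se = sigma * np.sqrt(2.0 / n_per_arm)
        delta_hat = rng.normal(delta_true, se, size=n_sims)
        # flat prior => posterior Pr(delta > 0 | data) = Phi(delta_hat / se)
        return np.mean(norm.cdf(delta_hat / se) >= 0.975)

    lo, hi = 2, 2000                        # bracket assumed to contain the answer
    while lo < hi:
        mid = (lo + hi) // 2
        if success_probability(mid) >= target_power:
            hi = mid
        else:
            lo = mid + 1
    print(f"estimated n per arm: {lo} "
          f"(success probability ~ {success_probability(lo):.3f})")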


Title: A Joint Modeling Approach for Estimating Expected Survivals based on Longitudinal Medical History
Authors: Yanyan Zhu, Chenghao Chu, Bingming Yi, and Ming-Hui Chen
Institution: University of Connecticut
Presenter and email: Yanyan Zhu yanyan.zhu@uconn.edu
Abstract:

Physicians and public health decision makers are often interested in predicting the timing of certain clinical events (survival data) based on a patient's medical history (longitudinal data). Solutions to such problems not only support early intervention and prevention in public health but can also be crucial for population selection when designing clinical trials, especially for chronic diseases with long latency. However, such prediction can be quite challenging. Longitudinal-survival joint modeling methods have been among the popular approaches. In this paper, we propose a new approach to estimate the expected probability that a patient will experience a clinical event within a specified time frame under a joint model of longitudinal and survival data. To the best of our knowledge, our approach is the first in the literature to utilize both the fully observed longitudinal and survival data used for model fitting and the partially observed longitudinal data used for prediction to obtain estimates of the joint model parameters. An asymptotic confidence interval is derived for the expected survival, and an efficient simulation-based algorithm is developed to obtain an interval for the predictive survival. An extensive simulation study is carried out to examine the empirical performance of the proposed methodology.


Title: On Improving Evaluation and Reasoning of RAG Model Responses for Cancer Survivability Analysis on SEER
Authors: Jyothi Vaidyanathan, Shourya Gupta, Justin Lee, Anaya Dandekar, Srikanth Prabhu, Saptarshi Sengupta
Institution: San Jose State University
Presenter and email: Jyothi Vaidyanathan jyothi.vaidyanathan@sjsu.edu
Abstract:

Survival analysis is a critical task in cancers affecting the Brain, Central Nervous System (BCNS), and Bone. While Large Language Models (LLMs) have shown promise in cancer prognosis, challenges such as hallucinations and limited interpretability remain. To address these, we have built a retrieval-augmented framework that leverages clinical, pathological, and demographic data from the NIH SEER database. Our system supports n-shot querying with or without context and offers an interactive, user-friendly interface for exploring survival outcomes. We extend our previous work, in which we evaluated several tree-based classifiers, using DistilBERT to embed structured patient data. Among these, an ensemble XGBClassifier trained on SEER-derived embeddings achieved the best performance for binary 5-year survivability prediction (85.05% accuracy). To enhance flexibility and reasoning, we subsequently introduced TabLLM, HyDE-RAG, and Step-Back RAG for BCNS cancers, with promising improvements in reasoning about responses. For Bone cancer, we now extend the reasoning engine using GraphRAG with evaluation, which incorporates a knowledge graph to improve interpretability and generate context-aware responses. These models accommodate varied context lengths and provide transparent, human-verifiable reasoning for their outputs. In this work, we use LLMs as evaluators, assessing model-generated responses with semantic similarity metrics grounded in vector databases and knowledge graphs. Several evaluation configurations are proposed, including (a) GPT (with Pinecone) evaluated by Gemini, (b) GPT evaluated by GraphRAG (GPT-backed), (c) GraphRAG evaluated by GPT (with Pinecone), and (d) GraphRAG evaluated by Gemini (with Pinecone). All setups yield favorable performance, highlighting the potential of LLM-guided evaluation. We also evaluate how the different RAG approaches perform. Early experimentation shows that GPT-backed GraphRAG evaluates responses from other LLMs more effectively, providing detailed reasoning that accurately reflects the specifics of the data. For example, in response to the query “What is the most common location for malignant bone sarcomas in the dataset? Evaluate the answer provided by GPT-supported Pinecone given as ……”, we note that the GPT-backed GraphRAG model not only evaluates the response but also (a) accurately highlights common sarcoma sites and (b) demonstrates contextual awareness by recognizing their broader anatomical distribution. We demonstrate that survival analysis can be effectively reframed as a language modeling task, enabling dynamic and interpretable interaction with clinical data. We have undertaken a comprehensive approach to evaluation using LLM-as-a-Judge to assess alignment, robustness, and accuracy across the stated models. A logical next step is to improve response generation by fine-tuning on additional health indicators beyond SEER using LoRA and enabling integration with external knowledge sources to retrieve relevant insights that are beyond the scope of SEER.
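
A minimal sketch of the embedding-plus-classifier idea mentioned above: serialize structured patient fields into text, embed with DistilBERT using mean pooling, and train an XGBoost classifier on the embeddings. The field names, toy records, and labels are placeholders, not the SEER pipeline or its tuned model.

    # Embed serialized patient records with DistilBERT (mean pooling over tokens)
    # and fit an XGBoost classifier for a binary survivability label.
    import numpy as np
    import torch
    from transformers import AutoTokenizer, AutoModel
    from xgboost import XGBClassifier

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    enc = AutoModel.from_pretrained("distilbert-base-uncased")

    def embed(records):
        texts = ["; ".join(f"{k}: {v}" for k, v in rec.items()) for rec in records]
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = enc(**batch).last_hidden_state            # (B, T, 768)
        mask = batch["attention_mask"].unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)          # mean pooling
        return pooled.numpy()

    records = [
        {"site": "bone", "age": 54, "histology": "osteosarcoma", "stage": "II"},
        {"site": "brain", "age": 67, "histology": "glioblastoma", "stage": "IV"},
    ] * 20                                                     # toy training set
    labels = np.array([1, 0] * 20)                             # 1 = survived 5 years

    clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    clf.fit(embed(records), labels)
    print("training accuracy:", clf.score(embed(records), labels))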


Title: Meta-Analytic-Prior-Embed BART for Robust Integration of External Data in Clinical Trials
Authors: Yunxuan Zhang
Institution: University of Chicago
Presenter and email: Yunxuan Zhang yunxuanz@uchicago.edu
Abstract:

Incorporating external data into clinical trial analyses offers the potential to reduce sample size requirements, enhance statistical power, and improve the precision of treatment effect estimation, particularly in hybrid or synthetic control designs where concurrent controls are limited or unavailable. However, such integration poses methodological challenges due to cross-source heterogeneity and the persistent threat of unmeasured confounding, both of which can bias effect estimates. To address these challenges, we build upon the Bayesian Additive Regression Trees (BART) framework, which is well regarded for its strong predictive performance and ability to flexibly capture complex, nonlinear relationships. We extend BART by embedding a hierarchical structure within each terminal node: after subjects are partitioned into terminal nodes based on observed covariates, we allow data from different sources to be associated with distinct parameters. This formulation enables the model to accommodate source-specific variation not explained by observed covariates, thereby mitigating bias from unmeasured confounding and enabling more valid and robust inference in integrated trial settings. We refer to this proposed extension as Meta-analytic Predictive Prior Embedded Bayesian Additive Regression Trees (MAP-BART). In parallel, motivated by regulatory concerns, such as the FDA's guidance that the effective sample size (ESS) of prior information should not exceed that of the concurrent control, we adopt the curvature-based ESS definition of Morita et al. and extend it to quantify the influence of prior information across the tree ensemble. Leveraging this measure, we introduce a principled calibration strategy for BART that adaptively controls prior contributions in accordance with regulatory limits. We evaluate the proposed method through simulations and real-world applications, comparing its performance to alternative approaches including Propensity Score Composite Likelihood (PSCL) and Meta-Analytic Predictive (MAP) priors. Results demonstrate improved operating characteristics, increased robustness to unmeasured confounding, and enhanced transparency in prior-data integration.