Invited Session 1: Big Data and its Use in Cancer Research

Chair: Ying Lu, Professor of Biomedical Data Science (Biostatistics), Radiology, and Health Research and Policy, Stanford University

Ying Lu, Ph.D., is Professor in the Department of Biomedical Data Science and, by courtesy, in the Departments of Radiology and of Health Research and Policy at Stanford University. He is Co-Director of the Stanford Center for Innovative Study Design and of the Biostatistics Core of the Stanford Cancer Institute. Before his current position, he was Director of the VA Cooperative Studies Program Palo Alto Coordinating Center (2009-2016) and Professor of Biostatistics and Radiology at the University of California, San Francisco (1994-2009). His research areas are biostatistical methodology and its applications in clinical trials, statistical evaluation of medical diagnostic tests, and medical decision making. He serves as the biostatistical Associate Editor for JCO Precision Oncology and co-editor of the Cancer Research Section of the New England Journal of Statistics and Data Science. Dr. Lu is an elected fellow of the American Association for the Advancement of Science and the American Statistical Association. He initiated the Stat4Onc Annual Symposium with Dr. Ji and Dr. Kummar in 2017 and is the PI of the NCI R13 grant supporting this conference.


Big Data in Oncology

Speaker: George Sledge, Chief Medical Officer, Caris Life Sciences

George W. Sledge, Jr., M.D., oversees Caris Life Sciences' medical affairs, research, and medical education, including oversight and leadership for the Caris Precision Oncology Alliance™ and Caris' global team of Medical Science Liaisons. Prior to joining Caris, Dr. Sledge was Professor of Medicine at the Stanford University School of Medicine, where he was a member of the Division of Oncology, most recently serving as co-director of the Stanford Cancer Institute's Cancer Therapeutics Program and, from 2013 to 2020, as Chief of the Division of Oncology. Trained in Internal Medicine and Medical Oncology, Dr. Sledge has devoted his professional career to understanding the biology and improving the treatment of breast cancer. He is active as both a laboratory and clinical researcher, with more than 390 scientific publications.

Abstract

TBA


Validation of Predictive Analyses for Interim Decisions in Clinical Trials

Speaker: Lorenzo Trippa, Associate Professor of Biostatistics, Harvard School of Public Health

Lorenzo Trippa is Associate Professor in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute and in the Department of Biostatistics at the Harvard School of Public Health. Trained in statistics, his research interests include the design of efficient clinical trials, clinical trials for personalized medicine, and methods for the analysis of data generated by complex Bayesian adaptive clinical trials. He is currently collaborating with physicians at DFCI on applying Bayesian adaptive designs to trials in glioblastoma, a form of brain cancer, and on incorporating genomic information into the design of Bayesian clinical trials. He is also collaborating with physicians at Harvard to design innovative trials of targeted therapies for melanoma.

Abstract

Adaptive clinical trials use algorithms to predict, during the study, patient outcomes and final study results. These predictions trigger interim decisions, such as early discontinuation of the trial, and can change the course of the study. Poor selection of the Prediction Analyses and Interim Decisions plan (PAID) in an adaptive clinical trial can have negative consequences, including the risk of exposing patients to ineffective or toxic treatments. We present an approach that leverages datasets from completed trials to evaluate and compare candidate PAIDs using interpretable validation metrics. The goal is to determine whether and how to incorporate predictions into major interim decisions in a clinical trial. Candidate PAIDs can differ in several aspects, such as the prediction models used, the timing of interim analyses, and the potential use of external datasets. To illustrate our approach, we considered a randomized clinical trial in glioblastoma. The study design includes interim futility analyses based on the predictive probability that the final analysis, at the completion of the study, will provide significant evidence of treatment effects. We examined various PAIDs with different levels of complexity to investigate whether the use of biomarkers, external data, or novel algorithms improved interim decisions in the glioblastoma clinical trial. Validation analyses based on completed trials and electronic health records support the selection of algorithms, predictive models, and other aspects of PAIDs for use in adaptive clinical trials. In contrast, PAID evaluations based on arbitrarily defined ad hoc simulation scenarios, which are not tailored to previous clinical data and experience, tend to overvalue complex prediction procedures and produce poor estimates of trial operating characteristics such as power and the number of enrolled patients.
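To make the interim rule concrete, the sketch below shows one common way a predictive probability of success can be computed at an interim look and used as a futility trigger. It is a minimal illustration assuming a two-arm trial with a binary endpoint, Beta(1, 1) priors, and a frequentist final test; the endpoint, sample sizes, and futility threshold are illustrative assumptions, not the actual design of the glioblastoma trial discussed in the talk.

```python
# Minimal sketch of a predictive-probability futility rule for a two-arm trial
# with a binary endpoint. All parameters (sample sizes, priors, thresholds)
# are illustrative assumptions, not values from the talk.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def predictive_probability_of_success(y_trt, n_trt, y_ctl, n_ctl,
                                       n_final_per_arm=100,
                                       alpha=0.025, n_sims=5000):
    """Probability that the final analysis will be significant, given interim data.

    Uses independent Beta(1, 1) priors and simulates the unobserved remainder
    of each arm from the posterior predictive distribution.
    """
    successes = 0
    for _ in range(n_sims):
        # Draw response rates from the Beta posteriors at the interim look.
        p_trt = rng.beta(1 + y_trt, 1 + n_trt - y_trt)
        p_ctl = rng.beta(1 + y_ctl, 1 + n_ctl - y_ctl)
        # Impute the outcomes of patients not yet observed.
        y_trt_final = y_trt + rng.binomial(n_final_per_arm - n_trt, p_trt)
        y_ctl_final = y_ctl + rng.binomial(n_final_per_arm - n_ctl, p_ctl)
        # Prespecified frequentist final test (two-proportion z-test).
        p1, p0 = y_trt_final / n_final_per_arm, y_ctl_final / n_final_per_arm
        pooled = (y_trt_final + y_ctl_final) / (2 * n_final_per_arm)
        se = np.sqrt(2 * pooled * (1 - pooled) / n_final_per_arm)
        z = (p1 - p0) / se if se > 0 else 0.0
        successes += z > stats.norm.ppf(1 - alpha)
    return successes / n_sims

# Example interim look: stop for futility if predictive probability < 0.10.
ppos = predictive_probability_of_success(y_trt=18, n_trt=40, y_ctl=14, n_ctl=40)
print(f"Predictive probability of success: {ppos:.2f}",
      "-> stop for futility" if ppos < 0.10 else "-> continue")
```

In the framework described in the abstract, the prediction model inside such a rule (here, a simple Beta-Binomial imputation) is exactly the component a PAID can vary, and its choice would be validated against completed-trial data rather than ad hoc simulation scenarios.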


Advancing Precision Oncology with Large-Scale Real-World Clinico-Genomics Data

Speaker: Ruishan Liu, Postdoctoral Scholar, Stanford University

Ruishan Liu is a postdoctoral researcher in Biomedical Data Science at Stanford University, working with Prof. James Zou. She received her PhD in Electrical Engineering from Stanford University in 2022. Her research lies at the intersection of machine learning and its applications in human disease, health, and genomics. She was a recipient of the Stanford Graduate Fellowship and was selected as a Rising Star in Data Science by the University of Chicago, a Next Generation in Biomedicine honoree by the Broad Institute, and a Rising Star in Engineering in Health by Johns Hopkins University and Columbia University. She led the Trial Pathfinder project, which was selected as a Top Ten Clinical Research Achievement in 2022 and was a finalist for the Global Pharma Award in 2021.

Abstract

We will discuss analyses and modeling of paired genomic and EHR data from over 40,000 patients with cancer. We leverage these data to discover hundreds of genomic biomarkers that predict patients' responses to specific cancer treatments. These predictive biomarkers generate interesting biological hypotheses and can inform personalized treatment recommendations.
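As a rough illustration of this kind of biomarker screen, the sketch below tests a single gene-by-treatment interaction on overall survival with a Cox proportional hazards model (using the lifelines package) on synthetic data. The column names, the simulated cohort, and the single-interaction test are illustrative assumptions; the actual pipeline applied to the clinico-genomics cohort is not reproduced here.

```python
# Minimal sketch of screening one genomic biomarker for a treatment interaction
# on overall survival with a Cox model. The data below are synthetic and the
# column names are assumptions for illustration only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),   # 1 = received the therapy of interest
    "mutated": rng.integers(0, 2, n),   # 1 = carries the candidate biomarker
    "age":     rng.normal(60, 10, n),   # example confounder
})
# Synthetic survival times in which mutation carriers benefit more from treatment.
hazard = np.exp(-0.3 * df["treated"] - 0.5 * df["treated"] * df["mutated"] + 0.01 * df["age"])
df["time"] = rng.exponential(1 / hazard)
df["event"] = rng.integers(0, 2, n)     # 1 = death observed, 0 = censored

df["treated_x_mutated"] = df["treated"] * df["mutated"]
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
# The interaction coefficient is the quantity of interest: does the biomarker
# modify the treatment effect? A real screen would repeat this over hundreds of
# gene-drug pairs with confounder adjustment and multiplicity correction.
print(cph.summary.loc["treated_x_mutated", ["coef", "p"]])
```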


Use of Large Datasets to Facilitate Statistical Modeling for Addressing Complex or Difficult Breast Cancer Research Questions

Speaker: Yisheng Li, Professor, Department of Biostatistics, Division of Basic Science Research, The University of Texas MD Anderson Cancer Center, Houston, TX

Dr. Yisheng Li received his PhD in biostatistics from the University of Michigan, Ann Arbor, in 2003. He is currently a professor in the Department of Biostatistics at The University of Texas MD Anderson Cancer Center. His main research interests include Bayesian adaptive clinical trial design, Bayesian nonparametrics and applications, and objective Bayesian methods. He has collaborated with behavioral scientists, health disparities researchers, and clinicians specializing in various cancer sites at MD Anderson.

Abstract

In this talk I will discuss two breast cancer research projects in which I have been involved.

1) Using the MD Anderson model (Model M) to evaluate the impact of precision screening and precision treatment strategies on US breast cancer mortality trends, and to quantify the contributions of these screening and treatment strategies to the reduction in US breast cancer mortality. This work is part of the Cancer Intervention and Surveillance Modeling Network (CISNET) Breast Working Group (BWG) projects. Because a large number of factors may influence breast cancer mortality, the highest-quality data representative of the US are used to derive the input parameter values needed for our microsimulation model. Data sources include the Surveillance, Epidemiology, and End Results (SEER) program, the Breast Cancer Surveillance Consortium (BCSC), the National Comprehensive Cancer Network (NCCN), national surveys such as the National Health Interview Survey (NHIS) and the National Health and Nutrition Examination Survey (NHANES), and results from large meta-analyses, among others. We use an approximate Bayesian computation (ABC) method to derive posterior distributions of the unknown model parameters. Combining these posterior distributions with input parameter values derived from the large datasets, Model M simulates millions of women, follows them over their lifetimes, and uses the large simulated dataset to summarize and evaluate breast cancer incidence and mortality rates, as well as to quantify the contributions of screening and treatment strategies to the mortality reduction.

2) Using data from tens of thousands of breast cancer patients seen at MD Anderson over a defined time period, we train prognostic models for overall survival (OS) separately for early-stage (I-III) and metastatic patients, and validate the models externally in thousands to tens of thousands of patients identified from the NCCN for the respective patient populations. The use of these large datasets has enabled robust evaluation of the performance of the developed prognostic models and supports their use in clinical decision making and in stratification for clinical trials.
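As a rough illustration of the ABC calibration step mentioned in project 1, the sketch below implements ABC by rejection sampling for a toy one-parameter simulator. The simulator, the prior, the target summary statistic, and the tolerance are all illustrative assumptions; Model M itself has many parameters and far richer natural-history and treatment components than this sketch.

```python
# Minimal sketch of approximate Bayesian computation (ABC) by rejection sampling:
# draw parameters from the prior, run the simulator, and keep draws whose simulated
# summary statistic falls close to the observed target. All quantities here are
# toy stand-ins, not CISNET Model M components.
import numpy as np

rng = np.random.default_rng(2)

def simulate_summary(theta, n_women=5_000):
    # Toy microsimulation: theta is an annual event rate; the summary statistic
    # is the simulated cumulative incidence over a 10-year horizon.
    events = rng.binomial(1, 1 - np.exp(-10 * theta), size=n_women)
    return events.mean()

observed_incidence = 0.12   # assumed registry-style target (toy value)
tolerance = 0.005
n_draws = 10_000

# Prior on the unknown rate parameter; accepted draws approximate the posterior.
prior_draws = rng.uniform(0.0, 0.05, size=n_draws)
accepted = [th for th in prior_draws
            if abs(simulate_summary(th) - observed_incidence) < tolerance]

posterior = np.array(accepted)
print(f"accepted {posterior.size} draws; posterior mean {posterior.mean():.4f}, "
      f"95% interval ({np.quantile(posterior, 0.025):.4f}, "
      f"{np.quantile(posterior, 0.975):.4f})")
```

In practice, calibration of a microsimulation model would match many summary statistics at once (e.g., age-specific incidence and mortality) and would typically use more efficient ABC variants than plain rejection sampling; the sketch only conveys the basic accept/reject idea.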