We have reached the take-off point in the generation of massive datasets from individuals and across populations, both of which are necessary for personalized precision medicine. As an example, I will present my N=1 self-study, which includes my whole genome together with multi-year time series of my gut microbiome and over one hundred blood biomarkers, now being augmented with time series of my metabolome and immunome. Comparing these data with the gut microbiomes of hundreds of healthy people reveals major shifts between health and disease. Multiple companies and organizations will soon be carrying out similar levels of analysis on hundreds of thousands of individuals, and machine learning techniques will be essential to bring out the patterns in these exponentially growing datasets.
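As a rough illustration of the kind of comparison described above, the sketch below flags taxa in one personal gut-microbiome profile that deviate strongly from a healthy reference cohort. The synthetic abundances and the simple z-score rule are assumptions for illustration, not the speaker's actual analysis.

```python
# Minimal sketch: compare a personal microbiome profile to a healthy cohort.
# Abundances are synthetic; the z-score threshold is an illustrative choice.
import numpy as np

rng = np.random.default_rng(5)
n_reference, n_taxa = 300, 40

# Relative abundances for a healthy reference cohort and for one individual.
reference = rng.dirichlet(np.ones(n_taxa), size=n_reference)
personal = rng.dirichlet(np.ones(n_taxa))

# z-score each taxon against the reference distribution and flag large shifts.
z = (personal - reference.mean(axis=0)) / reference.std(axis=0)
shifted = np.flatnonzero(np.abs(z) > 3)
print("taxa with major shifts relative to the healthy cohort:", shifted)
```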
Quality of care, as would be reflected by the universal provision of standardized, evidence-based, and truly indicated care, has not improved to the degree one would have hoped. Similarly, while patient safety and medical errors have come into public awareness, advances in these areas have been slow, hard won, and unsupported by the kinds of smart, data-driven engineering designs that have transformed other domains. Interest in applying machine learning to clinical practice is increasing, yet the practical application of these techniques has fallen short of expectations. Clinicians continue to make determinations in a technically unsupported and unmonitored manner because high-quality evidence or tools are lacking for most day-to-day decisions. There is a persistent gap between the clinicians needed to understand the context of the data and the engineers who are critical to extracting usable information from the growing volume of healthcare data being generated. This talk focuses on the divide between the data science and healthcare silos, and posits that this lack of integration is the primary barrier to a data revolution in healthcare. I first discuss literature supporting the existence of this divide, and then present recommendations for bridging the gap between practicing clinicians and data scientists.
To take full advantage of the clinically relevant information implicitly captured in medical images, we develop robust algorithms for quantifying disease burden from patient scans. We then demonstrate how genetic and clinical variables can be used to predict anatomy and anatomical change through a semi-parametric generative model. Joint modeling of image and genetic data promises to provide insights into genetic factors and the anatomical effects of disease. We demonstrate the promise of this approach on large collections of brain scans from different patient cohorts.
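To make the semi-parametric idea concrete, here is a minimal sketch that predicts an anatomical measure (e.g., a regional brain volume) from a parametric clinical component plus a nonparametric genetic component, fit by simple backfitting. The variable names, synthetic data, and backfitting scheme are assumptions for illustration, not the speakers' actual model.

```python
# Minimal semi-parametric sketch: anatomy ~ linear(clinical) + smooth(genetic).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n = 500
clinical = rng.normal(size=(n, 3))             # e.g., age, sex, diagnosis (encoded)
genetic = rng.binomial(2, 0.3, size=(n, 20))   # e.g., minor-allele counts
volume = clinical @ [0.5, -0.2, 0.8] + np.sin(genetic[:, 0]) + rng.normal(0, 0.1, n)

linear = LinearRegression()                    # parametric (clinical) component
smooth = KernelRidge(kernel="rbf", alpha=1.0)  # nonparametric (genetic) component

# Simple backfitting: alternate fitting each component to the other's residuals.
f_gen = np.zeros(n)
for _ in range(10):
    linear.fit(clinical, volume - f_gen)
    f_clin = linear.predict(clinical)
    smooth.fit(genetic, volume - f_clin)
    f_gen = smooth.predict(genetic)

pred = linear.predict(clinical) + smooth.predict(genetic)
print("training RMSE:", np.sqrt(np.mean((volume - pred) ** 2)))
```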
Early Warning Scores and other forms of predictive modeling present clinicians with real-time estimates of the risk of imminent untoward events based on statistical models trained on legacy data sets. Nearly all such tools rely on static and intermittent data elements such as demographics, diagnoses, notes, vital sign measurements, and lab test results. Continuous physiological monitoring, such as EKG telemetry, is another potential source and offers the advantage of much denser data coverage. It introduces a new step in the modeling process, however: time series analysis of cardiorespiratory dynamics to detect signatures of illness. The University of Virginia group has investigated comprehensive approaches to predictive modeling that use static, intermittent, and continuous data streams for early detection of subacute, potentially catastrophic illness in infants and adults, in ICUs and on hospital floors.
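The sketch below shows one way static variables and a feature derived from continuous monitoring (here, a crude heart-rate-variability measure) could be fused in a single risk model. The feature choice, the synthetic data, and the logistic regression are illustrative assumptions, not the UVA group's actual pipeline.

```python
# Minimal sketch: fuse static elements with a feature from continuous monitoring.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_patients = 200

# Static/intermittent elements: e.g., age and the latest lab value.
static = rng.normal(size=(n_patients, 2))

# Continuous stream: per-patient series of beat-to-beat (RR) intervals in ms.
rr_series = [rng.normal(800, 20 + 30 * rng.random(), size=600) for _ in range(n_patients)]

# Dynamics feature: standard deviation of RR intervals (reduced variability is
# one classical signature of subacute illness).
sdnn = np.array([s.std() for s in rr_series])

X = np.column_stack([static, sdnn])
y = (sdnn < np.median(sdnn)).astype(int)  # synthetic "event" label for the demo

model = LogisticRegression().fit(X, y)
print("estimated event risk, first patient:", model.predict_proba(X[:1])[0, 1])
```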
Reinforcement learning offers a powerful paradigm for automatically discovering and optimizing sequential treatments for chronic and life-threatening diseases. This talk will introduce the basics of reinforcement learning and then discuss several aspects of this work, including: How should we collect data to learn good sequential treatment strategies? How can we learn a representation of the data that allows generalization across patients? How can we use the collected data to discover sequential treatment strategies that are tailored to patient characteristics and time-dependent outcomes? The methods presented will be illustrated with results from our work on learning adaptive neurostimulation policies for the treatment of epilepsy.
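For readers new to the paradigm, here is a minimal fitted Q-iteration sketch that learns a treatment policy offline from a synthetic batch of (state, action, reward, next state) transitions. The toy "symptom severity" state, the reward, and the choice of fitted Q-iteration with tree regressors are assumptions for illustration, not the speaker's actual neurostimulation method.

```python
# Minimal sketch of batch (offline) reinforcement learning for treatment policies.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(2)
n = 2000
state = rng.uniform(0, 1, size=(n, 1))          # e.g., current symptom severity
action = rng.integers(0, 2, size=n)             # 0 = no stimulation, 1 = stimulate
next_state = np.clip(state[:, 0] - 0.3 * action + rng.normal(0, 0.05, n), 0, 1)[:, None]
reward = -next_state[:, 0]                      # lower severity is better

gamma, q = 0.9, None
for _ in range(20):                             # fitted Q-iteration
    if q is None:
        target = reward
    else:
        q_next = np.column_stack([q.predict(np.column_stack([next_state, np.full(n, a)]))
                                  for a in (0, 1)])
        target = reward + gamma * q_next.max(axis=1)
    q = ExtraTreesRegressor(n_estimators=50, random_state=0)
    q.fit(np.column_stack([state, action]), target)

# Greedy policy: choose the action with the higher estimated Q-value.
print("treat severe patient?", bool(q.predict([[0.8, 1]])[0] > q.predict([[0.8, 0]])[0]))
```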
It is often difficult to accurately predict who will develop shock, when, and why, because signs of shock often appear late, when organ injury is already present. Three levels of information aggregation can aid the bedside clinician in this task: analyzing derived parameters of existing measured physiologic variables using simple bedside calculations (Functional Hemodynamic Monitoring); using prior physiologic data from similar subjects during periods of stability and disease to define quantitative metrics of severity; and using libraries of responses across large, comprehensive collections of records from diverse subjects whose diagnoses, therapies, and course of treatment are already known to predict not only disease severity but also the subsequent behavior of the subject if left untreated or treated with one of the many therapeutic options. A major pre-analysis problem is cleaning the data to remove non-physiologic artifacts due to technical errors, which account for >70% of all clinical alerts. We have been developing algorithms that effectively isolate ~85% of all artifacts among alerts generated from physiologic time series of vital sign data. The next problem is to define the minimal monitoring data set needed to initially identify patients at risk across all possible processes and then specifically monitor their response to targeted therapies known to improve outcomes. To address these issues, we represented the vital sign data with highly multivariate feature sets and used machine learning algorithms to infer parsimonious predictive models of cardiorespiratory insufficiency. We describe the nature of the required data sets and the modeling approaches used to detect, forecast, and track the evolution of risk for this severe condition. Together, these approaches enable earlier identification of cardiorespiratory insufficiency and direct focused, patient-specific management. To validate our methodology, we used both a porcine model of hemorrhage and human vital sign data collected in a trauma step-down unit. Our results show the value of a truly multivariate, fused approach over more traditional single vital sign thresholding for detection, and show that it also allows reliable forecasting of cardiorespiratory insufficiency before its overt signs become apparent. Moreover, increasing the resolution of signal processing from mean data collected at regular intervals to beat-to-beat and waveform analysis progressively improves the predictive value of the fused parameters. In addition, we show that using personalized reference data, when available, can further improve the detectability and predictability of cardiorespiratory insufficiency. Finally, we demonstrate that the temporal evolution of risk for cardiorespiratory insufficiency is a heterogeneous yet systematic process: most patients who develop this condition follow one of only a handful of typical risk evolution trajectories, and they can be assigned to their most likely trajectory type well ahead of onset, enabling further gains in predictability.
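To give a flavor of the multivariate-fusion argument, the sketch below contrasts single vital-sign thresholding with a fused model over several vital signs on synthetic data. The simulated vitals, labels, and random-forest classifier are illustrative assumptions, not the speakers' step-down-unit data or models.

```python
# Minimal sketch: fused multivariate detection vs. single vital-sign thresholding.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 3000
hr = rng.normal(80, 12, n)          # heart rate
rr = rng.normal(16, 4, n)           # respiratory rate
spo2 = rng.normal(97, 2, n)         # oxygen saturation

# Synthetic "insufficiency" label that depends weakly on several vitals at once.
risk = 0.03 * (hr - 80) + 0.08 * (rr - 16) - 0.15 * (spo2 - 97)
y = (risk + rng.normal(0, 0.6, n) > 0.8).astype(int)

X = np.column_stack([hr, rr, spo2])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

fused = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("fused AUC:  ", roc_auc_score(y_te, fused.predict_proba(X_te)[:, 1]))
print("HR-only AUC:", roc_auc_score(y_te, X_te[:, 0]))  # single-sign "threshold"
```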
Why is it so hard to drive change in healthcare? The data, technology, and insights exist, yet it remains difficult to move the needle in the right direction. What is the point of developing a technology if it will never be used because of business, cultural, or human behavior challenges? Understanding these issues can help you have greater impact. Learn how to ask the right questions that will yield it.
Electronic Health Record (EHR) phenotyping utilizes patient data captured through normal medical practice to identify features that may represent computational medical phenotypes. These features may be used to identify at-risk patients and improve prediction of patient morbidity and mortality. We present a novel deep multi-modality architecture for EHR analysis (applicable to joint analysis of multiple forms of EHR data), based on Poisson Factor Analysis (PFA) modules. Each modality, composed of observed counts, is represented as a Poisson distribution parameterized in terms of hidden binary units, and information from different modalities is shared via a deep hierarchy of common hidden units. To explore the utility of these models, we apply them to a subset of patients from the Duke-Durham patient cohort. From our patient population of over 240,000, we identified a cohort of over 12,000 patients with Type 2 Diabetes Mellitus (T2DM) based on diagnosis codes and laboratory tests. Examining the common hidden units uniting the PFA modules, we identify patient features that represent medical concepts. Experiments indicate that our learned features better predict mortality and morbidity than clinical features identified previously in a large-scale clinical trial.
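As a minimal illustration of the generative structure described above, the sketch below draws counts for two modalities from Poisson distributions whose rates are driven by shared binary hidden units. The dimensions, priors, and sampling scheme are assumptions for illustration (and inference of the hidden units is omitted); this is not the authors' exact model.

```python
# Minimal generative sketch: two Poisson Factor Analysis modules sharing
# binary hidden units across modalities.
import numpy as np

rng = np.random.default_rng(4)
n_patients, n_hidden = 100, 8
n_codes, n_labs = 50, 20          # two modalities: diagnosis codes and lab tests

h = rng.binomial(1, 0.3, size=(n_patients, n_hidden))   # shared binary units

# Each modality has its own non-negative factor loadings.
phi_codes = rng.gamma(1.0, 0.5, size=(n_hidden, n_codes))
phi_labs = rng.gamma(1.0, 0.5, size=(n_hidden, n_labs))

# Observed counts are Poisson with rates determined by the active hidden units.
codes = rng.poisson(h @ phi_codes)   # e.g., diagnosis code counts per patient
labs = rng.poisson(h @ phi_labs)     # e.g., abnormal lab-test counts per patient

# The shared h can then serve as a patient feature vector for downstream
# prediction of morbidity and mortality.
print(codes.shape, labs.shape, h.shape)
```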
The Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU) is a team of doctors, machine learners, and engineers committed to developing real-time clinical decision support for the pediatric ICU. We will discuss our perspective on the needs that exist in the ICU and how machine learning can meet them, and highlight some of our recent machine learning work aimed at enabling solutions to those needs.
If you have any questions regarding the symposium, please send us an email.