2014 Program

Day 1



Ben Marlin and Randall Wetzel


Predicting Clinically Relevant Events Using Real Time Data Streams and Data Libraries

Michael R. Pinsky, MD, CM, D, MCCM, FCCP. University of Pittsburgh

If one could accurately predict which patients will develop shock, when, and why, then effective preemptive treatments could be administered to improve outcomes and use healthcare resources more effectively. But signs of shock often occur late, once organ injury is already present. We have previously shown that an integrated monitoring system (IMS), when coupled with a bedside care plan, improves outcome in step down unit (SDU) patients and reduces the duration and intensity of cardiorespiratory insufficiency. We have also shown that HR variability analysis (sample entropy of RR interval variance) could identify at-risk patients within 2 minutes and, if monitored for 5 minutes, enable a precise prediction of which SDU patients would remain stable or deteriorate within the next 48 hours. The problem is in defining the minimal monitoring data set needed to initially identify those patients across all possible processes and then to specifically monitor their response to targeted therapies known to be likely to improve outcome. To address this challenge we developed multivariate models using machine learning (ML) methodology (multivariate regression analysis, Fourier analysis, principal component analysis, artificial neural networks, random forest classification and regression, etc.) to predict cardiorespiratory insufficiency in a parsimonious fashion. Preliminary analysis of our clinically relevant porcine model of hemorrhagic shock with ML modeling allowed us to characterize the patterns seen in hemorrhage, hypovolemia to decompensation and resuscitation, and predict which animals would collapse during severe hypovolemia and which would not. We found that during hemorrhage, deviation from the typical range of variability could be identified with a high degree of certainty minutes before any measurable change in the individual hemodynamic variables was observed.
What is not yet known is the minimal data set (the number and type of independent measurements, sampling frequency, and the observation lead time) necessary to create a robust algorithm that will: 1) predict impending shock, 2) select the most effective treatments, 3) monitor the response to those treatments, and 4) determine when resuscitation has effectively restored tissue perfusion and can be stopped. We refer to this concept as Monitoring Parsimony. It is also not yet understood whether additional monitoring parameters, or longer observation times leading up to recognition of insufficiency, will improve the ability to achieve the above four targets with the best sensitivity and specificity once an alert is raised. We envision a basic monitoring surveillance that identifies the patients most likely to become unstable for targeted focused monitoring and treatments to deliver highly personalized medical care.
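For readers unfamiliar with the sample entropy measure mentioned in this abstract, the following is a minimal sketch of the textbook definition applied to a short series of RR intervals. It is not the speaker's implementation: the function name, the embedding length `m=2`, and the tolerance of 0.2 standard deviations are assumptions chosen because they are common defaults, and the input series are synthetic.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series (illustrative sketch, not the talk's code).

    m and r_factor are assumed defaults; the abstract does not state them.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    # Tolerance r is a fraction of the series' (population) standard deviation.
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * sd

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= r.
        # The same n - m templates are used for both lengths m and m + 1.
        count = 0
        n_templates = n - m
        for i in range(n_templates):
            for j in range(i + 1, n_templates):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined in theory; reported here as infinity
    return -math.log(a / b)


# Synthetic RR intervals in seconds (made-up values for illustration only).
regular_rr = [0.80, 0.82] * 20
irregular_rr = []
x = 0.3
for _ in range(60):
    x = 3.9 * x * (1 - x)          # logistic map as a stand-in for irregularity
    irregular_rr.append(0.6 + 0.4 * x)

print(sample_entropy(regular_rr))
print(sample_entropy(irregular_rr))
```

Lower values indicate a more regular, predictable series; how such values relate to patient risk is the subject of the talk and is not modeled here.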

Metadata and Health Care

Jonathan Wanderer, MD, MPhil. Vanderbilt University.

Electronic health records (EHRs) generate tremendous quantities of metadata every day. Metadata (data that describes data), when extracted from EHRs, is a rich source of information that can offer detailed insights into how clinicians interact with data systems as well as with patients. While uses of metadata in health care have typically been confined to addressing privacy concerns, I will discuss how metadata generated by EHRs can be analyzed to offer understandings of health care teams and systems that would otherwise be difficult or impossible to obtain.


Dynamical Biomedical Informatics

David Albers, PhD. Columbia University.

It is well understood that physiology and health are dynamic for an individual and vary over a population. Beginning with this assumption, I will introduce and discuss some of the roles time can play in the analysis of electronic health record data, including different parameterizations of time, biases that can be introduced when measurements are correlated with states, and more positively, how temporal biases can be exploited to aid in phenotyping.

Real Time Data Analytics and Electronic Medical Record Integration in Pediatric Critical Care: Proof of Concept

Mark Wainwright, MD, PhD. Northwestern University Feinberg School of Medicine.

We sought to create an informatics system in our pediatric ICUs which combines high-resolution physiologic data analyzed in real-time with data extracted from the EMR (Epic). We used IBM InfoSphere Streams to process the physiologic data to measure heart-rate variability and cardiopulmonary interactions, and a dedicated UI customized for Streams-derived data. Separately, we combined these physiologic measures with other measures of disease severity or organ injury derived from the EMR and used a commercially available biointegration suite to extract and analyze the combined EMR- and physiologic or Streams-derived data. Integration of data from multiple medical devices in the pediatric ICU and the analysis of these data in real time is feasible. Work is in progress to determine whether the availability of these data enables earlier detection and improved management of cardiac arrest and septic shock in the pediatric and cardiac ICUs.


The Phenotype Discovery Problem.

Thomas A. Lasko, MD, PhD. Vanderbilt University.

It's becoming clear that one of the key advances that we're going to need in order to achieve precision, personalized medicine is to stop using our clinically-driven descriptions (or phenotypes) of each disease, and start allowing the data to speak for themselves, to tell us what the phenotypes really are, including any characteristic patterns of how they progress over time. In this talk I will discuss my approach for data-driven phenotype discovery from Electronic Medical Records, some of the challenges it involves, and some promising early results.

Discovering Subtypes of Disease Trajectories and Learning from the Crowd.

Jennifer G. Dy, PhD. Northeastern University.

This talk presents two different ways of learning from complex medical data when ground truth is not available. In the first part of my talk, I will introduce a novel way to perform clustering. In particular, I will investigate subtyping chronic obstructive pulmonary disease (COPD), a lung disease characterized by airflow limitation resulting from chronic inflammatory responses in the airways and lungs to noxious particles or gases (e.g., cigarette smoke). We introduce a different way of looking at the COPD subtyping task by taking into account how lung health changes as a function of risk factors (such as age and smoke exposure) for identifying meaningful subgroups. We call this function a “disease trajectory.” Some people are more resistant to the effects of smoke exposure, while others are highly sensitive to smoke and experience rapid health decline given similar levels of exposure. We model disease trajectories using Gaussian Processes and enable learning multiple trajectories and the association of individuals to these trajectories with a Dirichlet process mixture of Gaussian Processes. In addition, we show how to incorporate expert guidance into our model. In the second part of my talk, I will discuss the use of crowd sourcing to deal with problems where it is difficult and in some cases impossible to collect perfect annotations (ground-truth). For example, radiologists regularly disagree on the findings/diagnosis for the same radiological image. I will introduce a probabilistic model for learning a classifier from multiple annotators, where the reliability of the annotators may vary with the annotator and the data that they observe. Our approach allows us to provide estimates for the true labels given new instances and also provide the expertise variability for each annotator given different data samples.

Clustering ICU Data using Measurement Timings and Values

Juan Casse, UC Riverside.

We propose an unsupervised learning method for automatically discovering patient clusters, which have discernible physiologic patterns and prognostic significance, from pediatric intensive care unit (PICU) physiologic data. The data are complex: time-varying, high-dimensional, sparse, uncertain and incomplete. The data are not “missing at random” as is commonly assumed. To alleviate these problems, we learn a conjoint piecewise-constant conditional intensity model (C-PCIM) of the temporal dependencies of the data. Once we have a model for the entire dataset (an “average patient”), we extract a fixed-length feature vector for each patient by computing the implied features of the Fisher score. We apply a co-clustering method for real-valued data, with a minimum description length (MDL)-based criterion function, that automatically determines the number of clusters. We tested our method on a dataset from the PICU at Children’s Hospital Los Angeles, consisting of approximately 10,000 patient episodes collected over ten years. The results show clusters with discernible physiologic patterns and prognostic significance.


Homes to Hospitals: How Data Sharing and Technology are Reshaping Medical Care

Katherine Homann, MD. University of California San Diego.
Mallikarjuna Rettiganti, PhD

The Emergency Department (ED) is on the front lines for many aspects of the health care system, including health care innovation. The range of projects at the UC San Diego Emergency Department reveals the successes, challenges and potential of taking health care to the next level with technology. The Beacon Health Information Exchange (HIE) bridges paramedic and ED data, allowing EKG transmission for heart attack patients to shorten time to potentially lifesaving treatments. The same HIE, uniting public health departments with EDs, creates a range of health care prevention projects. Technology now offers the chance to bring hospital care to the home: sensors worn by asthma patients monitor air quality and warn of worsening conditions that would otherwise send them to the hospital, and even create a map of risk factors to help protect public health. Furthermore, steps towards incorporating telemedicine within ED care, and using that technology to provide “hospital care at home,” have the potential to redefine the future of medical care.

The UC Research Exchange (UC-ReX) and the Los Angeles Data Resource (LADR): Initial Experience with Federated Systems for Research use of Clinical Data.

Douglas Bell, MD, PhD. University of California Los Angeles.

We have co-created two federated data systems for "Cohort Discovery," enabling researchers to estimate potential cohort sizes across multiple institutions for different study inclusion and exclusion criteria. The University of California Research eXchange (UC-ReX) searches for patients across all five University of California medical centers (UCLA, UCSF, UC Davis, UC Irvine, and UCSD), using search criteria that can include the patients' diagnoses, surgical procedures, medications, laboratory test results, and demographic information such as race-ethnicity, religion and language spoken. The Los Angeles Data Resource (LADR) system enables similar searches for patients within Los Angeles. Currently, it encompasses UCLA and Cedars-Sinai Health System, and within the next year we are aiming to add USC/Keck Medical Center, Children's Hospital of LA, City of Hope, the Los Angeles County Department of Health Services and others. This talk will review how each collaboration is structured and governed, results to date in supporting researchers, and expected future developments, including intersection with the National Patient Centered Clinical Research Network (PCORnet).

CS/ML Panel

Moderated by: Drs. Roby Khemani, Barry Markowitz, James Fackler, Randall Wetzel and Curtis Kennedy

Dinner and Poster Session

Day 2


Data Driven Analytics for Personalized Healthcare

Jianying Hu, PhD. IBM Thomas J. Watson Research Center.

The concept of Learning Health Systems (LHS) is gaining momentum as electronic healthcare data becomes increasingly accessible. The core idea is to enable learning from the collective experience of a care delivery network as recorded in the observational data, to iteratively improve care quality as care is being provided in a real world setting. In line with this vision, the health care analytics research group at IBM Research has been developing machine learning, data mining and computer visualization methodologies that can be used to derive insights from diverse sources of healthcare data to provide personalized decision support for care delivery and care management. In this talk I will give an overview of a wide range of analytics components we have developed and some new directions we are taking.

Innovative Engineering of Clinical Decision Support in a Pediatric Intensive Care Unit

Curtis E. Kennedy, MD, PhD. Baylor College of Medicine.

The risk of death and disability in Pediatric Intensive Care Unit (PICU) patients is orders of magnitude higher than it is in general inpatient care units. Currently, PICUs generate vast amounts of data containing meaningful signals. Noise in the data, however, presents a barrier to patient care. Decision support can provide bedside caregivers with data that is reduced in noise and increased in meaningful signal by automatically filtering and screening for target patterns. To date, decision support for PICU patients has been difficult to implement due to the poor availability of real time data to analytic engines external to our commercial electronic medical record (EMR). Our PICU recently adopted a set of open source tools to overcome the EMR-related barrier to clinical decision support. Our aim in this talk is to describe our implementation of these tools (collectively known as Invigilence) to drive decision support / quality improvement workflow cycles for two projects: Shock Index (SI) notifications, and development of the Fluid Overload / Kidney Injury Score (FOKIS).
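As a small illustration of the first project, Shock Index is conventionally defined as heart rate divided by systolic blood pressure. The sketch below computes it and compares it with a caller-supplied threshold. The function names and the threshold value in the usage lines are hypothetical; the abstract does not state the notification rule, and appropriate pediatric cut-offs are not given here.

```python
def shock_index(heart_rate_bpm, systolic_bp_mmhg):
    """Shock Index: heart rate (beats/min) divided by systolic BP (mmHg)."""
    if systolic_bp_mmhg <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate_bpm / systolic_bp_mmhg


def needs_notification(heart_rate_bpm, systolic_bp_mmhg, threshold):
    """Return True when SI exceeds the threshold.

    The threshold is an assumption supplied by the caller; it is not
    taken from the talk and is not a clinical recommendation.
    """
    return shock_index(heart_rate_bpm, systolic_bp_mmhg) > threshold


# Hypothetical readings and threshold, for illustration only.
print(shock_index(120, 80))
print(needs_notification(120, 80, threshold=1.2))
```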


Learning and Mining in Large-scale Time Series Data

Yan Liu, PhD. University of Southern California.

Many emerging applications of machine learning, such as healthcare, climate modeling, and computational biology, involve time series data with inherent structures. In this talk, I will discuss the machine learning models that we developed at USC Melady Lab to address three important problems in time series learning and mining: temporal dependence learning, automatic feature learning, and multivariate time series retrieval. Experimental results, including those from a healthcare application, will be shown to demonstrate the effectiveness of our models.

Learning from Big Data.

Tyson Condie, PhD. University of California Los Angeles.

A new wave of systems is emerging in the space of Big Data Analytics that opens the door to programming models beyond Hadoop MapReduce (HMR). It is well understood that HMR is not ideal for applications in the domain of machine learning and graph processing. This realization is fueling a number of new (Big Data) system efforts: Berkeley Spark, Google Pregel, GraphLab (CMU), and Hyracks (UC Irvine), to name a few. Each of these adds unique capabilities, but they form islands around key functionalities: fault-tolerance, resource allocation, and data caching. In this talk, I will provide an overview of some Big Data systems starting with Google's MapReduce, which defined the foundational architecture for processing large data sets. I will then identify a key limitation in this architecture; namely, its inability to efficiently support iterative workflows. I will then describe real-world examples of systems that aim to fill this computational void. I will conclude with a description of my own work on a layering that unifies key runtime functionalities (fault-tolerance, resource allocation, data caching, and more) for workflows (both iterative and acyclic) that process large data sets.


Comparing Repeated Measures Under Dynamic Treatment Regimes in the Analysis of Sequentially Randomized Trials

Xi Lu. University of Michigan.

The management of many health disorders, such as depression, diabetes, or obesity, often entails a sequential, individualized approach whereby treatment is adapted over time in response to the evolving status and needs of the individual. This decision making can be guided by a dynamic treatment regime (DTR), a sequence of individually-tailored decision rules that specify whether, how, or when to alter the intensity, dosage or type of treatment(s) at critical clinical decision points in the course of care. Sequential multiple assignment randomized trials (SMART) were developed explicitly to construct high-quality DTRs. This paper presents a methodology to analyze repeated measures outcomes using SMART data. The methodology can be used to compare mean trajectories among the DTRs embedded in a SMART. The main contribution of this work is the development of repeated measures modeling considerations, which take into account study design features that are unique to SMARTs. The overarching consideration is recognizing the timing of repeated measures in relation to the treatment stages in a SMART. An extension of a weighted-and-replicated (WR) regression estimator, easily implemented as GEE, is proposed to estimate the repeated measures marginal model. We illustrate the methodology first using data from an autism SMART study, and further via two other SMART trials with more complex study designs.

Interactive Session: Machine Learning and Big Data Panel

Profs. Tyson Condie, Jennifer Dy, Christian Shelton, and David Sontag


Hierarchical Conditional Random Fields for Outlier Detection: An Application to Detecting Epileptogenic Cortical Malformations.

Bilal Ahmed, MS. PhD Candidate, Department of Computer Science, Tufts University.

We cast the problem of detecting and isolating regions of abnormal cortical tissue in the MRIs of epilepsy patients in an image segmentation framework. Employing a multiscale approach we divide the surface images into segments of different sizes and then classify each segment as being an outlier, by comparing it to the same region across controls. The final classification is obtained by fusing the outlier probabilities obtained at multiple scales using a tree-structured hierarchical conditional random field (HCRF). The proposed method correctly detects abnormal regions in 90% of patients whose abnormality was detected via routine visual inspection of their clinical MRI. More importantly, it detects abnormalities in 80% of patients whose abnormality escaped visual inspection by expert radiologists.

Using Anchors to Predict Clinical State without Labeled Data.

David Sontag, PhD. New York University.

I will describe our initiative to create a new middleware layer for health information technology consisting of hundreds of clinical state variables that summarize a patient's past and current state. These clinical state variables are continuously estimated throughout a patient's stay and can be used by decision support applications to better inform, guide, and expedite the workflows of clinicians. To enable this, we develop a new framework that relies on what we call "anchor variables". By specifying anchor variables, an expert encodes a certain amount of domain knowledge about the problem, which is then used together with unsupervised learning to learn the rest of the model. To promote transferability between institutions, we assume only that these anchor variables (built using standardized medical ontologies) remain stable between institutions, while the rest of the underlying observation model can change. I'll describe our use of this framework to predict several clinical state variables relevant to the emergency department, and to learn disease progression models.

Automated Identification of Allergies Entered Using Non-Standard Terminology.

Richard H. Epstein, MD, CPHIMS. Sidney Kimmel Medical College at Thomas Jefferson University.

Maintenance of an active medication allergy list is one of the core measure requirements of electronic health record (EHR) systems, and is essential for safe healthcare. Identification of food allergies is also important because of the prevalence of severe reactions to many common products. Such lists are used to check prescribed medications for drugs or drug classes to which a patient may either be allergic or sensitive, and computerized decision support reduces the frequency of prescribing errors related to failure to recognize previously known allergies. The efficacy of such systems is dependent on their ability to recognize entered allergies and sensitivities, but when providers enter non-standard terms, the presence of a potential reaction may go unrecognized. Such modes of input are typically provided in free text fields, with many drug names misspelled. Even when drop lists are provided from which to select allergens, these lists may not reconcile against standard terminologies such as those found in publicly available databases such as RxNorm. In this talk I will describe a high-performing, portable algorithm our group developed to match non-standard drug and food allergies to RxNorm. This algorithm had accuracy, recall, and F-measure for drug allergies above 97% in a testing dataset, and above 93% for food allergies. The algorithm could easily be integrated in EHR systems to align entries in drop lists to RxNorm terms, as well as to perform real-time checking of manually entered allergy information to suggest corrections.
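To make the matching problem concrete, the sketch below maps misspelled free-text allergy entries to a standard term using approximate string matching from Python's standard library. This is not the algorithm described in the talk, whose details are not given in the abstract. The tiny `STANDARD_TERMS` list is a hypothetical stand-in for an RxNorm-derived vocabulary, and the function name and the 0.8 similarity cutoff are assumptions.

```python
import difflib

# Hypothetical stand-in for a standard vocabulary (not real RxNorm content).
STANDARD_TERMS = [
    "penicillin",
    "amoxicillin",
    "sulfamethoxazole",
    "aspirin",
    "peanut",
    "latex",
]


def match_allergy(free_text, vocabulary=STANDARD_TERMS, cutoff=0.8):
    """Return the closest standard term for a free-text entry, or None.

    Normalizes case and whitespace, then uses difflib's similarity ratio.
    The cutoff is an assumed value, not one reported by the authors.
    """
    normalized = free_text.strip().lower()
    matches = difflib.get_close_matches(normalized, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Made-up entries of the kind a free-text field might contain.
for entry in ["Penicilin ", "peanuts", "xyz"]:
    print(repr(entry), "->", match_allergy(entry))
```

A production matcher would also need to handle brand names, abbreviations, and drug classes, which simple edit-distance similarity does not address.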


Ben Marlin and Randall Wetzel

Invited Speakers

PhD Candidate, Department of Computer Science, Tufts University.
Associate Research Scientist, Department of Biomedical Informatics, Columbia University.
Associate Professor, General Internal Medicine, University of California Los Angeles.
Assistant Professor. Computer Science Department, University of California Los Angeles.
Senior Systems Scientist. Auton Lab, Carnegie Mellon University.
Associate Professor. Department of Electrical and Computer Engineering, Northeastern University.
Professor and Vice Chair, Anesthesia Information Systems. Sidney Kimmel Medical College at Thomas Jefferson University.
Assistant Professor. Biomedical Informatics, School of Medicine, Vanderbilt University.
Assistant Professor. Computer Science Department, Viterbi School of Engineering, University of Southern California.
Assistant Professor. Computer Science Department, New York University.
Associate Professor. School of Public Health, Brown University.
Assistant Professor. Department of Anesthesiology, Vanderbilt University.

Senior Program Committee:

Tufts University, Department of Computer Science
Vanderbilt University, Department of Anesthesiology
UMass Amherst, School of Computer Science
NASA Jet Propulsion Laboratory and USC Computer Science
Vanderbilt University Medical Center
Children's Hospital Los Angeles
