2015 Program

Day 1



Curtis E. Kennedy, MD, PhD


Clinical Alerting of Unusual Care that Is Based on Machine Learning from Past EMR Data

Gregory Cooper, MD, PhD

Medical errors remain a significant problem in healthcare. Electronic medical records (EMRs) have shown great promise in helping health care providers to identify and reduce medical errors. Computer-based monitoring and alerting systems play a key role in this effort. We have developed a method for alerting that is based on machine learning from EMRs. In particular, the method uses data in an EMR system to learn a probabilistic model of the usual care of past patients. For a current patient, it derives the probability of each clinical care action that the patient has recently received (e.g., the administration of a given medication). A care action that has a very low probability will trigger a clinical alert that the action is anomalous. We hypothesize that anomalous actions correspond to medical errors often enough to make such alerting worthwhile. This approach has the advantage that it provides broad coverage of clinical care, is completely data driven, can readily adapt to new clinical environments and locations, and can be continually updated over time. This talk discusses the implementation and laboratory evaluation of a version of this system that sends alerts on patients in the intensive care unit (ICU). The results suggest that this approach is promising.
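The alerting idea in this abstract can be illustrated with a minimal sketch. It assumes a simple smoothed frequency table as the "probabilistic model of usual care"; the abstract does not say which model the authors use, and the context label, action names, smoothing value, and 0.05 threshold below are all hypothetical.

```python
from collections import Counter


def fit_usual_care(past_cases):
    """Count past cases per context and per (context, action) pair.

    past_cases: iterable of (context, actions), where actions is a set
    of care actions recorded for that case.
    """
    context_counts = Counter()
    action_counts = Counter()
    for context, actions in past_cases:
        context_counts[context] += 1
        for action in actions:
            action_counts[(context, action)] += 1
    return context_counts, action_counts


def action_probability(model, context, action, smoothing=1.0):
    """Smoothed estimate of P(action taken | context)."""
    context_counts, action_counts = model
    taken = action_counts[(context, action)]
    total = context_counts[context]
    # Laplace smoothing over the two outcomes (taken / not taken).
    return (taken + smoothing) / (total + 2 * smoothing)


def anomalous_actions(model, context, actions, threshold=0.05):
    """Return the actions whose estimated probability is below threshold."""
    return sorted(
        a for a in actions
        if action_probability(model, context, a) < threshold
    )


# Hypothetical history: drug_x is routine in context_A, drug_y is rare.
past_cases = [("context_A", {"drug_x"})] * 98
past_cases += [("context_A", {"drug_x", "drug_y"})] * 2
model = fit_usual_care(past_cases)
alerts = anomalous_actions(model, "context_A", {"drug_x", "drug_y"})
```

Here `alerts` contains only `drug_y`, because its smoothed probability (3/102) falls below the threshold. A real system would condition on a much richer description of the patient than a single context label.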

Distributed healthcare data networks for large scale comparative safety and effectiveness research of medical products

Sengwee Darren Toh, DS

Distributed network architecture and data analytics have in many ways revolutionized the conduct of multi-center, population-based research of medical products. This presentation will focus on the FDA Sentinel System, a new national active surveillance system that monitors the safety of approved medical products. We will describe the genesis of the system, the creation and expansion of a distributed data network of 18 electronic healthcare databases, and the development and application of distributed analytic methods that perform robust statistical analysis without sharing identifiable patient-level information. We will also talk about other relevant national initiatives, including PCORnet, the National Patient-Centered Clinical Research Network.
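One way to analyze data without sharing patient-level information is for each site to send only aggregate counts to a coordinating center. The sketch below assumes a Mantel-Haenszel risk ratio stratified by site; the abstract does not state which distributed methods Sentinel uses, and the class, function names, and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts that a site shares instead of patient rows."""
    exposed_events: int
    exposed_total: int
    unexposed_events: int
    unexposed_total: int


def summarize_site(patients):
    """Runs locally at a site. patients: iterable of (exposed, event) booleans."""
    a = n1 = c = n0 = 0
    for exposed, event in patients:
        if exposed:
            n1 += 1
            a += int(event)
        else:
            n0 += 1
            c += int(event)
    return SiteSummary(a, n1, c, n0)


def mantel_haenszel_risk_ratio(summaries):
    """Runs at the coordinating center, using only the per-site summaries."""
    numerator = 0.0
    denominator = 0.0
    for s in summaries:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_events * s.unexposed_total / n
        denominator += s.unexposed_events * s.exposed_total / n
    return numerator / denominator


# Hypothetical counts from two sites, each with a risk ratio of 2.
site_1 = SiteSummary(20, 100, 10, 100)
site_2 = SiteSummary(40, 200, 10, 100)
pooled_rr = mantel_haenszel_risk_ratio([site_1, site_2])
```

Only four integers per site cross institutional boundaries in this sketch, which is the point of the design. Real analyses would also need confounder adjustment and safeguards for small cell counts.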


Computational Phenotyping using Knowledge Guided Tensor Factorization and Completion

Jimeng Sun, PhD

Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of unlabeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data. We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We apply the Alternating Direction Method of Multipliers (ADMM) framework to tensor factorization and completion, which can be easily scaled through parallel computing. We evaluate Rubik on two EHR datasets, one of which contains 647,118 records for 7,744 patients from an outpatient clinic, the other of which is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.

Extracting Multi-word, Entity-specific Topics and their Interrelations from Online Medical Forums

Sam Wiseman, Andrew Miller, Finale Doshi-Velez, Stuart Shieber

We present a model that jointly learns phenotype, treatment, and background-specific multi-word topics as well as a document-level clustering induced by a set of learned "prototypes." We apply this model to health-related posts taken from online ASD forums.


Using PageRank to Detect Anomalies and Fraud in Healthcare

Ofer Mendelevitch, MSc

Anomaly detection in healthcare data is an enabling technology for the detection of overpayment and fraud. In this talk, we demonstrate how to use PageRank with Hadoop and SociaLite (a distributed query language for large-scale graph analysis) to identify anomalies in healthcare payment information. We demonstrate a variant of PageRank applied to graph data generated from the Medicare-B dataset for anomaly detection, and show real anomalies discovered in the dataset.
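For readers unfamiliar with PageRank, the following is a minimal single-machine sketch of the standard algorithm using power iteration. It is not the Hadoop/SociaLite implementation or the variant used in the talk, neither of which is described here, and the toy graph is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on a list of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: three nodes point at "hub", which points back at "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
ranks = pagerank(edges)
top_node = max(ranks, key=ranks.get)
```

In this toy graph `top_node` is `hub`, the node with the most incoming links. How rank scores are turned into anomaly flags on the Medicare-B graph is specific to the talk and is not reproduced here.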

Causal Inference with Uncertain Data

Samantha Kleinberg, PhD

Massive amounts of medical data from electronic health records and body-worn sensors are being mined by researchers, and could be used to guide the decision-making of both doctors and patients. These time series can potentially let us discover factors affecting the recovery of patients in intensive care, or hypoglycemic episodes during daily life. However, patients are highly heterogeneous, the data are noisy, and we lack ground truth for evaluating algorithms. In this talk, I discuss methods for finding causal relationships from these difficult data sources, and why simulations are needed for evaluating our algorithms.


Controversies and Weaknesses of Comparative Effectiveness Research Using Multi-Center Databases

Punkaj Gupta, MD
Mallikarjuna Rettiganti, PhD

The use of databases of routinely collected healthcare information in epidemiology has expanded in the last decade as awareness has increased and more and larger resources have become available. However, health care lags behind other industries in leveraging advances in information technology and analytical techniques. With more and more data being collected digitally in every health care encounter, prospects improve for the integration of clinical care and research. Routinely collected data “provide great potential for extracting useful knowledge to achieve the ‘triple aim’ in health care: better care for individuals, better care for all, and greater value for dollars spent”. However, database research has its own challenges. Database studies are performed within the limitations of a resource that was not specifically designed to test the research hypothesis but is instead the product of complex and evolving healthcare systems. As a result, there is skepticism about using multi-center databases for conducting comparative effectiveness research (CER). Retrospective databases pose a series of methodological challenges, some of which are unique to each data source. There are concerns about the quality of the data collected in administrative databases and about the availability of resource utilization data in clinical databases. In addition, CER using databases is subject to selection bias, residual confounding, and measurement error. The purpose of this project is to help build trust in the power of big data for CER to serve the public good. To address some of these knowledge gaps, we plan to present the following examples:
Example 1: Variation in Outcomes of Cardiac Arrest for Operations of Varying Complexity in Children Undergoing Heart Surgery: Single center linkage of two National Databases.
Example 2: Association Between Extracorporeal Membrane Oxygenation Center Volume and Mortality Among Children With Heart Disease: Propensity and Risk Modeling.
Example 3: Impact of Varied Center Volume Categories on Volume-Outcome Relationship in Children receiving Extracorporeal Membrane Oxygenation for Heart Disease.

CS/ML Panel

Moderated by: Dave Kale

This interactive session will put four experts on the spot to discuss how recent advances in computational sciences can impact clinical research and delivery of care and contribute toward the development of a learning health care system. Questions and criticisms from attendees, especially clinicians and clinical researchers, are encouraged. Discussion will focus on how observational clinical data pose major challenges that differ from data from controlled trials and other domains (e.g., vision, NLP, advertising). We will also hear how each panelist would attempt to solve a very difficult hypothetical research problem.

Machine Learning and Cloud Computing

Dave Cuthbert

This brief talk goes over the scaling challenges that led to the development of cloud computing at Amazon. We'll discuss how researchers are using the cloud to get rid of "undifferentiated heavy lifting" -- tasks that are not core to their research -- to get results faster, and what this looks like for machine learning in both HIPAA and non-HIPAA environments.

Dinner and Poster Session

Day 2


Patient-Specific Survival Prediction

Russell Greiner, PhD

An accurate model of a patient's individual survival distribution can help determine the appropriate treatment and care of terminal patients. The common practice of estimating such survival distributions uses only population averages for (say) the site and stage of cancer. However, this is not very precise, as it ignores many important individual differences among patients. This presentation describes a novel technique, PSSP (patient-specific survival prediction), that learns a model from earlier patients, which can then be used to produce an individual survival curve, based on the characteristics of a specific patient. We describe how PSSP works, and explain how PSSP differs from the more standard tools for survival analysis (Kaplan-Meier, Cox proportional hazards, prognostic scores, etc.). We also show that PSSP is "calibrated", which means that its probabilistic estimates are meaningful. Finally, we demonstrate, over many real-world datasets (including various cancers), that PSSP provides survival estimates that are helpful for patients, clinicians and researchers. This tool is freely available at http://pssp.srv.ualberta.ca/.
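As background for the comparison with standard tools, here is a minimal sketch of the Kaplan-Meier estimator, the population-level baseline named in the abstract. It is not PSSP: it produces one curve for the whole cohort and ignores patient characteristics. The follow-up times and event flags are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient.
    observed: True if the event occurred at that time, False if censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cohort of five patients; two are censored.
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

For this cohort the estimated survival drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3. Every patient in the cohort receives this same curve, which is the limitation a patient-specific method aims to address.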


Innovation and Intelligence: A Necessary Convolution for the Future of Intelligence-based Biomedicine and Healthcare

Anthony C Chang, MD

Myriad innovations in areas of biomedicine such as genomic medicine, wearable technology, and advanced medical imaging are converging at an exponential pace. All of these advances demand faster and better handling of data and more sophisticated "intelligence" with predictive analytics, cognitive computing, and deep learning. This new era of "biomedical" intelligence (or "intelligence-based medicine and health care") will require a continual computer-brain synergy and data scientist-clinician collaboration.


Real World Data and Clinical Decision Support

Martin S. Kohn, MD

Achieving the dual goal of improving health outcomes and controlling cost will require making better decisions for the individual. Relying on traditional randomized controlled studies will be insufficient to promote personalized healthcare. Analysis of real-world data (big data in all its forms) will be necessary. Reducing avoidable healthcare encounters, in which home monitoring has a role, is one improvement strategy.

Model Based Probabilistic Inference for Intensive Care

Yusuf B. Erol, Romi Phadte, Harsimran Sammy Sidhu, Claire

Modern intensive care units (ICUs) utilize a multitude of instrumentation devices to provide measurements of various important physiological variables and parameters. While data are valuable, understanding the data and acting upon them is what yields the benefits in terms of improved health outcomes. Due to the uncertainty in our knowledge of patient physiology and the partial, noisy/artifactual nature of the observations, we adopt a probabilistic, model-based approach. The core of this approach involves calculating a posterior probability distribution over a set of unobserved state variables given a stream of data and a probabilistic model of patient physiology and sensor dynamics. The probability estimate of the state includes various physiological and pathophysiological variables and provides a diagnosis on which the nurse or the physician can act. The proposed approach is also capable of detecting artifacts, sensor failures, drug maladministration, and various other problems in the ICU setting. The overarching goals of the proposed approach are estimating the current health state of the patient, projecting the future health state, and synthesizing possible intervention plans.
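The core computation described here, a posterior over unobserved state given a stream of observations, can be sketched with a discrete Bayes filter. This is a deliberately tiny stand-in: the authors' physiological model is not specified in the abstract, and the two states, the two sensor readings, and every probability below are hypothetical.

```python
def normalize(dist):
    """Scale a dict of non-negative weights so that it sums to 1."""
    total = sum(dist.values())
    return {state: p / total for state, p in dist.items()}


def filter_step(belief, transition, likelihood, observation):
    """One predict-then-update step of a discrete Bayes filter."""
    states = list(belief)
    # Predict: push the current belief through the state dynamics.
    predicted = {
        nxt: sum(belief[cur] * transition[cur][nxt] for cur in states)
        for nxt in states
    }
    # Update: weight each state by how well it explains the observation.
    updated = {s: predicted[s] * likelihood[s][observation] for s in states}
    return normalize(updated)


# Hypothetical model: P(next state | current state).
transition = {
    "stable": {"stable": 0.95, "unstable": 0.05},
    "unstable": {"stable": 0.10, "unstable": 0.90},
}
# Hypothetical sensor model: P(reading | state).
likelihood = {
    "stable": {"normal": 0.9, "low": 0.1},
    "unstable": {"normal": 0.3, "low": 0.7},
}

belief = {"stable": 0.9, "unstable": 0.1}
for reading in ["low", "low", "low"]:
    belief = filter_step(belief, transition, likelihood, reading)
```

After three consecutive low readings the belief shifts strongly toward the unstable state, even though the prior favored stability. Continuous physiological states would need a different filter (for example a Kalman or particle filter), but the predict-and-update structure is the same.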

Understanding Ventilation from Multi-Variate ICU Time Series

Marzyeh Ghassemi, Marco A.F. Pimentel, Mengling Feng, Finale Doshi, Leo Celi, Peter Szolovits

In this work, we identify differences in the multidimensional physiological time series of patients in the pre-ventilation (V-), post-ventilation (V+) and non-ventilated (C) intensive care unit (ICU) population. Ventilation is common in the ICU, but small changes in the timing and setting of the ventilation can make large differences in outcome. We approach this problem from the perspective of statistical language modeling, and examine the ability of symbolized hourly "words" to represent patient trajectories in the ventilated and control populations. We found that aggregate relationships between the groups varied depending on the variable examined. Statistical language models were better able to predict sequences of symbols for the V+ and C groups in the tri-gram (3 hour) setting, with 68.31% and 67.65% tri-gram hit-rates respectively. The V- group was generally harder to predict, which may reflect an underlying, and ultimately predictive, latent state detectable from the physiological time series.
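The language-modeling view can be illustrated with a small tri-gram sketch over symbol sequences. It assumes that a "hit" means the most frequent continuation of the previous two symbols matches the actual next symbol; the abstract does not define the hit-rate, and the sequences below are hypothetical single-letter stand-ins for the hourly "words".

```python
from collections import Counter, defaultdict


def train_trigrams(sequences):
    """Count which symbol follows each pair of consecutive symbols."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            counts[(a, b)][c] += 1
    return counts


def trigram_hit_rate(counts, sequences):
    """Fraction of positions where the top-ranked continuation is correct."""
    hits = 0
    total = 0
    for seq in sequences:
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            total += 1
            followers = counts.get((a, b))
            if followers and followers.most_common(1)[0][0] == c:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical symbolized trajectories, one symbol per hour.
train_sequences = [list("ABCABCABC")]
test_sequences = [list("ABCABD")]
model = train_trigrams(train_sequences)
hit_rate = trigram_hit_rate(model, test_sequences)
```

The test sequence contains four tri-grams and the model predicts three of them, so `hit_rate` is 0.75. Comparing such rates across patient groups is the kind of analysis the abstract reports, though the authors' symbolization and smoothing choices are not shown here.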


Challenges in Machine Learning from Electronic Health Records

C. David Page Jr.

Society is interested in predicting health events, and the majority of people in the U.S. now have much of their relevant data for this goal in electronic health records (EHRs). This talk will discuss challenges and lessons from applying supervised machine learning to EHR data for a variety of prediction tasks. Machine learning topics in the talk include continuous-time Bayes nets and point process models, automated labeling of examples, relational learning, and differentially-private learning algorithms.

Leveraging In-Silico Physiology to Empower Analytics for Informing Care

Dimitar Baronov

Machine learning has been established as the dominant technique for the analysis of clinical data today. While this approach has demonstrated some success, its efficacy and broad application are hindered by its need for massive amounts of data to avoid overtraining, its “black box” architecture which restricts insight into data interactions, and the challenges of scaling from problem to problem. We present a novel monitoring technology that leverages in-silico physiology models to interpret vast amounts of real-time critical care data from multiple sources to extract actionable clinical information. We will describe a model-based risk assessment approach built on a framework that is first, grounded in established first principle mechanistic models of the human body that can encapsulate existing and new medical knowledge, second, scalable to include and adapt to all available patient data sources, and third, transparent in that it can be continually refined to reveal and stratify pathologies and their etiologies in facilitating an evolving understanding of disease pathways and treatment effects. We will present the results of applying this approach to the hemodynamic monitoring of infants immediately following cardiac surgery and its demonstrated efficacy in estimating the probability of inadequate oxygen delivery, which is a fundamental risk attribute in the management of critically ill patients.


Invited Speakers

Chief Intelligence and Innovation Officer, CHOC Children's
Founder and CTO of Etiometry Inc.
Vice Chair, Department of Biomedical Informatics, University of Pittsburgh
Scientific Director of the Alberta Innovates Centre for Machine Learning
Assistant Professor of Pediatrics in Pediatric Cardiology for the University of Arkansas for Medical Sciences (UAMS) at Arkansas Children's Hospital
Assistant Professor in the Computer Science Department at Stevens Institute of Technology
Chief Medical Scientist at Sentrian
Director, Data Science at Hortonworks
Professor, Department of Biostatistics and Medical Informatics and Department of Computer Sciences, School of Medicine and Public Health, University of Wisconsin-Madison
Associate Professor, School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology
Associate Professor in the Department of Population Medicine at Harvard Medical School and Harvard Pilgrim Health Care Institute

Senior Program Committee:

Associate Professor at UC Riverside's Computer Science Department
Assistant Professor of Pediatrics, Section of Critical Care Medicine - Baylor College of Medicine
Chairman, Department of Anesthesiology Critical Care Medicine - Children's Hospital Los Angeles
