SAVE THE DATE Machine Learning in Health Care

August 18th - 19th, 2017
Northeastern University, Boston, MA

Machine Learning in Health Care

MUCMD is being renamed Machine Learning in Health Care (MLHC). MLHC is an annual research meeting that brings together two usually insular groups: computer scientists with expertise in artificial intelligence, machine learning, and big data, and clinicians and medical researchers. MLHC supports the advancement of data analytics, knowledge discovery, and meaningful use of complex medical data by fostering collaboration and the exchange of ideas between members of these too often separated communities. To this end, the symposium includes invited talks, poster presentations, panels, and ample time for thoughtful discussion and robust debate.

We are also pleased to announce that, for the first time, MLHC will be introducing a rigorous peer-review process and (optional) archival proceedings through the Journal of Machine Learning Research proceedings track.

Important Dates

  • Submission Deadline: Monday April 24th at 6pm (EDT)
  • Author Notification: Friday June 16th
  • Conference: Aug 18th - 19th, 2017

Call for Papers

Machine learning researchers (including those working in statistical natural language processing, computer vision, and related sub-fields), when coupled with seasoned clinicians, can play an important role in turning complex medical data (e.g., individual patient health records, genomic data, data from wearable health monitors, online reviews of physicians, medical imagery, etc.) into actionable knowledge that ultimately improves patient care. For the last six years, MUCMD has drawn about 100 clinical and machine learning researchers to frame problems clinicians need solved and discuss machine learning solutions; this year we are introducing a rigorous review process that will include both computer scientists and clinicians. Accepted papers will be (optionally) archived through the Journal of Machine Learning Research proceedings track, which is indexed through PubMed.

We invite submissions that describe novel methods to address the challenges inherent to health-related data (e.g., sparsity, class imbalance, causality, temporal dynamics, multi-modal data). We also invite articles describing the application and evaluation of state-of-the-art machine learning approaches applied to health data in deployed systems. In particular, we seek high-quality submissions on the following topics:

  • Predicting individual patient outcomes
  • Patient risk stratification
  • Bio-marker discovery
  • Learning from sparse/missing/imbalanced data
  • Medical imaging
  • Clustering and phenotype discovery
  • Feature selection/dimensionality reduction
  • Exploiting and generating ontologies
  • Text classification and mining for biomedical literature
  • Mining, processing and making sense of clinical notes
  • Parsing biomedical literature
  • Brain imaging technologies and related models
  • Time series analysis with medical applications
  • Efficient, scalable processing of clinical data
  • Methods for vitals monitoring
  • ML systems that assist with evidence-based medicine
  • Integration of clinical, omics, social media, and mobile sources
  • Public health and pharmaco-surveillance

Proceedings and Review Process. Accepted submissions will be published through the proceedings track of the Journal of Machine Learning Research. All papers will be rigorously peer-reviewed, and research that has been previously published elsewhere or is currently in submission may not be submitted to MLHC. However, authors will have the option of only archiving the abstract to allow for future submissions to clinical journals, etc.


The maximum paper length is 10 pages, excluding references, acknowledgements, and supplementary materials. The maximum file size is 10 MB. We expect papers to be between 7 and 10 pages; shorter papers are acceptable as long as they fully describe the work.

Here is an example paper

LaTeX style files are available here

A Word template is available here

While section headings may be changed, the margins and author block must remain the same and all papers must be in 11-point Times font. If supplementary materials are included, the paper must still stand alone; reviewers are encouraged but not required to look at the supplementary materials.

Context for Clinicians: We realize that conferences in medicine tend to be abstract-only, non-archival events. This is not the case for MLHC: to be a premier health and machine learning venue, all papers submitted to MLHC will be rigorously peer-reviewed for scientific quality -- and for that a suitably complete description of the work is necessary. So we call for submissions of 7-10 pages that describe your problem, cohort, features used, methods, results, etc. Multiple reviewers will provide feedback on the submission. If accepted, you will have the opportunity to revise the paper before submitting the final version.

Context for Computer Scientists: MLHC is a machine learning conference, and we expect papers of the same level of quality as those that would be sent to a conference (rather than a workshop). You may choose to have only the abstract of the paper archived, but it is a violation of the dual-submission policy to archive the full MLHC paper and then later submit the same paper to another conference.

Regardless of whether or not the full paper is archived, authors of accepted papers will be invited to present a spotlight and/or a poster on their work at the conference.

(Of course, we hope that many papers have both clinicians and computer scientists involved!)


The example paper contains sample sections. A more machine-learning oriented paper may include more mathematical details, while a more application-focused paper may include more detailed cohort and study design descriptions. In all cases, papers should contain enough information for the readers to understand and reproduce the results.

Double-Blind Reviewing

Reviewing for MLHC is double-blind: the reviewers will not know the authors’ identity and the authors will not know the reviewers’ identity. Do not include your names, your institution’s name, or other identifying information in the initial submission; add them only in the camera-ready version. While you should make every effort to anonymize your work -- e.g. write “In Doe et al. (2011), the authors…” rather than “In our previous work (Doe et al., 2011), we…” -- we realize that a reviewer may be able to deduce the authors’ identities from previous publications or technical reports on the web. This will not be considered a violation of the double-blind reviewing policy on the authors’ part.

Dual Submission and Archiving Policy

All submissions to MLHC must be novel work. You may not submit work that has been previously published, has been accepted for publication, or has been submitted in parallel to other conferences. There are a few exceptions:

  1. You may submit a paper to MLHC and a journal at the same time.
  2. You may submit work that has only appeared at a conference or workshop without proceedings.
  3. You may submit work that has only been previously published as a technical report (e.g. on arXiv).

All submissions to MLHC must be full papers so that the work can be rigorously reviewed. Once your paper is accepted to MLHC, however, you may choose to only have the abstract archived to enable submission to a journal.


Day 1

Saban Research Institute



Machine Learning Opportunities in the Explosion of Personalized Precision Medicine

Larry Smarr, PhD

We have reached the takeoff point in the generation of massive datasets from individuals and across populations, both of which are necessary for personalized precision medicine. I will give an example of my N=1 self-study, in which I have my human genome as well as multi-year time series of my gut microbiome genomics and over one hundred blood biomarkers. This is now being augmented with time series of my metabolome and immunome. These are then compared with hundreds of healthy people's gut microbiomes, revealing major shifts between health and disease. Multiple companies and organizations will soon be carrying out similar levels of analysis on hundreds of thousands of individuals. Machine learning techniques will be essential to bring the patterns out of these exponentially growing datasets.

Machine learning that matters in healthcare: breaking down the silos

Leo Celi, MD

Quality of care, as would be reflected by the universal provision of standardized, evidence-based, and truly indicated care, has not improved to the degree one would have hoped. Similarly, while patient safety and medical errors have come into public awareness, advances in these areas have been slow, hard won, and unsupported by the kinds of smart, data-driven engineering designs that have gone into other domains. Interest in applying machine learning to clinical practice is increasing, yet the practical application of these techniques has been less than desirable. Clinicians continue to make determinations in a technically unsupported and unmonitored manner due to a lack of high-quality evidence or tools to support most day-to-day decisions. There is a persistent gap between the clinicians required to understand the context of the data and the engineers who are critical to extracting usable information from the increasing amount of healthcare data that is being generated. This talk focuses on the divide between the data science and healthcare silos, and posits that the lack of integration is the primary barrier to a data revolution in healthcare. I first discuss literature that supports the existence of this divide, and then I present recommendations on how to bridge the gap between practicing clinicians and data scientists.

Coffee Break and Discussion

Image-based Biomarkers and Prediction in Large Clinical Cohorts

Polina Golland, PhD

To take full advantage of clinically relevant information implicitly captured in medical images, we develop robust algorithms for quantifying disease burden from patient scans. We then demonstrate how genetic and clinical variables can be used to predict anatomy and anatomical change through a semi-parametric generative model. Joint modeling of image and genetic data promises to provide insights into genetic factors and anatomical effects of the disease. We demonstrate the promise of this approach on large collections of brain scans of different patient cohorts.

Comprehensive predictive modeling at the bedside

Randall Moorman, MD

Early Warning Scores and other forms of predictive modeling present clinicians with real-time estimates of the risks of imminent untoward events based on statistical models trained on legacy data sets. Nearly all such tools are based on static and intermittent data elements such as demographics, diagnoses, notes, vital sign measurements, and lab test results. Continuous physiological monitoring such as EKG telemetry is another potential source, and has the potential advantage of higher data coverage. It introduces a new step in the modeling process, though: time series analysis of cardiorespiratory dynamics to detect signatures of illness. The University of Virginia group has investigated comprehensive approaches to predictive modeling that use static, intermittent, and continuous data streams for early detection of subacute, potentially catastrophic illness in infants and adults, in ICUs and on hospital floors.

Saban Research Patio

Spotlight Talks A

Posters A

Spotlight Talks B

Posters B

Improving the design and discovery of dynamic treatment strategies using reinforcement learning

Joelle Pineau, PhD

Reinforcement learning offers a powerful paradigm for automatically discovering and optimizing sequential treatments for chronic and life-threatening diseases. This talk will introduce basics of reinforcement learning and then discuss several aspects of this work, including: How should we collect data to learn good sequential treatment strategies? How can we learn a representation of the data that allows generalization across patients? How can we use the data collected to discover sequential treatment strategies that are tailored to patient characteristics and time-dependent outcomes? The methods presented will be illustrated using results of our work on learning adaptive neurostimulation policies for the treatment of epilepsy.

Saban Research Patio

Dinner and Discussion

Day 2

Saban Research Institute


Processed data to derive clinically useful information

Michael Pinsky, MD and Artur Dubrawski, PhD

It is often difficult to accurately predict which patients will develop shock, when, and why, because signs of shock often appear late, when organ injury is already present. Three levels of aggregation of information can aid the bedside clinician in this task: analyzing derived parameters of existing measured physiologic variables using simple bedside calculations (Functional Hemodynamic Monitoring); using prior physiologic data of similar subjects during periods of stability and disease to define quantitative metrics of severity; and using libraries of responses across large and comprehensive collections of records of diverse subjects whose diagnoses, therapies, and courses of treatment are already known, to predict not only disease severity but also the subsequent behavior of the subject if left untreated or treated with one of the many therapeutic options. A major pre-analysis problem is cleaning the data to remove non-physiologic artifacts due to technical errors, which correspond to >70% of all clinical alerts. We have been developing algorithms that effectively isolate ~85% of all artifacts among alerts generated from physiologic time series of vital sign data. The next problem is to define the minimal monitoring data set needed to initially identify patients at risk across all possible processes and then specifically monitor their response to targeted therapies known to improve outcomes. To address these issues, we represented the vital sign data with highly multivariate feature sets and used machine learning algorithms to infer parsimonious predictive models for cardiorespiratory insufficiency. We describe the nature of the required data sets and the modeling approaches used to detect, forecast, and track the evolution of risk for this severe condition. These approaches jointly enable earlier identification of cardiorespiratory insufficiency and direct focused, patient-specific management.
To validate our methodology, we used both a porcine model of hemorrhage and human vital sign data collected in a trauma step-down unit. Our results show the value of a truly multivariate fused approach over more traditional single vital sign thresholding for detection, and how it can also allow reliable forecasting of cardiorespiratory insufficiency before its overt signs become apparent. Also, increasing the resolution of signal processing from mean data collected at regular intervals to beat-to-beat and waveform analysis progressively improves the predictive value of the fused parameters. In addition, we show that using personalized reference data, if available, can further improve the detectability and predictability of cardiorespiratory insufficiency. Finally, we demonstrate that the temporal evolution of risk for cardiorespiratory insufficiency is a heterogeneous yet systematic process. Most patients who develop this condition follow one of only a handful of typical risk evolution trajectories, and they can be assigned to their most likely trajectory type well ahead of onset, enabling further gains in predictability.
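
As a rough illustration of the contrast this abstract draws between single vital sign thresholding and a fused multivariate score, here is a minimal sketch in Python. The reference means and standard deviations, the two thresholds, and the function names are all hypothetical and chosen for illustration only; they are not clinical values, and this is not the speakers' algorithm. The fused score is a plain Euclidean norm of per-sign z-scores, a deliberately simple stand-in for a learned multivariate model.

```python
from math import sqrt

# Hypothetical (mean, standard deviation) per vital sign for a stable patient.
# Illustrative numbers only; not clinical reference values.
REFERENCE = {
    "heart_rate": (80.0, 10.0),
    "resp_rate": (16.0, 3.0),
    "spo2": (97.0, 1.5),
}


def z_scores(sample):
    """Deviation of each vital sign from its reference mean, in SD units."""
    return [(sample[name] - mean) / sd for name, (mean, sd) in REFERENCE.items()]


def single_sign_alert(sample, limit=2.0):
    """Alert only if at least one vital sign alone exceeds `limit` SDs."""
    return any(abs(z) > limit for z in z_scores(sample))


def fused_alert(sample, limit=2.5):
    """Alert if the combined deviation across all signs exceeds `limit`."""
    return sqrt(sum(z * z for z in z_scores(sample))) > limit


# Every sign is 1.5 SDs off: none crosses its own threshold,
# but the combined deviation (about 2.6) crosses the fused one.
sample = {"heart_rate": 95.0, "resp_rate": 20.5, "spo2": 94.75}
print(single_sign_alert(sample), fused_alert(sample))  # prints: False True
```

The point of the toy example is only that several mild, simultaneous deviations can carry signal that no individual threshold sees; a real system would learn the combination from data rather than fix it by hand.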

Clinical Abstract Talks and Software Demos

Clinical Abstract Posters

Culture Trumps Data

Bassam Kadry, MD

Why is it so hard to drive change in healthcare? The data, technology, and insights exist, yet it remains hard to move the needle in the right direction. What's the point of developing a technology if it is never going to be used due to business, cultural, or human behavior challenges? Understanding these issues can help you have greater impact. Learn how to ask the right questions that will yield the greatest impact.

Panel Discussion

  • Randall Wetzel
  • Lee Hartsell
  • Suchi Saria
  • John Guttag
  • Nigam Shah

Electronic Health Record Analysis via Deep Poisson Factor Models

Lawrence Carin, PhD

Electronic Health Record (EHR) phenotyping utilizes patient data captured through normal medical practice to identify features that may represent computational medical phenotypes. These features may be used to identify at-risk patients and improve prediction of patient morbidity and mortality. We present a novel deep multi-modality architecture for EHR analysis (applicable to joint analysis of multiple forms of EHR data), based on Poisson Factor Analysis (PFA) modules. Each modality, composed of observed counts, is represented as a Poisson distribution, parameterized in terms of hidden binary units. Information from different modalities is shared via a deep hierarchy of common hidden units. To explore the utility of these models, we apply them to a subset of patients from the Duke-Durham patient cohort. We identified a cohort of over 12,000 patients with Type 2 Diabetes Mellitus (T2DM) based on diagnosis codes and laboratory tests out of our patient population of over 240,000. Examining the common hidden units uniting the PFA modules, we identify patient features that represent medical concepts. Experiments indicate that our learned features are better able to predict mortality and morbidity than clinical features identified previously in a large-scale clinical trial.
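
For readers unfamiliar with Poisson factor models, one way to write a single PFA module of the kind this abstract describes is sketched below. All symbols are assumptions introduced here for illustration and do not come from the abstract: the count vector of patient n in modality m is x_n^(m), Psi^(m) is a nonnegative matrix of factor loadings, theta_n^(m) is a nonnegative vector of factor intensities, and h_n is a vector of K hidden binary units. The sketch shows one layer only, with the binary units shared directly across modalities; the abstract describes sharing through a deeper hierarchy, which is omitted.

```latex
% Illustrative single-layer sketch; notation is assumed, not taken from the abstract.
\begin{align*}
  \mathbf{x}_n^{(m)} &\sim \mathrm{Poisson}\!\left(
      \boldsymbol{\Psi}^{(m)} \left( \boldsymbol{\theta}_n^{(m)} \odot \mathbf{h}_n \right)
    \right),
  & \mathbf{h}_n &\in \{0,1\}^{K}
\end{align*}
```

Here the elementwise product switches individual factors on or off for each patient, so the binary vector h_n can be read as a set of candidate phenotype indicators.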

A perspective on Machine Learning in Pediatric Intensive Care

The Laura P. and Leland K. Whittier Virtual PICU

The Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU) is a team of doctors, machine learners, and engineers committed to developing real-time clinical decision support for the pediatric ICU. We will discuss our perspective on what needs exist in the ICU and how machine learning can meet these needs. We will highlight some of our recent machine learning work that aims to enable solutions to those needs.

Closing Remarks

Feedback Discussion Session

Invited Speakers

Clinical Assistant Professor, Anesthesiology, Perioperative and Pain Medicine Stanford Medicine
Professor of Computer Science and Information Technologies University of California, San Diego
Professor of Electrical & Computer Engineering Duke University
Professor of Electrical Engineering and Computer Science Massachusetts Institute of Technology
Associate Professor, School of Computer Science McGill University
Professor of Medicine, Biomedical Engineering and Molecular Physiology and Biological Physics University of Virginia
Assistant Professor Medicine Beth Israel Deaconess Medical Center
Professor of Critical Care Medicine University of Pittsburgh
Senior Systems Scientist, Robotics Institute Carnegie Mellon University

Program Chairs

Assistant Professor in Computer Science, Harvard School of Engineering and Applied Sciences
Associate Professor Departments of Anesthesiology/Critical Care Medicine and Pediatrics Johns Hopkins University School of Medicine
PhD Student, Computer Science, Viterbi Dean's Doctoral Fellow, and Alfred E. Mann Innovation in Engineering Fellow at the University of Southern California
Assistant professor at the University of Texas at Austin
Assistant Professor of Computer Science and Engineering (CSE) at the University of Michigan

Senior Advisory Committee:

Dean of the College of Computer and Information Science, Northeastern University
Associate Professor and Canada Research Chair in Computational Biology, University of Toronto
Associate Professor, Biomedical Informatics Emory University
Associate Professor of Biomedical Informatics, Affiliated with Computer Science, Columbia University
Professor of Computer Science at Cornell Tech in New York City and a Professor of Public Health at Weill Cornell Medical College
Schlumberger Centennial Chair Professor of Electrical and Computer Engineering at The University of Texas at Austin
Professor of Computer Science at the University of Alberta
Dugald C. Jackson Professor MIT Department of Electrical Engineering and Computer Science
Professor of Computer Science, University of Pittsburgh
Technical Fellow and Managing Director, Microsoft Research
Lawrence J. Henderson Professor of Pediatrics, Boston Children's Hospital
HST Faculty, Distinguished Professor in Health Sciences and Technology and Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Professor of Medicine, Biomedical Engineering and Molecular Physiology and Biological Physics
Professor of Computer Science at the University of British Columbia
Senior Lecturer in Computer Science at Makerere University
Associate Professor at UC Riverside's Computer Science Department
Professor of Computer Science and Engineering in the MIT Department of Electrical Engineering and Computer Science
Associate Professor, Medicine - Biomedical Informatics Research, Stanford University
Founder’s Board Chair of Neurocritical Care, Professor in Pediatrics-Neurology, Neurology - Ken and Ruth Davee Department and Pharmacology, Northwestern
Chairman, Department of Anesthesiology Critical Care Medicine - Children's Hospital Los Angeles
Professor of Machine Learning, School of Informatics, University of Edinburgh

Accepted Papers

Input-Output Non-Linear Dynamical Systems applied to Physiological Condition Monitoring
Konstantinos Georgatzis, Chris Williams, and Christopher Hawthorne, University of Edinburgh
Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data
Truyen Tran, Wei Luo, and Dinh Phung, Deakin University; Jonathan Morris and Kristen Rickard, University of Sydney; Svetha Venkatesh, Deakin University
Mitochondria-based Renal Cell Carcinoma Subtyping: Learning from Deep vs. Flat Feature Representations
Peter Schüffler and Judy Sarungbam, Memorial Sloan Kettering Cancer Center; Hassan Muhammad, Weill Cornell Medical College; Ed Reznik, Satish Tickoo, and Thomas Fuchs, Memorial Sloan Kettering Cancer Center
Multi-task Learning with Weak Class Labels: Leveraging iEEG to Detect Cortical Lesions in Cryptogenic Epilepsy
Bilal Ahmed, Tufts; Thomas Thesen and Karen Blackmon, NYU; Carla Brodley, Northeastern
Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
Edward Choi and Mohammad Taha Bahadori, Georgia Tech; Andy Schuetz and Walter Stewart, Sutter Health; Jimeng Sun, Georgia Tech
Diagnostic Prediction Using Discomfort Drawing with IBTM
Cheng Zhang, KTH Royal Institute of Technology; Hedvig Kjellström, KTH Sweden; Carl Henrik Ek, Bristol University; Bo Bertilson, KI Karolinska Institutet
Learning Robust Features using Deep Learning for Automatic Seizure Detection
Pierre Thodoroff and Joelle Pineau, McGill University
Using Kernel Methods and Model Selection for Prediction of Preterm Birth
Ansaf Salleb-Aouissi, Columbia University; Anita Raja, Cooper Union; Ronald Wapner, Columbia Medical Center
gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity
Rhiannon Rose and Daniel Lizotte, Western University
Uncovering Voice Misuse Using Symbolic Mismatch
Marzyeh Ghassemi, MIT; Zeeshan Syed, University of Michigan; Daryush Mehta, Jarrad Van Stan, and Robert Hillman, Massachusetts General; John Guttag, MIT
Identifiable Phenotyping using Constrained Non-Negative Matrix Factorization
Shalmali Joshi, Suriya Gunasekar, and Joydeep Ghosh, UT Austin; David Sontag, NYU
Transferring Knowledge from Text to Predict Disease Onset
Yun Liu, MIT; Kun-Ta Chuang, Fu-Wen Liang, and Huey-Jen Su, National Cheng Kung University; Collin Stultz and John Guttag, MIT
Scalable Modeling of Multivariate Longitudinal Data for Prediction of Chronic Kidney Disease Progression
Joseph Futoma, Blake Cameron, Mark Sendak, and Katherine Heller, Duke University
Directly Modeling Missing Data in Sequences with RNNs: Improved Classification of Clinical Time Series
Zachary Lipton, UC San Diego; David Kale, USC Information Sciences Institute; Randall Wetzel, Children's Hospital LA
Deep Survival Analysis
Rajesh Ranganath, Princeton University; Adler Perotte, Noémie Elhadad, and David Blei, Columbia University
Deep Convolutional Neural Networks for Microscopy-Based Point of Care Diagnostics
Alfred Adama, Pius Mugagga, Rose Nakasi, and John Quinn, Makerere University
Clinical Tagging with Joint Probabilistic Models
Yoni Halpern, NYU; Steven Horng, Beth Israel Deaconess Medical Center; David Sontag, NYU
Multi-task Prediction of Disease Onsets from Longitudinal Laboratory Tests
Narges Razavian, Jake Marcus, and David Sontag, NYU
A Non-parametric Bayesian Approach for Estimating Treatment-Response Curves from Sparse Time Series
Yanbo Xu, Suchi Saria, and Yanxun Xu, Johns Hopkins University

Accepted Clinical Podium Abstracts

Demonstration of a Chronic Kidney Disease Population Rounding Tool
Mark Sendak, Duke Institute for Health Innovation; Faraz Yashar, Lance Co Ting Keh, Ephori LLC; Blake Cameron, Joseph Futoma, Katherine Heller, and Uptal Patel, Duke
Precision Medicine in Point-of-Care Management of Surgical Complications
Zhifei Sun, Elizabeth Lorenzi, Ouwen Huang, Thomas Li, Christopher Mantyh, Katherine Heller, and Erich Huang, Duke
Performing an informatics consult
Nigam Shah, Stanford Center for Biomedical Informatics Research
MS Mosaic: Mobile technology and machine learning for multiple sclerosis research and patient care
Lee Hartsell and Katherine Heller, Duke University
Care Coordination using practice based evidences
Adrish Sannyasi, Splunk; Daniella Meeker, USC Keck School of Medicine
Same Decision Probability in Neurocritical Care
Fabien Scalzo, Arthur Choi, and Adnan Darwiche, UCLA
Patient Identification Using Plethysmography Structure Analysis
Jennifer Laine, Yale University
Real-time Detection and Exploratory Discovery of Anomalies for Pediatric Ventilator Management
Tanachat Nilanon and Yan Liu, USC; Justin Hotz and Robinder Khemani, Children's Hospital LA



Need more information?

If you have any questions regarding the symposium, please send us an email.