|Friday, Saban Auditorium|
|8:30||Welcome and Introduction||Randall Wetzel
Children's Hospital LA, Whittier VPICU
Johns Hopkins, Departments of Computer Science and Health Policy
|8:45||Integrating Data for Analysis, Anonymization and Sharing (iDASH): New Models of Sharing Clinical and Laboratory Data||Lucila Ohno-Machado
UC San Diego, Division of Biomedical Informatics
|9:20||Removing Confounding Factors via Constraint-Based Clustering: An Application to Finding Homogeneous Groups of MS Patients||Carla Brodley
Tufts University, Department of Computer Science
|9:55||The Importance of Workflows in Big Data Research||Yolanda Gil
USC Information Sciences Institute
|10:45||Interactive Session 1: Problems of Interest in Clinical Settings
led by Randall Wetzel
|12:45||Interactive Session 2: Active Collaborative Problem Brainstorming
led by Suchi Saria
|13:45||Crowdsourcing and Machine Learning||Gert Lanckriet
UC San Diego Department of Electrical and Computer Engineering
|14:20||SMART Platform for Enabling Apps on EMR Data||Kenneth Mandl
Children's Hospital Boston
|14:55||Sequential Multiple Assignment Randomized Trials and Treatment Policies||Susan Murphy
University of Michigan, Department of Statistics
|15:45||Core Research Questions for Natural Language Processing of Clinical Text||Noemie Elhadad
Columbia University, Department of Biomedical Informatics
|Supporting Preference-aware Sequential Medical Decision Making||Dan Lizotte
University of Waterloo
|Hexadecimal to Hospitals: How Doctors Think, and How to Use That to Translate Between Medicine and Technology||Katherine Homan
Albany Medical Center
|18:00||Poster Session with Dinner and Drinks|
|Saturday, Saban Auditorium|
|9:00||Running the Big Machine: Using Real Time EMR Data in High Risk, High Reward Environments||Warren Sandberg
Vanderbilt University Medical Center
|9:35||Predictive Modeling in Intensive Care||Pete Szolovits
Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Lab
|10:10||The Privacy-Usefulness Tradeoff||Katrina Ligett
California Institute of Technology, Computer Science and Economics
|11:00||Interactive Session 3: Multidisciplinary Collaborations
led by Hector Corrada Bravo, University of Maryland, Department of Computer Science
|13:00||Emotion Tracking for Memory, Health, and Awareness||Mary Czerwinski
|13:35||Modeling and Prediction with ICU Electronic Health Records Data||Benjamin Marlin
University of Massachusetts, Department of Computer Science
|14:10||The Life of a Hadoop Cluster (Or, Whatever I Want to Talk About Today)||Josh Wills
|Bioinformatics Approaches in Pediatric Critical Care: Proof of Principle||Mark Wainwright
|Integrated Predictive Modeling of Variable-Resolution Healthcare Data||Joydeep Ghosh
|Exploiting the Value of F1 Telemetry for Pediatric Intensive Care||Heather Duncan and Peter van Manen
|16:45||It's Not the Size but What You Do with It: "Tiny Data" and Business Value in a "Perverse" Market||Josh Rosenthal
|17:15||Interactive Session 4: Meaningful Business Use of Clinical Data
led by Josh Rosenthal
|18:15||Clinical Data Hack-a-thon Begins!
Evening Reception (drinks and food provided)
|Sunday (final schedule TBD)|
|Time||Saban, room TBD||Saban, room TBD|
|all day||Clinical Data Hack-a-thon continues
Self-organized break-out sessions and collaborative project planning meetings
Breakfast and snacks provided
|9:00||Meaningful Analysis of EHR Data Break-out Session
hosted by Practice Fusion
|10:00||Interactive Visualization and Analysis
led by Diana Maclean, Stanford University
|11:00||Time Series Modeling and Data Mining
led by Josh Patterson, Principal Solutions Architect, Cloudera
|12:00||Clinical Data Hack-a-thon Presentations|
Organizing Committee
Randall Wetzel, M.D., Children's Hospital LA
John Langford, Ph.D., Microsoft Research
Suchi Saria, Ph.D., Johns Hopkins Computer Science and Public Health
Programming Committee
Randall Wetzel, M.D., Children's Hospital LA
Stuart Russell, Ph.D., UC Berkeley Computer Science and UC San Francisco Neurosurgery
John Langford, Ph.D., Microsoft Research
Alina Beygelzimer, Ph.D., IBM T.J. Watson Research Center
Suchi Saria, Ph.D., Johns Hopkins Computer Science and Public Health
Dan Crichton, M.S., NASA Jet Propulsion Laboratory
Chris Mattmann, Ph.D., NASA Jet Propulsion Laboratory and USC Computer Science
David Kale, M.S., Children's Hospital LA and USC Computer Science
Local Organizing Committee
Talk Abstracts and Speaker Bios (organized by appearance in program)
Friday, August 10, 2012
Talks in Saban Auditorium. Coffee breaks, lunch, posters, and dinner in Saban Courtyard.
8:30 - Welcome and Introduction
Chairman, Department of Anesthesiology Critical Care Medicine
The Anne O'M Wilson Professor of Critical Care Medicine
Children's Hospital Los Angeles
Professor of Pediatrics and Anesthesiology
USC Keck School of Medicine
Director, The Laura P. and Leland K. Whittier Virtual PICU
Assistant Professor, Computer Science, Whiting School of Engineering
Assistant Professor, Health Policy, Bloomberg School of Public Health
Johns Hopkins University
8:45 - Integrating Data for Analysis, Anonymization and Sharing (iDASH): New Models of Sharing Clinical and Laboratory Data
iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.
Lucila Ohno-Machado, Ph.D.
Professor of Medicine
Founding Chief, Division of Biomedical Informatics
Associate Dean for Informatics and Technology
Division of Biomedical Informatics
University of California, San Diego
I direct the Division of Biomedical Informatics (DBMI) at UCSD, a research, teaching, and clinical support service unit. My research has been focused on construction and evaluation of data mining and decision support tools for biomedical research and clinical care. These tools are based in statistical and machine learning on large biomedical datasets. We develop tools to make clinical data available for research without compromising patient privacy, and to integrate and analyze massive amounts of data efficiently. UCSD is home to a National Center for Biomedical Computing on integrating Data for Analysis, anonymization, and SHaring (iDASH), for which we developed a private cloud to deal with personal health identified data. We also direct the Scalable National Network for Effectiveness Research. Additionally, our division directs the Informatics Core of the CTSA at UCSD. We have developed the UCSD Clinical Data Warehouse for research and implemented commercial and non-commercial data management systems for clinical studies. I chair the steering committee for UC-Research exchange, a University of California-wide initiative to integrate data warehouses from its five medical centers. As associate dean for informatics at UCSD, I oversee the development and implementation of information systems for clinical quality improvement and health services research.
9:20 - Removing Confounding Factors via Constraint-Based Clustering: An Application to Finding Homogeneous Groups of MS Patients
Confounding factors in unsupervised data can lead to undesirable clustering results. For example, in medical data sets, age is often a confounding factor in tests designed to judge the severity of a patient's disease through measures of mobility, eyesight and hearing. In such cases, removing age from each instance will not remove its effect from the data, as other features will be correlated with age. We present a method based on constraint-based clustering to remove the impact of such confounding factors and compare it to the standard approach of detrending. Motivated by the need to find homogeneous groups of MS patients, we apply our approach to remove physician subjectivity from patient data. The result is a promising novel grouping of patients that can help uncover the factors that impact disease progression in MS.
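As a rough illustration of the detrending baseline mentioned in the abstract above, the sketch below regresses one feature on a confounder by ordinary least squares and keeps only the residuals. It is a minimal sketch and not the speaker's method: the function name `detrend`, the choice of a straight-line trend, and the sample ages and walking times are all assumptions made for illustration.

```python
from statistics import mean


def detrend(feature, confounder):
    """Remove a least-squares linear trend in `confounder` from `feature`.

    Returns the residuals, i.e. the part of the feature that a straight-line
    fit on the confounder does not explain.
    """
    x_bar, y_bar = mean(confounder), mean(feature)
    sxx = sum((x - x_bar) ** 2 for x in confounder)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(confounder, feature))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(confounder, feature)]


# Hypothetical data: patient ages and a mobility score (timed walk, seconds).
ages = [25, 35, 45, 55, 65]
walk_times = [5.0, 6.1, 6.9, 8.2, 8.8]

# The residuals are what a clustering algorithm would see after detrending.
residuals = detrend(walk_times, ages)
print([round(r, 3) for r in residuals])
```

Detrending of this kind handles one feature at a time and assumes a specific functional form for the trend, which is one reason an alternative such as the constraint-based approach in the talk may be worth comparing against it.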
Carla Brodley, Ph.D.
Chair, Department of Computer Science
Professor, Department of Computer Science
Carla E. Brodley is a professor and Chair of the Department of Computer Science at Tufts University. She received her PhD in computer science from the University of Massachusetts at Amherst in 1994. From 1994-2004, she was on the faculty of the School of Electrical Engineering at Purdue University, West Lafayette, Indiana. She joined the faculty at Tufts in 2004. Professor Brodley's research interests include machine learning, knowledge discovery in databases, health IT, and personalized medicine. She has worked in the areas of intrusion detection, anomaly detection, classifier formation, unsupervised learning and applications of machine learning to remote sensing, computer security, neuroscience, digital libraries, astrophysics, content-based image retrieval of medical images, computational biology, chemistry, evidence-based medicine, and personalized medicine. She is on the editorial boards of JMLR, Machine Learning and DMKD. She is a member of the AAAI Council, a member of the board of directors for the Computing Research Association (CRA), a board member of CRA-W, and a board member of the IMLS.
9:55 - The Importance of Workflows in Big Data Research
Big data goes beyond large volume, and encompasses data that is diverse along a vast number of dimensions. A recent editorial in Science reported that the majority of research labs already lack the expertise required to analyze their data. Researchers often resort to forming teams of collaborators that have complementary expertise. This approach to big data analytics is rapidly becoming extremely time consuming and in most cases impractical, and will not scale as the complexity and variety of the data continues to increase. These big data challenges have created great opportunities for artificial intelligence to make data analytic processes easier to share and more efficient to execute, lowering the barriers of the complexity of the problems that can be tackled. I will describe our ongoing research on intelligent workflow systems that assist users with complex data analysis problems.
Yolanda Gil, Ph.D.
Principal Investigator and Project Leader, Interactive Knowledge Capture Group, USC Information Sciences Institute
Associate Division Director for Research, Intelligent Systems Division, USC ISI
Research Professor, Department of Computer Science, USC
Dr. Yolanda Gil is Director of Knowledge Technologies and Associate Division Director at the Information Sciences Institute of the University of Southern California, and Research Professor in the Computer Science Department. She received her M.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University. Her research interests include intelligent user interfaces, knowledge-rich problem solving, and the semantic web. An area of recent emphasis is collaborative large-scale data analysis through semantic workflows. She recently led the W3C Provenance Group that charted a community standardization effort in this area. Dr. Gil has served on the Advisory Committee of the Computer Science and Engineering Directorate of the National Science Foundation. She was elected Chair of ACM SIGART, the Association for Computing Machinery's Special Interest Group on Artificial Intelligence. She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI).
10:45 - Interactive Session 1: Problems of Interest in Clinical Settings
The wealth of clinical data in electronic health care records (EHRs) holds no end of fascinating problems and potential solutions for researchers and innovators from other disciplines, but which of these has the potential to impact real-world care that patients receive and to improve outcomes? The first step in deriving meaning and value from digital clinical data is to identify the key problems of interest to clinicians, patients, and other stakeholders. A panel of clinical experts, led by Randall Wetzel, M.D., will make a series of short presentations on problems they would like to see addressed and then lead an open discussion that prepares participants for the brainstorming session after lunch.
led by Randall Wetzel
Panelists include
- Robinder Khemani, M.D., M.S.C.I., Pediatric Intensivist and Clinical Researcher, Children's Hospital LA
- Heather Duncan, M.B., Ch.B., Consultant in Paediatric Intensive Care, Birmingham Children's Hospital
- Jim Fackler, M.D., Associate Professor, Departments of Anesthesiology/Critical Care Medicine and Pediatrics, Johns Hopkins University School of Medicine
12:45 - Interactive Session 2: Active Collaborative Problem Brainstorming
Following from pre-lunch presentations and discussion on open, data-related problems in medicine, in this session participants will be organized into groups (according to numbers on their badges) to brainstorm interesting problems and potential solutions that could be addressed by applying computational methods to large amounts of clinical data.
led by Suchi Saria
13:45 - Crowdsourcing and Machine Learning
Combining crowdsourcing with machine learning enables us to leverage both the effectiveness of human computation and the scalability of machine learning. In this talk I will describe how we have applied this paradigm in a number of settings, including game-based music annotation and clinical diagnosis.
Gert Lanckriet, Ph.D.
Associate Professor, Department of Electrical and Computer Engineering
University of California, San Diego
My research focuses on convex optimization and machine learning with applications in computer music and computer audition. Previously, I have worked on applications in bioinformatics and financial engineering. I did my Ph.D. in the Department of Electrical Engineering and Computer Science at U. C. Berkeley, working with Professor Laurent El Ghaoui and Professor Michael Jordan, and my undergraduate studies in the Department of Electrical Engineering at the K.U.Leuven in Belgium.
14:20 - SMART Platform for Enabling Apps on EMR Data
Most vendor electronic health record (EHR) products are architected monolithically, making modification difficult for hospitals and physician practices. An alternative approach is to reimagine EHRs as iPhone-like platforms that support substitutable apps-based functionality. Substitutability is the capability inherent in a system of replacing one application with another of similar functionality. I will discuss the Substitutable Medical Applications, Reusable Technologies (SMART) Platforms project which seeks to develop a health information technology platform with substitutable apps constructed around core services. The goal of SMART is to create a common platform to support an “app store for health” as an approach to drive down healthcare costs, support standards evolution, accommodate differences in care workflow, foster competition in the market, and accelerate innovation.
Kenneth Mandl, M.D., M.P.H.
Associate Professor, Division of Emergency Medicine
Harvard Medical School
Director, Intelligent Health Laboratory, Children's Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology
Children's Hospital Boston
Kenneth D. Mandl, MD, MPH is an Associate Professor at Harvard Medical School (HMS) and the Louis Diamond Investigator at Children’s Hospital Boston, where he directs the Intelligent Health Laboratory within “CHIP”, the Children’s Hospital Informatics Program. Mandl has pioneered and published extensively in the areas of personal health records and biosurveillance. Under a major HHS initiative, he co-leads the SMART Platforms project, which seeks to create an “app store” for health. He co-directs a CDC Center of Excellence in Public Health Informatics working to define the role of online social networks in healthcare and public health. Recognized for his teaching and research, he has received the Barger Award for Excellence in Mentoring at Harvard Medical School and the Presidential Early Career Award for Scientists and Engineers, the highest honor bestowed by the United States government to outstanding scientists and engineers. He has been an advisor to two Directors of the CDC and now chairs the Board of Scientific Counselors of the NIH’s National Library of Medicine. Dr. Mandl has published over 130 papers in the medical literature and has been elected to multiple honor societies including the American Society for Clinical Investigation, the Society for Pediatric Research, the American College of Medical Informatics and the American Pediatric Society. He leads two postdoctoral training programs in clinical and informatics research and directs the Population Health Track of the new Master’s Degree in Biomedical Informatics at HMS. Mandl is a faculty member in the HMS Center for Biomedical Informatics and in the Division of Health Sciences and Technology at Harvard and MIT.
14:55 - Sequential Multiple Assignment Randomized Trials and Treatment Policies
The effective treatment and management of many health disorders requires individualized, sequential decision making, in which the treatment is dynamically adapted over time based on an individual's changing illness course. Treatment policies operationalize this sequential decision making via a sequence of decision rules that specify whether, how, for whom, and when to alter the intensity, type, or delivery of behavioral and pharmacological treatments. In this talk, we discuss, and provide examples of, a randomized clinical trial design--the sequential multiple assignment randomized trial--that is currently being used to develop and optimize treatment policies. We review how one can use the resulting patient data to develop treatment policies; in particular we illustrate a data analysis algorithm that is a "reverse engineered" version of Q-Learning or Fitted Q iteration from the field of reinforcement learning. We will illustrate the use of this algorithm on data from a sequential multiple assignment randomized trial on children with ADHD.
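To make the backward-induction idea behind Q-Learning on two-stage trial data concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions and not the algorithm presented in the talk: Q-functions are estimated by simple cell means rather than regression, and the function names, treatment labels, and outcome values are all hypothetical.

```python
from collections import defaultdict


def mean_by_key(pairs):
    """Average the values observed for each key in an iterable of (key, value)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def best_action(q, state):
    """Return the action with the highest estimated Q-value in `state`."""
    actions = [a for (s, a) in q if s == state]
    return max(actions, key=lambda a: q[(state, a)])


def two_stage_q_learning(records):
    """Estimate stage-2 and stage-1 Q-functions by backward induction.

    Each record is (s1, a1, s2, a2, outcome); higher outcomes are better.
    The stage-2 state is the full history (s1, a1, s2).
    """
    # Stage 2: mean final outcome for each (history, second treatment).
    q2 = mean_by_key((((s1, a1, s2), a2), y) for s1, a1, s2, a2, y in records)
    # Stage 1: replace each outcome by the value of acting optimally at stage 2.
    q1 = mean_by_key(
        ((s1, a1), q2[((s1, a1, s2), best_action(q2, (s1, a1, s2)))])
        for s1, a1, s2, _, _ in records
    )
    return q1, q2


# Hypothetical two-stage trial records (labels and outcomes are made up).
records = [
    ("adhd", "med", "nonresp", "intensify", 2.0),
    ("adhd", "med", "nonresp", "augment", 4.0),
    ("adhd", "med", "resp", "continue", 5.0),
    ("adhd", "med", "resp", "continue", 5.0),
    ("adhd", "bmod", "nonresp", "intensify", 3.0),
    ("adhd", "bmod", "nonresp", "augment", 6.0),
    ("adhd", "bmod", "resp", "continue", 5.0),
    ("adhd", "bmod", "resp", "continue", 7.0),
]

q1, q2 = two_stage_q_learning(records)
print(best_action(q1, "adhd"))                      # first-stage rule
print(best_action(q2, ("adhd", "med", "nonresp")))  # second-stage rule
```

The two `best_action` lookups together form a treatment policy in the sense of the abstract: one decision rule per stage, with the second rule depending on what happened after the first.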
Susan A. Murphy, Ph.D.
H.E. Robbins Professor, Department of Statistics
Professor, Department of Psychiatry
Research Professor, Institute for Social Research
University of Michigan
My current primary interest is in causal inference and multi-stage decisions, sometimes called dynamic treatment regimes or adaptive treatment strategies. Dynamic treatment regimes are individually tailored treatments; formally a dynamic treatment regime is a sequence of decision rules that specify when to alter the therapy and specify which intensity or type of subsequent therapy should be offered. The decision rules employ variables such as patient response, risk, burden, adherence, and preference, collected during prior therapy. In a dynamic regime, the decision rules are specified prior to the beginning of the initial therapy. These regimes hold the promise of maximizing treatment efficacy by avoiding ill effects due to over-treatment and by providing increased treatment levels to those who can benefit. Once developed, the decision rules can be used to augment/enhance the clinical judgment used in practice. I am particularly interested in developing statistical methods and experimental designs that can be used in formulating dynamic treatment regimes. This work is funded by the National Institute on Drug Abuse and by the National Institute of Mental Health. I work with researchers at The Methodology Center on these topics.
15:45 - Core Research Questions for Natural Language Processing of Clinical Text
Noemie Elhadad, Ph.D.
Assistant Professor, Department of Biomedical Informatics
Noemie Elhadad is an assistant professor in the Department of Biomedical Informatics at Columbia University. She holds a Ph.D. in Computer Science from Columbia University. Her research interests are in natural language processing and data mining. She investigates ways in which clinical texts (these include patient notes, scientific articles, and medical textbooks) and health consumer texts (health news stories, educational health documents, and online patient posts) can be processed automatically to enhance access to relevant information for physicians, health researchers, and health consumers alike.
Saturday, August 11, 2012
9:00 - Running the Big Machine: Using Real Time EMR Data in High Risk, High Reward Environments
Real-time use of electronic documentation data to support decision-making in the operating room is complicated by unique challenges posed by the tight time frame for decision-making and a 'last mile' problem. Moreover, the OR and procedural areas already have high information flux and high inherent risks, both medical and operational, and all of these features raise the stakes for decision support systems. Nevertheless, these same features, manifesting as frequent opportunities for missed vigilance, as well as decision errors of omission or wrong decisions, demand development of effective automated process monitoring and process control systems capable of reliable operation on a seconds-to-minutes time scale. These results must be pushed, without a requirement for active information seeking or searching, into the hands of providers at the bedside. In this talk, examples of the problem space will be shown, along with results of proof-of-concept solutions (and their limitations) that are beginning to see routine implementation at institutions willing to invest to improve upon the current commercial state-of-the-art clinical OR IT systems. A framework for how such systems are financially self-sustaining is demonstrated. A qualitative description of different forms of meaningful use and complexity, taking into account the unique features of the acute care procedural environment, will be developed.
Warren Sandberg, M.D., Ph.D.
Chair, Department of Anesthesiology
Professor, Department of Anesthesiology, Surgery and Biomedical Informatics
Division of Multispecialty Adult Anesthesiology
Vanderbilt University Medical Center
I am currently Professor and Chair of Anesthesiology at Vanderbilt University School of Medicine. I received M.D. and Ph.D. degrees from the University of Chicago Pritzker School of Medicine, where my doctoral work focused on the molecular determinants of protein stability. I completed anesthesia residency and twelve years as faculty at Massachusetts General Hospital, and then moved to Vanderbilt in 2010 as the 7th Chairman of the Department of Anesthesiology. My clinical interests range from ambulatory surgery to anesthesia for liver transplantation. My research career began in structural biology and mechanisms of anesthesia, but I developed broad research interests in medical technology, informatics, patient safety and OR & procedure suite operations. At MGH, I led the perioperative systems design effort in the Center for Integration of Medicine and Innovative Technology’s “Operating Room of the Future” project. Publications emanating from the OR of the Future demonstrate ways to improve throughput by improving workflow, novel approaches to managing patient flow through ancillary perioperative spaces, and straightforward ways to track and guide performance over time. Subsequently, I developed an interest in anesthesia information management systems. At Vanderbilt, a particular focus is using medical information systems for managerial and medical process monitoring, decision support and clinical process control, with particular focus on solving the ‘last mile’ problem for mobile providers on the seconds to minutes time scale.
9:35 - Predictive Modeling in Intensive Care
From growing collections of intensive care data, we can now build surprisingly accurate models that predict death, morbidities and opportunities to wean patients from dangerous interventions. These models are built using simple machine learning techniques, assuming that the experience of previous patients predicts future ones. Practicing intensivists, however, rely on more than just such experience, and we are developing models that take pathophysiologic knowledge into account, combining expert knowledge and evidence from data. I will review our current work aimed at this objective. Clinical application of such models also remains a future challenge.
Peter Szolovits, Ph.D.
Professor, Computer Science and Engineering, Department of Electrical Engineering and Computer Science
Professor, Health Sciences and Technology, Harvard/MIT Division of Health Sciences and Technology
Head, Clinical Decision-Making Group, MIT Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Peter Szolovits is Professor of Computer Science and Engineering in the MIT Department of Electrical Engineering and Computer Science (EECS), Professor of Health Sciences and Technology in the Harvard/MIT Division of Health Sciences and Technology (HST), and head of the Clinical Decision-Making Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research centers on the application of AI methods to problems of medical decision making, natural language processing to extract meaningful data from clinical narratives to support translational medicine, and the design of information systems for health care institutions and patients. He has worked on problems of diagnosis, therapy planning, execution and monitoring for various medical conditions, computational aspects of genetic counseling, controlled sharing of health information, and privacy and confidentiality issues in medical record systems. His interests in AI include knowledge representation, qualitative reasoning, and probabilistic inference. His interests in medical computing include Web-based heterogeneous medical record systems, life-long personal health information systems, and design of cryptographic schemes for health identifiers. He teaches classes in artificial intelligence, programming languages, medical computing, medical decision making, knowledge-based systems and probabilistic inference.
10:10 - The Privacy-Usefulness Tradeoff
Large databases of medical data have enormous potential; thoughtful exploitation of this data will surely save many lives, improve outcomes, and make medical care more efficient and cost-effective. However, such data, ranging from diagnoses to DNA, is also private and potentially quite sensitive. This talk will overview recent work in computer science on formalizing notions of data privacy, which can allow us to reason in a principled manner about the tradeoff between exploiting rich datasets and protecting the people from whom the data are derived. I will focus on "differential privacy," a particular privacy definition, and on some recent tools for achieving it that may have relevance to medical data.
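For readers new to differential privacy, the sketch below shows one standard way to achieve it for a counting query: the Laplace mechanism. The abstract does not say which tools the talk covers, so this choice is an assumption made for illustration, and the function names and patient records are hypothetical. Adding or removing one person changes a count by at most 1, so Laplace noise with scale 1/epsilon gives epsilon-differential privacy for that single query.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    Smaller epsilon means stronger privacy and a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical patient records; only the diagnosis flag is used here.
patients = [{"id": i, "has_condition": i % 4 == 0} for i in range(200)]

rng = random.Random(2012)  # seeded only to make the demo repeatable
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(patients, lambda p: p["has_condition"], epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.1f}")
```

The loop shows the tradeoff named in the talk title: the answer gets closer to the true count of 50 as epsilon grows, at the cost of a weaker privacy guarantee. Answering many queries consumes privacy budget, which this sketch does not track.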
Katrina Ligett, Ph.D.
Assistant Professor, Computer Science and Economics
California Institute of Technology
Katrina Ligett has been an assistant professor of Computer Science and Economics at Caltech since 2011. Previously, she did postdoctoral research in computer science at Cornell University, and she received her PhD in computer science from Carnegie Mellon. Her research interests include data privacy and game theory.
11:00 - Interactive Session 3: Multidisciplinary Collaborations
Building effective methods and tools that can derive meaning and value from digital clinical data and support clinical decision making is a challenging problem that demands an interdisciplinary effort and diverse teams that include experts from fields as diverse as medicine, biology, statistical learning, decision theory, human-computer interaction, hardware engineering, systems design, and cognitive psychology. Building such teams and getting them to work together productively can present a variety of challenges, including cross-domain communication, goal alignment, and clashing cultures. Prof. Héctor Corrada Bravo and an interdisciplinary panel of researchers lead an interactive discussion about how to perform successful cross-domain research.
Héctor Corrada Bravo, Ph.D.
Assistant Professor, Department of Computer Science
UM Institute for Advanced Computer Studies
Center for Bioinformatics and Computational Biology
University of Maryland
13:00 - Emotion Tracking for Memory, Health, and Awareness
In this talk I will describe novel systems that allow users to reflect or react upon their moods, both retrospectively and in real time. We surveyed potential users of such systems to see what they remembered about their mood swings and behavioral patterns emotionally over time, and it was clear that they felt they did not have a good handle on this after even 48 hours. They also let us know that they would value systems that notified them of their moods or behavioral trends with simple mobile phone alerts. We then built systems to help users track their moods, and tested them on real users over a period of time. The results were promising. Users found interesting patterns in the data and gave us great feedback on how to evolve the user interface visualizations for real time feedback on emotional reactions, mood swings and activities.
Mary Czerwinski, Ph.D.
Research Area Manager, Visualization and Interaction (VIBE) Research Group
Mary Czerwinski is a Research Area Manager of the Visualization and Interaction (VIBE) Research Group. Mary's research focuses primarily on emotion tracking, information worker task management, multitasking, and awareness systems for individuals and groups. Her background is in visual attention and multitasking. She holds a Ph.D. in Cognitive Psychology from Indiana University in Bloomington. Mary was awarded the ACM SIGCHI Lifetime Service Award, was inducted into the CHI Academy, and became an ACM Distinguished Scientist in 2010.
13:35 - Modeling and Prediction with ICU Electronic Health Records Data
As a growing number of hospitals have adopted the use of electronic records systems to manage data collected during the course of patient care, leveraging this data to improve the quality of care has emerged as a key problem. The physiological data contained in ICU electronic health records can be thought of as a multivariate time series that begins when the patient is admitted and ends when the patient is discharged. Each time series contains measurements for a different physiological variable like heart rate or blood pressure. These data are entered by medical staff during routine care and are not continuously recorded. This results in data with several very challenging properties that push the boundaries of machine learning and computational statistics. In this talk, I will describe a number of these properties including temporal sparsity, irregular sampling, the possible presence of sample selection bias and/or non-random missing data, and the confounding effect of interventions. I will present research addressing temporal sparsity using probabilistic mixture models and highlight current research targeting irregular sampling using novel kernel-based models and neural networks.
Benjamin Marlin, Ph.D.
Assistant Professor, Department of Computer Science
University of Massachusetts Amherst
I am an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. I was previously a fellow of both the Pacific Institute for the Mathematical Sciences and the Killam Trusts at the University of British Columbia where I was based in the Laboratory for Computational Intelligence in the Department of Computer Science. I completed my PhD in machine learning in the Department of Computer Science at the University of Toronto. My research interests lie at the intersection of artificial intelligence, machine learning and statistics. I am particularly interested in hierarchical graphical models and approximate inference/learning techniques including Markov Chain Monte Carlo and variational Bayesian methods. I am also interested in the study of non-likelihood-based inductive principles for statistical models and the trade-off between statistical consistency/efficiency and computational efficiency. I am interested in a broad range of applications for these modeling and learning techniques including classification, collaborative filtering, ranking, unsupervised structure discovery, feature induction, object recognition/image labeling and medical informatics.
14:10 - The Life of a Hadoop Cluster (Or, Whatever I Want to Talk About Today)
Most organizations initially adopt Hadoop in order to have cheap online storage for data of uncertain value and to take advantage of the MapReduce parallel programming framework in order to ensure that important ETL jobs meet their service-level agreements. However, when you talk to those organizations a year later, the value that they find in Hadoop lies in its flexibility: the ability to organize their data in order to answer new questions, to run large-scale machine learning algorithms to influence the future, and to build fast, scalable backend systems to provide intelligent services to their customers. In this talk, we will walk through the lifecycle of a Hadoop cluster at successful organizations, from the common starter use cases to building a data science team to production deployment, and give an overview of advanced analytical applications deployed by Cloudera customers across a variety of industries, from insurance to healthcare to bioinformatics.
Director, Data Science
Josh Wills is Cloudera's Director of Data Science, working with customers and engineers to develop Hadoop-based analytical applications across a wide range of industries. Prior to joining Cloudera, Josh spent some time at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure behind Google+.
17:15 - Interactive Session 4: Meaningful Business Use of Clinical Data
So you’ve got data and can do something with it. So what? If you want to successfully transform it into value, that’s a completely different story. Value, as in using it to create a successful application, product, service or even company or business. Success, as in something that the market wants, will pay for, and scales over time. And if you’re interested in social or public good, creating value from data will not only help you gain users but will put you in a position to do greater good.
Sounds good? Too bad data-based healthcare start-ups fail - at mind-boggling rates. Fail, not as in fail to get funding, or even to do a little run, but fail as in fail to go to exit or even have decent value metrics. Given the crisis in healthcare, the need for data, and all the hoopla, it simply shouldn’t be so. Some fail because of the buzz, some because of the perverse incentives (unwarranted variation from fee-for-service), but one way or another most fail because they start on the wrong end, beginning with the data or technology rather than the business need.
In this session we’ll play a game, starting from the business side, then matching those needs with the data while working through how to wrap the connection with an app/product/biz that is specific enough to succeed and avoid caving under the weight of that success from perverse incentives. A good little exercise in general, and a necessary primer if you’re planning on hacking - not just the code, but the system.
Josh Rosenthal, Ph.D.
Co-Founder and Chief Science Officer
Josh is a Co-Founder and CSO of RowdMap. RowdMap's Health Profit Intelligence platform creates Simple Growth, Performance & Value for Health Plans, Providers, Hospitals & Nursing Homes. Before co-founding RowdMap, Josh founded Sprigley (acquired by Eliza Corporation, 2008). As Chief Scientific Officer of Sprigley, a health engagement platform and data/analytic system, Josh turned an idea into a platform that quantified qualitative data, especially behavioral data, while creating socio-graphic and psychographic metrics that measured and predicted interactions and interventions’ outcomes and impact. While at Eliza, Josh served as Product Engagement Guru (VP, Product Development), where he successfully sold the Sprigley platform before retooling it to create a premium analytic offering. He forged the organization structures and collateral as well as directly sold this incredibly powerful new offering, which saw Eliza win accolades such as a Business Week Innovation award and be named one of Entrepreneur Magazine’s 10 Companies to Watch in Health Care. Firing on all cylinders, Josh played a central role in the sale of a major equity investment in Eliza (Parthenon Capital, 2011). Josh received a Fulbright to the Sorbonne’s Institute for Advanced Studies (EPHE), an interdisciplinary think tank where he began exploring quantifying qualitative data, behavior change and complex systems. He holds a PhD in History and Master’s degrees in theology, and has taught exceptional (special) education in inner-city public elementary schools. He has served as an industry expert on technology and innovation, data & analytics and public data access to the Department of Health and Human Services, the Centers for Medicare and Medicaid Services, Office of National Coordinator & National Committee on Vital and Health Statistics, lectured at Harvard & MIT and spoken at HDI, SXSW, etc.
Sunday, August 12, 2012, Break-out Sessions
On Sunday throughout the day, all conference rooms in the Saban Research building will be available for use by attendees to hold break-out sessions and other meetings. Based on registrant feedback, we have organized a couple of specific topical break-out sessions, but we encourage attendees to organize additional informal sessions. Throughout the meeting, we will provide a large bulletin board in the lobby that attendees can use to indicate their interests, reserve space, and organize these sessions. Attendees are also welcome to use the available space for private meetings.
Meaningful Use and Analysis of EHR Data, 9-10am, room TBD
hosted by Practice Fusion
Representatives from Practice Fusion, including Data Scientist Jake Marcus, will lead an open discussion (following on from Friday's Interactive Sessions) about the potential for secondary analyses of EHR data, particularly addressing the following questions:
- Potential secondary analyses of EHR data. What will be discovered using medical record datasets?
- New ideas for data-driven features delivered at the point of care through web-based electronic health records. If the killer data-driven feature for the social, web world is "People You May Know," what is it for the web, health world?
Practice Fusion's Research Division facilitates access for researchers to one of the largest longitudinal clinical databases in the US for the purposes of clinical research and public health analysis. If you have an idea about how to use a dataset covering over 40 million patients or are interested in getting access, then Practice Fusion wants to hear from you!
Information Visualization and Interactive Analysis, 10-11am, room TBD
led by Diana MacLean, Stanford University
Diana MacLean will lead an informal discussion about information visualization and interactive analysis. Diana has a breadth of experience with not only visualization techniques and interaction design but also statistics and data analytics. She is completing her Ph.D. at Stanford University, supervised by Prof. Jeffrey Heer, and has interned at LinkedIn and Microsoft Research.
Time Series Modeling and Data Mining, 11am-noon, room TBD
led by Josh Patterson, Principal Solutions Architect, Cloudera
Josh Patterson from Cloudera will lead an informal discussion about working with time series data, which is particularly prevalent in clinical scenarios, especially critical care. Topics will include architectures for managing and processing large volumes of temporal data; data-driven vs. model-driven approaches; and pattern discovery, clustering, and classification. Other experts will join the discussion.