Top data scientists from around the country gathered in the Rackham Building on Oct. 6 for a symposium to launch the Michigan Institute for Data Science (MIDAS), the centerpiece of the University’s recently announced $100 million investment in data science.

Titled “The Future of Data Science: A Convergence of Academia, Industry and Government,” the symposium highlighted current research across the spectrum of data science; outlined faculty opportunities; examined data science challenges in health, learning analytics, social science and transportation; discussed regional, national and international data science initiatives and partnerships; and explored possibilities for collaboration with industry.

To view videos of all the sessions, visit the MIDAS Kickoff Symposium YouTube playlist. Special thanks to MconneX and the College of Engineering for video recording and editing.

Links to slides and videos of each presentation are also included below.

Introductions, Eric Michielssen, Associate Vice President for Advanced Research Computing and Professor, Electrical Engineering and Computer Science; Martha Pollack, Provost; S. Jack Hu, Interim Vice President for Research.
Video

Overview, Data Science Initiative, Eric Michielssen, Associate Vice President for Advanced Research Computing and Professor, Electrical Engineering and Computer Science.
Slides | Video

Overview, Michigan Institute for Data Science, MIDAS co-director Brian Athey, Michael A. Savageau Collegiate Professor and Chair, Computational Medicine and Bioinformatics; and Alfred Hero, MIDAS co-director and R. Jamison and Betty Williams Professor of Engineering.
Slides: Part 1, Part 2 | Video

Machine Learning for Data Science, Robert Nowak, McFarland-Bascom Professor in Engineering, University of Wisconsin-Madison.
Abstract: Machine learning is an area of computer science focused on designing computer programs that enable machines to learn by example, much in the way young children are taught to understand the world around them. Machine learning takes advantage of the availability of massive datasets and powerful computing resources to automatically discover patterns and structure in data. In this talk, we look at several cutting-edge applications of machine learning that highlight key innovations and progress made in recent years, such as deep learning and high-dimensional statistics. The talk also surveys exciting challenges that lie ahead.
Slides | Video

Micro-randomized Trials in Mobile Health, Susan Murphy, H.E. Robbins Distinguished University Professor of Statistics, University of Michigan.
Abstract: We describe a sequence of steps that facilitate effective learning of treatment policies in mobile health. These include a clinical trial with associated sample size calculator and data analytic methods. An off-policy Actor-Critic algorithm is developed for learning a treatment policy from this clinical trial data. Open problems abound in this area, including the development of a variety of online predictors of risk of health problems, missing data and disengagement.
Slides | Video

At the Intersection of Language and Data Science, Kathleen McKeown, Director of the Institute for Data Sciences and Engineering, Columbia University.
Abstract: Data science holds the promise to solve many of society’s most pressing challenges. But much of the necessary data is locked within the volumes of unstructured data on the web, including language, speech and video. In this talk, I will describe how data science approaches are being used in research projects that draw from language data along a continuum from fact to fiction. I will present a system that predicts the future impact of a scientific concept (represented as a technical term) based on the information available in recently published research articles; research on learning from knowledge of past disasters, as seen through the lens of the media; and work on the use of data science in understanding subjective, personal narratives.
Slides | Video

Panel Discussion, Data Science Methodologies, Robert Nowak, McFarland-Bascom Professor in Engineering, University of Wisconsin-Madison; Susan Murphy, H.E. Robbins Distinguished University Professor of Statistics, University of Michigan; and Kathleen McKeown, Director of the Institute for Data Sciences and Engineering, Columbia University.
Video

Introduction, MIDAS Education and Training Program, Ivo Dinov, MIDAS Associate Director for Education and Training; Associate Professor of Human Behavior and Biological Sciences; Director, Statistics Online Computational Resource (SOCR).
Slides | Video

Developing Effective Data Scientists, Erin Shellman, Research Scientist, Amazon Web Services.
Abstract: Data science is an emergent field that incorporates concepts from statistics, computer science and machine learning to create and apply knowledge from data. In this talk I share what I think are the essential skills and characteristics of effective data scientists. I also provide guidance on how students can develop those skills in school and how educators can prepare them for jobs in industry.
Slides | Video

Economic Implications of Data Science: Past, Present and Future, Patrick Harrington, Co-Founder and Chief Data Scientist, Comp Genome.
Abstract: Big Data and the field of Data Science present a novel paradigm facing the global economy: granular control, measurement, and rigorous optimization of everything. In this talk, I will discuss the range of economic verticals that have been touched by big data and data science and those ripe for efficiency gains under adoption of these schools of thought. Healthcare, finance, advertising, e-commerce, and ultimately human beings are on a mature efficiency trajectory of operating performance enabled by data science, with the latter, human beings, ready to be made more efficient. I will speak to my current venture, compgenome.com, and how the evolution of an on-demand, real-time talent market provides transparency in employer compensation and ultimately optimal career navigation. Finally, I will discuss how data science, coupled with software engineering, is a viable and lucrative career path likely to exist for decades to come.
Slides | Video

Focusing on the Inputs for the Data Sciences, Nandit Soparkar, Chief Executive Officer, Ubiquiti.
Abstract: Inputs via data entry are crucial to the success of the data sciences. This surprisingly under-served area suggests challenges and opportunities for applied data sciences. Drawing analogies with the related areas of OLAP and data mining, I will describe specific examples where the inputs need to be pre-processed appropriately. In particular, I will discuss text and natural language inputs in the automotive and healthcare sectors. I will also briefly touch upon the human factors relevant for data inputs, given that all data science efforts are ultimately by and for the broader human endeavor.
Slides | Video

Panel Discussion, Data Science Education and Training, Erin Shellman, Research Scientist, Amazon Web Services; Patrick Harrington, Co-Founder and Chief Data Scientist, Comp Genome; and Nandit Soparkar, Chief Executive Officer, Ubiquiti.
Video

Keynote Address: Privacy and Reproducibility in Data Science, Daniel Goroff, Vice President, Alfred P. Sloan Foundation.
Abstract: Exploratory data analysis is fun but dangerous. Observations alone, no matter how many, can rarely justify causal inferences. Simple calculations show that, even playing strictly by the current rules of empirical science, a shocking percentage of the conclusions reached will be wrong. Those same calculations show that reproducing hypothesis tests can make them much more reliable. The Sloan Foundation actively supports efforts to make empirical research more reproducible, including the development of mathematical approaches to privacy-preserving research. Recent and surprising theorems show how, even if privacy is not an issue, some of the techniques developed to protect confidential information can also protect against false discovery due to multiple hypothesis testing and exploratory data analysis.
Slides | Video

Big Data and the Evolution of Precision (Personalized) Medicine, George Poste, Regents’ Professor and Del E. Webb Chair in Health Innovation, Arizona State University.
Abstract: The convergence of molecular biology, informatics, sensor and mobile device technologies, and social media is forging a new era of precision medicine in which large-scale data on disease processes in individuals and populations, their environments and their behavior will enable disease detection, treatment and prevention to be based increasingly on individual-specific (personalized) parameters to achieve better health outcomes at lower cost. The long-term trajectory for precision medicine will progressively shift the focus of care from the current episodic, reactive responses to illness to proactive, continuous real-time monitoring of health status for earlier detection of disease, improved treatment compliance and other risk reduction strategies to prioritize maintaining health versus managing illness.

Realization of these aspirations will generate data on an unprecedented scale. The rise of precision medicine and data-intensive medicine are inextricably linked. The current health care ecosystem is ill-prepared for this union and its implications for the future medical curriculum, new skill needs for physicians, infrastructure and personnel for advanced data analytics, the evolution of new models of healthcare delivery and the entry of influential new participants from the computing, logistics and consumer realms hitherto uninvolved in healthcare.
Slides | Video

“Learning Engineering”: Using Evidence to Lift Learning Performance, Bror Saxberg, Chief Learning Officer, Kaplan, Inc.
Abstract: There’s much research about how learning can be enhanced by the right kinds of learning experiences, including how technology can help. However, little of that is getting to students at scale, compared with random walks with technology. (“Video is great, right? Must have more!”)

We’ll talk about what it means to be “learning engineers”: applying evidence and learning science at scale in practical circumstances. As in any other good design or engineering effort, we want to see what works, and doesn’t, with careful data collection. Scale (plus technology, where appropriate) enables the creation of test-beds for systematic improvement as well as a chance for reliable impact.
Slides | Video

Predicting the Group: Data Science for Human Socio-cultural Understanding and Prediction, Kathleen Carley, Professor of Computation, Organization and Society, Carnegie Mellon University.
Abstract: Our ability to understand and predict socio-cultural activity is being transformed by the exponential growth in big data available on the web, both social media and open government and organizational records.

Analysis of such data has the potential to create the timely and detailed information needed to improve crisis response and so save lives and goods, improve community resilience, support early identification of security threats and decrease social-cyber attacks. However, whether considering issues such as disaster response, cyber-security, or state stability, the same core methodological challenges keep rising to the fore.

Three of these key methodological challenges are driven by the nature of the data: “wide” data, sampled data, and geo-temporal data. In this presentation, the promise of the new big data science for social behavior is described as well as the challenges that need to be considered. These points will be illustrated using a variety of examples related to early tsunami warning in Indonesia, crisis response in Libya, state stability in the Middle East, and cyber-security globally.
Slides | Video

Transforming and Disrupting Personal Transportation: Opportunities for Data Science, Jonathan Owen, Director of Operations Research, General Motors; VP of Practice, INFORMS.
Abstract: The auto industry is increasing vehicle electrification, introducing connected vehicle capabilities, and adding more intelligence to in-vehicle electronics, controls and active safety systems that will ultimately lead to automated driving technologies. The convergence of these technologies promises to transform and even disrupt personal transportation as we know it today. This presentation will provide a high-level overview of the future of personal mobility and discuss major opportunities and challenges for data science as the automotive transformation occurs.
Slides

Panel Discussion, Data Science Challenges and Opportunities, George Poste, Regents’ Professor and Del E. Webb Chair in Health Innovation, Arizona State University; Bror Saxberg, Chief Learning Officer, Kaplan, Inc.; Kathleen Carley, Professor of Computation, Organization and Society, Carnegie Mellon University; and Jonathan Owen, Director of Operations Research, General Motors; VP of Practice, INFORMS.
Video

MIDAS Data Science Community Partnership Vision, MIDAS co-director Brian Athey, Michael A. Savageau Collegiate Professor and Chair, Computational Medicine and Bioinformatics.
Video

Midwest Big Data Hub, Ed Seidel, Founder Professor, Departments of Physics and Astronomy, University of Illinois at Urbana-Champaign.
Abstract: The Midwest Big Data Hub is a network of Midwest-based partners with unique resources that will address challenges in collecting, managing, serving, mining, and analyzing rapidly growing and increasingly complex data and information collections to create actionable knowledge and guide decision-making. I will describe expected activities of the Hub as we build collaborations and pilot projects with academic, industry, government and non-profit partners.
Slides | Video

Big Data-Driven Business Analytics: We Are Just Getting Started! Ratna “Babu” Chinnam, Professor of Engineering, Wayne State University.
Abstract: The McKinsey Global Institute predicted back in 2011 that analyzing large data sets would become a key basis of competition for firms, underpinning new waves of productivity growth, innovation, and consumer surplus across most business sectors. Recent studies report that over 85% of leading Fortune 1000 companies have a Big Data initiative in progress or in the planning stages. The primary reason businesses cite for investing in Big Data is to enable better, fact-based decision making. However, most businesses are floundering in their ability to extract value from any data. Big Data management and analytics require a multitude of advanced concepts, tools and technologies, and the required skills are hard to come by. In addition, given the absence of effective data science and operations research, there is undue reliance on traditional techniques of the past to drive big data analytics. For most companies, data-analytics success has been limited to a few tests or to narrow slices of the business, and few have achieved anything resembling what we would call “big impact through big data.” What we need are more tools and technologies that are effective on the back end and far more emphasis on data-driven business processes on the front end!
Slides | Video

Northeast Big Data Hub; Institute for Data Sciences and Engineering, Columbia University, Kathleen McKeown, Director of the Institute for Data Sciences and Engineering, Columbia University.
Slides | Video

Building Open Data Collaborations across Academia, Industry and Government, Keith Elliston, Chief Executive Officer, tranSMART Foundation.
Slides | Video

Panel Discussion, Data Science Collaborations and Partnerships, Ed Seidel, Founder Professor, Departments of Physics and Astronomy, University of Illinois at Urbana-Champaign; Ratna “Babu” Chinnam, Professor of Engineering, Wayne State University; Kathleen McKeown, Director of the Institute for Data Sciences and Engineering, Columbia University; and Keith Elliston, Chief Executive Officer, tranSMART Foundation.
Video