NOVEMBER 15, 2016


8 a.m. — Registration / Coffee

8:30 a.m. — Welcome, Eric Michielssen, Associate Vice President, Advanced Research Computing

Eric Michielssen is the Associate Vice President for Advanced Research Computing, the Louise Ganiard Johnson Professor of Engineering, and Professor of Electrical Engineering and Computer Science in the U-M College of Engineering.

8:40 a.m. — MIDAS: The Year in Review, Co-Directors Al Hero and Brian Athey

Brian Athey is the Michael A. Savageau Collegiate Professor and Chair of the Department of Computational Medicine and Bioinformatics, and Professor of Psychiatry and Internal Medicine. Al Hero is the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science, R. Jamison and Betty Williams Professor of Engineering, Professor of Biomedical Engineering, and Professor of Statistics.

9:15 a.m. — KEYNOTE: Robert Groves, Georgetown University

Title: Government Statistics in a New Data World
Most countries in the world, especially democracies, have developed quasi-independent government statistical agencies to monitor the social and economic status of their populations.  The majority of those statistics use statistical sampling of target populations and the application of pre-designed measurements on the samples.  These are “designed” data in that the measurements are constructed to provide specific data on each sampled unit. Falling response rates to those sample surveys have led to increasing costs of those efforts and to risks of nonresponse bias in the resulting statistics.
At the same time, new digital resources are arising, stimulated by the Internet and ubiquitous management information systems.  The amount of data on day-to-day economic and social transactions is larger now than ever before in human history.  While the measurements yielding these data were not largely designed for social and economic statistics, many of them have informational value for those purposes. However, the resources are, to a large extent, not owned by institutions that have missions to serve the common good informational needs of the society.
The issues for official statistics in this new data world are discussed, with a focus on the lacunae in measurement theories, the family of statistical models useful in this new world, and the institutional structures that need consideration.
Bio: Robert M. Groves is the Gerard J. Campbell, S.J. Professor in the Department of Mathematics and Statistics as well as the Department of Sociology at Georgetown University, where he has served as Executive Vice President and Provost since 2012. Groves is a social statistician who studies the impact of social, cognitive, and behavioral influences on the quality of statistical information. His research has focused on the impact of mode of data collection on responses in sample surveys, the social and political influences on survey participation, the use of adaptive research designs to improve the cost and error properties of statistics, and public concerns about privacy affecting attitudes toward statistical agencies.
Prior to joining Georgetown as provost, he was director of the U.S. Census Bureau (a presidential appointment with Senate confirmation), a position he assumed after serving as director of the University of Michigan Survey Research Center, professor of sociology, and research professor at the Joint Program in Survey Methodology at the University of Maryland.
He has authored or co-authored seven books and scores of peer-reviewed articles.  His 1989 book, Survey Errors and Survey Costs, was named one of the 50 most influential books in survey research by the American Association of Public Opinion Research. His book, Nonresponse in Household Interview Surveys, with Mick Couper, received the 2008 AAPOR Book Award.  His co-authored book, Survey Nonresponse, received the 2011 AAPOR Book Award.
Groves serves on several boards and advisory committees including the Pew Research Board, the Population Reference Bureau, and the Statistics Canada Advisory Committee. He is an elected member of the US National Academy of Sciences, an elected member of the National Academy of Medicine of the US National Academies, an elected member of the American Academy of Arts and Sciences, an elected fellow of the American Statistical Association, and an elected member of the International Statistical Institute.
Groves has a bachelor’s degree from Dartmouth College and master’s degrees in statistics and sociology from the University of Michigan. He also earned his doctorate at Michigan.

10:15 a.m. — Panel: Big Data, An International Perspective

This panel, moderated by Al Hero, will bring together data scientists from around the world to offer a global perspective on advances and potential new applications of data science. Each panelist will speak for approximately 25 minutes before a panel discussion.

Michelle Dunn, National Institutes of Health
Title: Data Science: A View from the NIH
Abstract: Data science is increasingly necessary for biomedical science. In 2014, the NIH started awarding grants under the Big Data to Knowledge (BD2K) Initiative to make new investments in data science. Training in data science is a big part of this prominent investment. I will describe the BD2K Initiative, with a focus on the innovative experiments to prepare the workforce for a future where data science is an even more integrated part of biomedical science.
Bio: In the NIH Office of the Associate Director for Data Science (ADDS), Dr. Dunn’s responsibilities focus on education, training, and workforce development in data science as it is applied to the biomedical, behavioral, and clinical sciences. Building a diverse and sustainable workforce is a primary objective of the office.
Prior to joining the NIH/OD, Dr. Dunn was a program director at the National Cancer Institute. In addition to holding a portfolio of research grants in statistical methodology development, she co-chaired the BD2K Initiative’s subcommittee on training.
Dr. Dunn received her Ph.D. in statistics from Carnegie Mellon University and her A.B. in applied mathematics from Harvard College.
Tom Luo, Shenzhen Research Institute of Big Data, China
Title: Shenzhen Research Institute of Big Data: Opportunities and Challenges
Abstract: Located on the campus of the Chinese University of Hong Kong, Shenzhen (CUHK(SZ)), the Shenzhen Research Institute of Big Data is a newly established research institute by the Shenzhen government in China. The Institute is operated by a research team from CUHK(SZ), and conducts collaborative research with industries, government agencies, as well as international affiliated members. In this presentation, I will highlight some opportunities and challenges of big data research and education in Shenzhen as well as in China.
Bio: Zhi-Quan (Tom) Luo received his B.Sc. degree in Applied Mathematics in 1984 from Peking University, Beijing, China, and a Ph.D. degree in Operations Research in 1989 from MIT. From 1989 to 2003, he held a faculty position with the Department of Electrical and Computer Engineering, McMaster University, Canada, where he eventually became the department head and held a Canada Research Chair in Information Processing. Since April of 2003, he has been with the Department of Electrical and Computer Engineering at the University of Minnesota (Twin Cities). His research interests include optimization algorithms, signal processing and digital communication. He is currently serving as the Vice President (Academic) at the Chinese University of Hong Kong, Shenzhen, and the Director of the Shenzhen Research Institute of Big Data.
Dr. Luo is a fellow of SIAM and IEEE, and serves as the past chair of the IEEE Signal Processing Society Technical Committee on Signal Processing for Communications (SPCOM). He is a recipient of the 2004, 2009, and 2011 IEEE Signal Processing Society Best Paper Awards, the 2015 IEEE Signal Processing Magazine Best Paper Award, the 2010 Farkas Prize from the INFORMS Optimization Society, the 2010 EURASIP Best Paper Award, and the 2011 ICC Best Paper Award. He has held editorial positions for several international journals, including the SIAM Journal on Optimization, Mathematics of Computation, and Management Science. He served as the Editor-in-Chief of the IEEE Transactions on Signal Processing from 2012 to 2014. He was elected to the Royal Society of Canada in 2014.
Patrick Wolfe, University College London, The Alan Turing Institute
Title: Big Network Data: Challenges and Opportunities
Abstract: How do we draw sound and defensible data-analytic conclusions from networks? This question has recently risen to the forefront of mathematical statistics, and it represents a fundamental challenge for data science. In this talk I will describe new large-sample theory that helps us to view and interpret networks as statistical data objects, along with the transformation of this theory into new statistical methods to model and draw inferences from network data in the real world. The insights that result from connecting theory to practice also feed back into pure mathematics and theoretical computer science, prompting new questions at the interface of combinatorics, analysis, probability, and algorithms.
Bio: Patrick J. Wolfe is Professor of Statistics and Honorary Professor of Computer Science at University College London, where he is a Royal Society and EPSRC Mathematical Sciences Research Fellow.
From 2001-2004 he held a Fellowship and College Lectureship in Engineering and Computer Science at Cambridge University, where he completed his PhD in 2003. As a US National Science Foundation Graduate Research Fellow, his doctoral work on the statistical modeling of speech and audio waveforms was honored by the Acoustical Society of America, the UK Royal Statistical Society, and the International Society for Bayesian Analysis. Prior to joining UCL he was Assistant (2004-2008) and Associate (2008-2011) Professor at Harvard University, receiving the Presidential Early Career Award from the White House in 2008 for contributions to signal and image processing.
Professor Wolfe currently serves as Executive Director of the UCL Big Data Institute. Externally to UCL, he serves on the Research Section Committee of the Royal Statistical Society, on the Program Committee of the 2015 Joint Statistical Meetings, and as an organizer of the 2016 Newton Institute program on Theoretical Foundations for Statistical Network Analysis.

Noon —  Refreshments, Poster Session

1:30 p.m. —  Panel: Data Science Methodologies

This panel will address a variety of methodological topics in data science, from the perspectives of social science, transportation, and learning analytics. Each panelist will speak for approximately 25 minutes before a panel discussion.

Christos Cassandras, Boston University
Title: Using Data to Solve Inverse Optimization Problems in Transportation Networks: Estimating the Price of Anarchy
Abstract: One of the main reasons for our inability to effectively control transportation systems is the lack of knowledge of driver behavior. However, the availability of large amounts of traffic data allows us to formulate and solve non-traditional inverse optimization problems: we process the actual data to deduce the cost functions implicitly used by drivers in order to achieve a prevailing Wardrop user-centric (or selfish) equilibrium. Once these estimated cost functions are available, one can then solve forward optimization problems to determine a system-centric (or social) equilibrium. The ratio of the resulting optimal costs defines the Price of Anarchy (POA) and quantifies the efficiency loss due to selfish behavior compared to socially optimal behavior. When the POA is shown to be large, this provides the motivation and justification for Connected Automated Vehicles (CAVs), whose emergence is a key element in Smart Cities. This talk will include results from our experience with large traffic datasets from the Eastern Massachusetts road network.
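The POA computation described in the abstract can be sketched on the textbook two-link Pigou network, a toy example rather than the speaker's Eastern Massachusetts model: one road has a fixed travel time, the other slows down in proportion to its own flow, and the ratio of selfish to socially optimal total cost works out to the classic 4/3 for linear latencies.

```python
# Price of Anarchy on the two-link Pigou network (illustrative toy example):
# link A has constant latency 1; link B has latency equal to its flow x.
# One unit of traffic splits between them.

def total_cost(x):
    """Total travel cost when a fraction x of traffic uses link B."""
    return (1 - x) * 1.0 + x * x  # (flow on A)*latency_A + (flow on B)*latency_B

# Wardrop (selfish) equilibrium: drivers keep switching to B as long as
# its latency x is below 1, so all traffic ends up on B (x = 1).
user_equilibrium_cost = total_cost(1.0)  # = 1.0

# Social optimum: minimize total cost over the split x (fine grid search).
grid = [i / 10_000 for i in range(10_001)]
social_optimum_cost = min(total_cost(x) for x in grid)  # = 0.75 at x = 0.5

price_of_anarchy = user_equilibrium_cost / social_optimum_cost
print(round(price_of_anarchy, 3))  # 1.333
```

Estimating the latency (cost) functions themselves from observed traffic data is the inverse optimization step the talk describes; the toy above only shows the forward computation once those functions are known.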
Bio: Christos G. Cassandras is Distinguished Professor of Engineering at Boston University. He is Head of the Division of Systems Engineering, Professor of Electrical and Computer Engineering, and co-founder of Boston University’s Center for Information and Systems Engineering (CISE). He received degrees from Yale University (B.S., 1977), Stanford University (M.S.E.E., 1978), and Harvard University (S.M., 1979; Ph.D., 1982). In 1982-84 he was with ITP Boston, Inc. where he worked on the design of automated manufacturing systems. In 1984-1996 he was a faculty member at the Department of Electrical and Computer Engineering, University of Massachusetts/Amherst. He specializes in the areas of discrete event and hybrid systems, cooperative control, stochastic optimization, and computer simulation, with applications to computer and sensor networks, manufacturing systems, and transportation systems. He has published over 380 refereed papers in these areas, and five books. He has guest-edited several technical journal issues and serves on several journal Editorial Boards. In addition to his academic activities, he has worked extensively with industrial organizations on various systems integration projects and the development of decision-support software. He has most recently collaborated with The MathWorks, Inc. in the development of the discrete event and hybrid system simulator SimEvents.
Dr. Cassandras was Editor-in-Chief of the IEEE Transactions on Automatic Control from 1998 through 2009 and has also served as Editor for Technical Notes and Correspondence and Associate Editor. He is currently an Editor of Automatica. He was the 2012 President of the IEEE Control Systems Society (CSS). He has also served as Vice President for Publications and on the Board of Governors of the CSS, as well as on several IEEE committees, and has chaired several conferences. He has been a plenary/keynote speaker at numerous international conferences, including the American Control Conference in 2001 and the IEEE Conference on Decision and Control in 2002, and has also been an IEEE Distinguished Lecturer.
He is the recipient of several awards, including the 2011 IEEE Control Systems Technology Award, the Distinguished Member Award of the IEEE Control Systems Society (2006), the 1999 Harold Chestnut Prize (IFAC Best Control Engineering Textbook) for Discrete Event Systems: Modeling and Performance Analysis, a 2011 prize and a 2014 prize for the IBM/IEEE Smarter Planet Challenge competition (for a “Smart Parking” system and for the analytical engine of the Street Bump system respectively), the 2014 Engineering Distinguished Scholar Award at Boston University, several honorary professorships, a 1991 Lilly Fellowship and a 2012 Kern Fellowship. He is a member of Phi Beta Kappa and Tau Beta Pi. He is also a Fellow of the IEEE and a Fellow of the IFAC.
Katherine Ensor, Rice University
Title: Urban Analytics and the Importance of Statistical Thinking
Abstract: Everyday life in cities is generating unprecedented amounts of data. City leadership and citizenry now have the opportunity to use urban data and are becoming increasingly data hungry and data dependent. As we measure and observe more about how city residents work, live, learn, and play, our ability to understand our cities and to optimize city services increases. Information about our residents is integrally linked to the built environment and city infrastructure. Further, the risks, especially privacy risks, associated with urban data are important considerations. In this talk I will highlight the creation of the Rice Kinder Institute Urban Data Platform and what we hope to accomplish through this endeavor. Beyond data, success in urban analytics requires making informed, robust decisions. It cannot be overstated that these decisions rely on quality data feeding sound statistical methods to produce reliable information. I will bring forward several examples of the key role that urban analytics can and will play in our vibrant cities of the future.
Bio: Katherine Bennett Ensor is Professor of Statistics at Rice University where she serves as director of the Center for Computational Finance and Economic Systems (CoFES) and was chair of the Department of Statistics from 1999 through 2013. Dr. Ensor, an expert in many areas of modern statistics, develops innovative statistical techniques to answer important questions in science, engineering and business with specific focus on the environment, energy and finance. She is an elected fellow of the American Statistical Association, the American Association for the Advancement of Science and has been recognized for her leadership, scholarship and mentoring. She is Vice President of the American Statistical Association and a member of the National Academies Committee on Applied and Theoretical Statistics. She holds a BSE and MS in Mathematics from Arkansas State University and a PhD in Statistics from Texas A&M University.
J.D. Fletcher, Institute for Defense Analyses
Title: Accelerating the Development of Expertise with Digital Tutoring
Abstract: The use of machine intelligence to clone and achieve the effectiveness of one-on-one human tutoring has been studied since the mid-1960s. Examples such as the early MENTOR and SOPHIE systems, along with recent meta-analyses of digital tutoring effectiveness, suggest that this goal is substantially achievable. Recently, a Defense Advanced Research Projects Agency (DARPA) effort was undertaken to develop a digital tutor that would accelerate the development of expertise among novices training to become information systems technicians (IT), without increasing the time now taken to develop entry or journeyman levels of competence. Results from a third-party assessment of the tutor, which was developed for the US Navy, found that its 16-week graduates substantially outperformed, in tests of knowledge and troubleshooting skill, graduates from 35 weeks of classroom instruction and IT technicians averaging 9 years of experience. It was also found to yield substantial monetary cost-effectiveness and return on investment in training sailors for Navy IT duty and military veterans for civilian employment.
Bio: Dr. J. D. Fletcher is a research staff member at the Institute for Defense Analyses, which performs research and analysis on scientific and technical matters for the Office of the Secretary of Defense. He holds graduate degrees in computer science and educational psychology from Stanford University where, as a research associate, he directed numerous projects for the Institute for Mathematical Studies in the Social Sciences. While at Stanford, he was part of the team that developed the first computer assisted instruction programs for the deaf. He also designed and developed the first CAI program for K-3 reading using digitized audio. Fletcher’s research produced these and other CAI systems for use in public schools, as well as training devices used by the military. He has held academic positions in psychology, computer science, and systems engineering, and has held government positions as a research psychologist and program manager for the Navy, Army, Defense Advanced Research Projects Agency, and the White House Office of Science and Technology Policy. He is a fellow of the American Educational Research Association and three divisions of the American Psychological Association.

3:15 p.m. — Panel: Data Science in Health Research

This panel will address data science in health research through the prisms of bioengineering, biostatistics, and genomics.

Todd Coleman, UC San Diego
Title: A Symbiotic Relationship Between Data Science and Health Sciences
Abstract: Novel technological advances now allow for unforeseen ways to acquire massive physiologic information about the human body. This creates new challenges, however, as it relates to turning these rich datasets into actionable information. With this in mind, Dr. Coleman will discuss novel applied probability methods of interpreting such acquired physiologic data for prediction, diagnosis, and prevention purposes. An emphasis will be placed on engineering aggregate systems that address economic, social, and scalability challenges. Dr. Coleman will discuss applications that include: sequential experiment design for brain-computer interfaces; non-stationary spectral estimation for ambulatory sleep monitoring; and uncertainty quantification and risk stratification for quantitative functional assessment of the gastro-intestinal system. Throughout the talk, Dr. Coleman will emphasize the inter-disciplinary nature of this research, involving themes from applied mathematics, statistics, neuroscience, and medicine.
Bio: Todd P. Coleman received B.S. degrees in electrical engineering (summa cum laude) and computer engineering (summa cum laude) from the University of Michigan (Go Blue). He received M.S. and Ph.D. degrees from MIT in electrical engineering, and did postdoctoral studies at Mass General Hospital in quantitative neuroscience.  He is currently an Associate Professor in Bioengineering at UCSD, where he is the co-director of the Center for Perinatal Health within the Institute of Engineering in Medicine.  His research has been featured on CNN, BBC, and the New York Times. In 2015, Dr. Coleman was recognized by the National Academy of Engineering as a Gilbreth Lecturer; and by TEDMED as an invited speaker.
Xihong Lin, Harvard University
Title: Statisticians, Computer Scientists, and Informaticians Need Each Other for Analysis of Massive Health Data
Abstract: Massive ‘ome data, including genome, exposome, and phenome data, are becoming available at an increasing rate with no apparent end in sight. Examples include Whole Genome Sequencing data, large-scale remote-sensing satellite air pollution data, digital phenotyping data, and Electronic Medical Records. The emerging field of Health Data Science presents biostatisticians with many research and training opportunities and challenges. Statistical inference plays a critical role in analyzing massive data. There are countless examples where the volume of available data requires new, scalable statistical methods and demands an investment in statistical research. These include signal detection, network analysis, integrated analysis of different types and sources of data, and the incorporation of domain knowledge in health data science method development. Success in health data science requires sound statistical inference, integrated with computer science and information science. In this talk, I discuss some of the challenges and opportunities, and illustrate them using whole genome sequencing analysis, network analysis, and Electronic Medical Record analysis.
Bio: Xihong Lin is Chair and Henry Pickering Walcott Professor of the Department of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the Harvard T. H. Chan School of Public Health, and Professor of Statistics in the Faculty of Arts and Sciences of Harvard University. Dr. Lin’s research interests lie in the development and application of statistical and computational methods for the analysis of massive genetic and genomic, epidemiological, environmental, and medical data. She currently works on whole genome sequencing association studies, genes and environment, analysis of integrated data, and statistical methods for massive health science data. Dr. Lin received the 2002 Mortimer Spiegelman Award from the American Public Health Association and the 2006 COPSS Presidents’ Award. She is an elected fellow of the ASA, IMS, and ISI. Dr. Lin received the MERIT Award (R37) (2007-2015) and the Outstanding Investigator Award (OIA) (R35) (2015-2022) from the National Cancer Institute. She is the contact PI of the Program Project (P01) on Statistical Informatics in Cancer Research, the Analysis Center of the Genome Sequencing Program of the National Human Genome Research Institute, and the T32 training grant on interdisciplinary training in statistical genetics and computational biology. Dr. Lin was the former Chair of COPSS (2010-2012) and a former member of the Committee on Applied and Theoretical Statistics (CATS) of the National Academy of Sciences. She is the Chair of the new ASA Section on Statistical Genetics and Genomics. She was the former Coordinating Editor of Biometrics and the founding co-editor of Statistics in Biosciences, and is currently an Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She has served on a large number of statistical society committees, and on NIH and NSF review panels.
Jonathan Schildcrout, Vanderbilt University
Title: Modeling Strategies to Enrich a Multiplexed, Preemptive Genomic Testing Program Using Electronic Health Records Data
Abstract: The PREDICT (Pharmacogenomic Resource for Enhanced Decisions In Care and Treatment) program is a clinical quality improvement initiative at Vanderbilt University Medical Center to prospectively identify patients for genotyping based on the likelihood of receiving medications with pharmacogenetic effects at a future time. The goal is to preemptively collect and store genetic data within patients’ electronic health records so that genetic information, if and when necessary, can be used to guide medication prescriptions. Because medications and medication-related adverse events are costly, a crucial feature of an effective multiplexed preemptive testing program is the efficient identification of who will be prescribed medications with pharmacogenetic effects. In this talk, we will describe and then compare several time-to-event modeling strategies for identifying such patients for preemptive genotyping. Once the models have been developed, weighted risk scores will be derived based on the likelihood, severity, and cost of each medication-related adverse event. We will compare standard Cox regression methods and machine learning approaches to evaluate strategies that minimize risk under several loss functions.
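The weighted risk-score idea in the abstract can be sketched as follows. All drug names, probabilities, and severity/cost weights below are hypothetical placeholders, not values from the PREDICT program; the prescription probabilities stand in for the output of a fitted Cox or machine learning model, which is not fit here.

```python
# Hypothetical sketch: combine each patient's predicted probability of
# being prescribed a drug with pharmacogenetic effects with the severity
# and cost of the adverse event that preemptive genotyping could avert.

# illustrative severity and cost weights per medication-related adverse event
WEIGHTS = {
    "clopidogrel": {"severity": 0.9, "cost": 0.8},
    "warfarin":    {"severity": 0.7, "cost": 0.6},
    "simvastatin": {"severity": 0.4, "cost": 0.3},
}

def risk_score(pred_probs):
    """Weighted risk score: sum over drugs of P(prescription) * severity * cost."""
    return sum(
        p * WEIGHTS[drug]["severity"] * WEIGHTS[drug]["cost"]
        for drug, p in pred_probs.items()
    )

# predicted prescription probabilities from an upstream model (made up)
patients = {
    "patient_A": {"clopidogrel": 0.30, "warfarin": 0.10, "simvastatin": 0.60},
    "patient_B": {"clopidogrel": 0.05, "warfarin": 0.70, "simvastatin": 0.20},
}

# rank patients for preemptive genotyping, highest expected benefit first
ranked = sorted(patients, key=lambda pid: risk_score(patients[pid]), reverse=True)
print(ranked)  # ['patient_B', 'patient_A']
```

Note that patient_B ranks first despite lower total prescription probability, because the high-severity, high-cost warfarin event dominates the weighted score; that interaction between likelihood and consequence is the point of weighting.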
Bio: Jonathan Schildcrout, PhD, is an associate professor in the Department of Biostatistics at the Vanderbilt University School of Medicine. His methodological research interests involve longitudinal data analysis, with specific emphasis on extensions of epidemiological study designs for longitudinal data as well as the construction and evaluation of prognostic models using electronic health records data. He has worked on a variety of collaborative research projects, including the health effects of air pollution on children with asthma and the evaluation and improvement of a preemptive genotyping program for personalizing cardiac medication treatments. Other collaborative work has focused on the mechanisms by which high body mass increases risk for kidney injury following thoracic surgery, and the role of social and environmental factors on outcomes and readmission risk in patients with acute coronary syndrome.

5 p.m. — Poster Session

The poster session will feature the research of MIDAS Affiliated Faculty, students enrolled in the Data Science Graduate Certificate Program, other U-M researchers involved in data science, and industry partners. Please note: the poster session will take place in the Michigan League.

NOVEMBER 16, 2016



8 a.m. — Registration / Coffee

8:30 a.m. — KEYNOTE: Sudip Bhattacharjee, U.S. Census Bureau

Title: Big Data and Social Benefits: Measuring and Making Society Better
Abstract: Big data brings a lot of promise, and fears. Some companies are amassing huge datasets from observations of consumer actions and using data mining to target offers to consumers. Government organizations, for legal and traditional reasons, mainly rely on a different data collection methodology. But the two shall meet! There is huge benefit in combining Big Data with survey and reported data, to complement as well as supplement traditional methods. Exciting research is emerging in this area. Similarly, there are big opportunities in applying machine learning techniques to validate and code survey data.
As our enterprises and lives turn digital, and we continue to capture the digital exhaust through sensors and transactions, we can now make accurate, timely, and high-frequency predictions. How can we use those methods to improve our lives and society? The federal government has an evidence-based policymaking commission, and such data-driven decision making is taking hold at the state and local levels. Using examples from internet technology, transportation, energy, health care, politics, and other areas, we will discuss approaches to using Big Data. We will also touch on some myths and pitfalls.
Bio: Sudip Bhattacharjee is an Associate Professor in the School of Business, University of Connecticut. He currently serves as Chief of the Center for Big Data Research and Applications at the US Census Bureau. He is Visiting Faculty at the EM Lyon School of Business, France, and the Indian School of Business, and was a Visiting Professor at GE Global Research Center, USA. He has previously served as Assistant Department Head of Operations and Information Management and as Executive Director of MBA Programs, both in the School of Business, University of Connecticut. His research interests include information systems economics, energy informatics, digital goods and markets, data analytics in IT and operations, and closed-loop supply chains. His research has appeared in premier journals such as Management Science, INFORMS Journal on Computing, Journal of Business, Journal of Law and Economics, ACM Transactions, Journal of Management Information Systems, IEEE Transactions, and other leading peer-reviewed publications. He serves or has served as Associate Editor for Information Systems Research (for five years), as guest AE for MIS Quarterly and Decision Sciences Journal, and on committees such as the INFORMS Edelman Award, INFORMS Selects, and various conferences and workshops. He co-chaired CIST 2014 (Conference on Information Systems and Technology) and served as Review Coordinator for WITS 2015 (Workshop on Information Technology and Systems).
He has extensive research consulting experience with Fortune 100 firms on “Big Data”-driven decision making in IT and operations. He also teaches a semester-long live data analytics graduate course in partnership with private and government organizations. His research has been highlighted in media outlets such as Business Week, Washington Post, San Francisco Chronicle, Der Spiegel, Christian Science Monitor, Business 2.0 Web Guide, and others.

9:30 a.m. — Panel: Data Science in the Social Sciences

This panel will address issues in data science in the social sciences, including political science, communications, and marketing. Each panelist will speak for 25 minutes, followed by a panel discussion.

Deen Freelon, American University
Title: Inferring individual-level characteristics from digital trace data: Issues and recommendations
Abstract: Digital traces—records of online activity automatically recorded by the servers that undergird all online activity—allow us to explore age-old communication research questions in unprecedented ways. But one of the greatest challenges in doing so is managing the gap between the research’s conceptual focus and the set of readily available traces. Not every type of trace will be equally valuable from a particular research standpoint, and not every interesting concept will be measurable using the traces to which researchers have access. The purpose of this presentation is to contribute to the development of a framework for assessing the construct validity of conceptual inferences drawn from digital traces. In it, I will define four platform-independent dimensions researchers should bear in mind when choosing traces for analysis: technical design, terms of service (TOS), social context, and the potential for misrepresentation. I will illustrate the value of this framework in discussions of three individual-level characteristics of broad interest to communication researchers and others: gender, race/ethnicity, and geographic location.
Bio: Deen Freelon is an associate professor in the School of Communication at American University. He has two major areas of expertise: 1) political expression through digital media, and 2) the use of code and computational methods to extract, preprocess, and analyze very large digital datasets. Freelon has authored or co-authored over 30 journal articles, book chapters, and reports, in addition to co-editing one scholarly book. His work has been funded by the Knight Foundation, the Spencer Foundation, and the US Institute of Peace. He is the creator of ReCal, an online intercoder reliability application that has been used by thousands of researchers around the world; and TSM, a network analysis module for the Python programming language.
Ben Letham, Core Data Science, Facebook
Title: Bayesian optimization for adaptive Internet experiments
Abstract: Field experiments are the gold standard for evaluating changes to Internet services. A typical Internet experiment compares a small number of treatment arms, such as two in the usual A/B testing paradigm. At Facebook we frequently encounter settings where the space of possible treatments is very large, or even infinite. We may wish to use field experiments to find an optimal contextual policy that maps user states to a range of actions, or to optimize the continuous parameter space of a machine learning system. I will show empirical evidence from our field experiments that these sorts of treatments can be described as vectors in a multidimensional space, and that the outcomes we care about can be modeled as smooth response surfaces. This opens the door to predicting the outcome of experiments that we haven’t yet run, which we do using Gaussian process regression. Finally, we use Bayesian optimization to efficiently explore the treatment space and propose candidates for future rounds of experimentation.
Bio: Ben Letham is a data scientist on the experimental design and causal inference team at Facebook. He researches core methods for designing and analyzing field experiments, and develops these methods into tools that can be used across the company. He joined Facebook after completing his PhD in operations research at MIT, which was preceded by degrees from Arizona State University (BSE) and Johns Hopkins University (MSE).
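The loop described in the abstract (fit a Gaussian process to observed experiment outcomes, then let an acquisition rule propose the next treatments) can be sketched in a few dozen lines. This is a minimal illustration in NumPy; the RBF kernel, noise level, upper-confidence-bound rule, and toy objective are all assumptions made for the sketch, not Facebook's actual implementation.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_obs, y_obs, X_new, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at X_new, given noisy observations."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_new)
    K_ss = rbf_kernel(X_new, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - (v**2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)

def propose_next(X_obs, y_obs, candidates, kappa=2.0):
    """Upper-confidence-bound acquisition: pick the candidate maximizing mu + kappa*sigma."""
    mu, sigma = gp_posterior(X_obs, y_obs, candidates)
    return candidates[np.argmax(mu + kappa * sigma)]

# Toy smooth response surface standing in for an experiment outcome (hypothetical).
f = lambda x: np.sin(3 * x) * (1 - x)
X = np.array([[0.1], [0.5], [0.9]])   # treatments already tried
y = f(X).ravel()                      # their observed outcomes
grid = np.linspace(0, 1, 101)[:, None]
x_next = propose_next(X, y, grid)     # treatment proposed for the next round
```

In practice the treatment vectors would parameterize a live system and `f` would be a fielded experiment; each round refits the GP to all results so far and calls `propose_next` to choose the next arm.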
Chris Vargo, University of Colorado, Boulder
Title: External Reliability: Lessons From Traditional Content Analysis Applied to Big Data Analyses
Abstract: Traditional content analysis techniques in the social sciences have taught researchers how to measure concepts in valid ways. Computer scientists often focus on predicting outcomes, and as a result accuracy and precision are thought of as more important metrics. As big data research moves from prediction to identification and suggestion, the concept of external validity becomes more important. This talk addresses when external validity should be assessed, and proper ways in which researchers can do so.
Bio: Dr. Chris J. Vargo is an assistant professor specializing in big data and analytics; he joined the CMCI faculty in August 2016. He uses computer science methods to investigate social data, drawing on theories from the communication, psychology, and political science disciplines. His methodological specialties include text mining, machine learning, computer-assisted content analysis, data forecasting, information retrieval, and network analysis.
He has published in the Journal of Communication, Journalism & Mass Communication Quarterly, Mass Communication & Society, and Social Science Computer Review. In the classroom, Chris has taught seven different courses ranging from visual design to persuasion, covering skill sets including social media management, social psychology, video editing, graphic design, advertising design, website design, responsive design, HTML, and CSS. Chris holds three degrees in Advertising & Public Relations: a PhD from The University of North Carolina at Chapel Hill, an MA from The University of Alabama, and a BA from The Pennsylvania State University.
His background includes real-world public relations and digital marketing experience at SonyBMG Music, Porter Novelli and Fox/DreamWorks. In addition, Chris worked in the IT field for six years.
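As background for the reliability-versus-validity distinction the talk draws, the traditional content-analysis step of quantifying intercoder agreement can be sketched with Cohen's kappa; the labels below are hypothetical toy codes, not data from the talk.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: share of units both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if the coders labeled independently with their own marginals.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for four content units ("1" = political, "0" = not).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```

Raw percent agreement (0.75 here) overstates reliability, because two coders who mostly assign the majority label agree often by chance; kappa corrects for that expected agreement.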
Lynn Vavreck, UCLA, contributor to The Upshot blog, New York Times
Title: Spotcheck: A real-time ad-rating project
Abstract: Do campaign ads move voters? The existing work suggests the effects are small and decay quickly. We know less about why that decay happens or what kinds of ads are most effective and memorable. Observational methods make this kind of work difficult, as candidates often target the same media markets and run the same number of ads. In 2016, I fielded a weekly, 1,000-person randomized advertising rating project: a focus group in the cloud. Every week, I exposed a randomly selected group of people to a set of ads and charted the effects in real time, using analytic tools that allowed people to register whether they liked or disliked what they were seeing or hearing as they watched the ads. I followed up with traditional survey questions. All the results were posted to the SpotCheck website in real time. Results reveal the challenges of doing real-time data analysis and suggest that it’s not just the ads that matter, but also the context in which they are being run.
Bio: Lynn Vavreck is a professor of political science and communication studies at UCLA and a contributing columnist to The Upshot at The New York Times. Her award-winning book, The Gamble, was described by Nate Silver as the “definitive account” of the 2012 election and political consultants on both sides of the aisle refer to her work on political messaging as “required reading.” In 2014, she hosted and interviewed Hillary Clinton at UCLA’s Luskin Lecture on Thought Leadership and in 2015 she was awarded an Andrew F. Carnegie Fellowship to investigate the influence of political advertising. Her research has been supported by the National Science Foundation and she has served on the advisory boards of both the British and American National Election Studies. At UCLA she teaches courses on campaigns, elections, and public opinion. Lynn Vavreck holds a Ph.D. in political science from the University of Rochester and held previous appointments at Princeton University, Dartmouth College, and The White House. A native of Cleveland, Ohio, she remains a loyal Browns fan and is a “known equestrian” – to draw on a phrase from the 2012 presidential campaign.

11:45 a.m. —  Refreshments, Poster Session

1:15 p.m. — Panel: Data Science in Transportation

This panel, moderated by Al Hero, will describe two transportation projects from U-M researchers supported by MIDAS Challenge Initiative funding, and will include a presentation from the Chinese ride-sharing service Didi Chuxing. Each panelist will speak for 25 minutes; a discussion will follow.

Carol Flannagan, U-M Transportation Research Institute
Building a Transportation Data Ecosystem for Data Science Research and Applications
Abstract: This talk will describe a 3-year effort to develop a transportation data ecosystem and analytical methods that support and advance the use of very large transportation datasets to understand human behavior. Our project focuses on three broad aims: 1) Build a transportation data ecosystem on a parallel, distributed computing platform that is optimized to support a variety of Big Data analytical methods and data integration across a variety of data sources. 2) Develop Big Data analytical methods to identify “events of interest” and to separate out driver, environment and situation effects on driving. 3) Develop and implement methods of information integration. Transportation datasets come from many sources and contain different levels of information. For example, driving can be described at the micro-level (how is the vehicle moving?), the tactical level (what level of braking and steering is the driver calling for?), or the macro-level (where is the driver trying to go? Will he/she stop before turning? Is he/she distracted?).
The applications and integrated ecosystem will be designed to support these and many more analyses and applications. We plan for this to serve as a resource for the UM data science community to help solve big problems in transportation in the 21st century.
Carol A. C. Flannagan is a research associate professor in UMTRI’s Biosciences Group, and director of CMISST. She joined UMTRI in 1991 after completing her Ph.D. in mathematical and experimental psychology at the University of Michigan (U-M). She also holds an M.A. in applied statistics from U-M and a B.A. in psychology from St. Lawrence University.
Dr. Flannagan has over 20 years of experience conducting data analysis and research on injury risk related to motor vehicle crashes and was responsible for the development of a model of injury outcome that allows side-by-side comparison of public health, vehicle, roadway and post-crash interventions. She has also applied statistical methods to understanding the potential benefits of crash-avoidance technologies, and works to develop novel applications of statistics to improve understanding of transportation safety. Dr. Flannagan’s current work with CMISST involves the fusion and analysis of large state-level crash databases, which are useful in analyzing the effect of a variety of countermeasures on crash involvement and injury risk. In addition, her group is working to make data available to researchers to expand the community of experts in transportation data analysis.
Pascal Van Hentenryck, U-M College of Engineering
Reinventing Public Urban Transportation and Mobility
Abstract: Ubiquitous connectivity, together with significant advances in autonomous vehicles, intelligent transportation, and asset management systems, has the potential to revolutionize public urban transportation and mobility in the coming decade. A new generation of public urban transportation systems can not only mitigate congestion, decrease environmental impacts, reduce costs, and improve service levels; it can also open new mobility markets for the automotive industry, bring a step change in mobility for the poor, the disabled, and the elderly, help rejuvenate inner cities and distressed neighborhoods, and bring health and social benefits that could not be envisioned until recently.
This project pursues this vision by designing novel data-driven urban transportation systems and building the descriptive, predictive, and prescriptive technologies to power them. The optimal design of these urban transportation systems will be informed by models for travel demand, accessibility, driver behavior, and transportation networks. These descriptive and predictive models will be derived by mining and fusing the rich and large data sets newly available, and calibrated through interventions and machine learning. The envisioned transportation systems will be operated using real-time optimization algorithms and innovative coordinated traffic assignments to mitigate congestion, maximize network capacity utilization, and improve safety. The project will not only optimize costs, greenhouse emissions, and convenience; it will also strive to boost mobility for entire population segments and to transform the planning and management of a transportation infrastructure currently optimized for 20th-century notions of human mobility. The project is supported by a truly multi-disciplinary team from four colleges, the University of Michigan Transportation Research Institute (UMTRI), and the CDC Injury Center. The team has a history of innovation and deployment in intelligent transportation and asset management systems. It brings significant expertise in data science, from descriptive analytics to predictive and prescriptive analytics and interventions, and in the underlying economic and social mechanisms that are critical to successful deployments.
Bio: Pascal Van Hentenryck is the Seth Bonder Collegiate Professor of Engineering at the University of Michigan. He is a professor of Industrial and Operations Engineering, a professor of Electrical Engineering and Computer Science, and a core faculty member in the Michigan Institute for Data Science. Van Hentenryck’s current research is at the intersection of optimization and data science, with applications in energy, transportation, and resilience. He is a fellow of INFORMS, a fellow of AAAI, and the recipient of two honorary degrees. He was awarded the 2002 INFORMS ICS Award for research excellence in operations research and computer science, the 2006 ACP Award for research excellence in constraint programming, the 2010-2011 Philip J. Bray Award for Teaching Excellence at Brown University, and a 2013 IFORS Distinguished Speaker award. He is the author of five MIT Press books and has developed several optimization systems that are widely used in academia and industry.
Jieping Ye, Didi Chuxing
Big Data at Didi Chuxing
Abstract: Didi Chuxing is the largest ride-sharing company providing transportation services for over 300 million users in China. Every day, Didi’s platform generates over 70 TB of data, processes more than 9 billion routing requests, and produces over 13 billion location points. In this talk, I will show how AI technologies, including machine learning and computer vision, have been applied to analyze such big transportation data to improve the travel experience for millions of people in China.
Bio: Dr. Ye is an associate director of the Didi Research Institute and an associate professor of DCMB & EECS at the University of Michigan. His research focuses on developing machine learning and data mining methods to analyze large-scale, high-dimensional, heterogeneous and complex data.

3:15 p.m. — Panel: Data Science in Learning Analytics

This panel, moderated by Henry Kelly, MIDAS industry liaison, will describe several data science projects going on at U-M in the field of learning analytics, including two projects receiving support through the MIDAS Challenge Initiatives. Each panelist will speak for 25 minutes, followed by a discussion.

Michelle Aebersold, U-M School of Nursing
Using Simulation and Debriefing to Improve Student Learning in Nursing
Abstract: Nurses are essential members of the healthcare team, and nursing students must learn to safely and competently care for patients in a healthcare setting that is increasingly complex and technologically sophisticated. At the School of Nursing we prepare our students to navigate this work environment using sophisticated simulation methods, including computerized mannequins, virtual environments, skills trainers, and team training. We use advanced learning methods such as theory-based debriefing, rapid cycle deliberate practice, and competency testing to ensure our students graduate with the critical thinking and clinical judgment essential to enter the practice environment and succeed in their first nursing position.
Dr. Aebersold’s professional and academic career centers on advancing the science of learning as applied in simulation, aligning clinician and student practice behaviors with research evidence to improve learner and health outcomes. Her scholarship spans both high-fidelity and virtual reality simulation, and she is a national leader and expert in the field. This work has culminated in the Simulation Model to Improve Learner and Health Outcomes (SMILHO).
Rada Mihalcea, U-M College of Engineering
Learning Analytics with a Personal Touch
Abstract: The reasons behind academic success are often of a personal nature: students succeed because perseverance is one of their psychological traits; positive mood and mental health are associated with higher productivity and performance; hobbies from early childhood can be indicators of future majors of choice. In a new MIDAS-funded project that spans seven different departments at the University of Michigan, we explore a new generation of data-driven tools for learning analytics that explicitly account for the personal attributes of our students: their values, beliefs, interests, behaviors, background, and emotional state. As we are about to embark on this project, I will describe recent research work undertaken in the Michigan Language and Information Technologies group under the broad umbrella of computational sociolinguistics, where natural language processing is used to gain new insights into people’s values, behaviors, interests, and emotions.
Bio: Rada Mihalcea is a Professor in the Computer Science and Engineering department at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009), and a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics (2015). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.
Stephanie Teasley, U-M School of Information
Title: Uncovering Connections between Student Behavior and Academic Success: The Holistic Modeling of Education (HoME) Project
Abstract: Through the use of learning analytics, the University of Michigan has become a laboratory for generating new knowledge about learning processes and pedagogical practices. Using data drawn from the university’s learning technologies and related information available in the student data warehouse, the HoME project is focused on improving teaching and learning in higher education by synergizing current research being conducted by U-M faculty in the School of Information, LSA, and Engineering. The three-year project will develop a holistic model of learners that focuses on three facets of teaching and learning: 1) Understanding the relationships between learner behaviors and academic outcomes; 2) Creating a semantic knowledge base from natural language text and structured data; and 3) Supporting new evidence-based representations of learning.
By gathering and analyzing a rich variety of data from thousands of UM student activities and experiences, this project focuses on uncovering the connections between student behavior and academic success to provide instruction tailored to the specific needs of all students. This multidisciplinary effort leverages data science to directly address the next generation of grand challenges in applied data science and education.
Bio: Stephanie Teasley is a research professor at the School of Information and has been a faculty member at UMSI since 2001. She received her PhD in psychology from the University of Pittsburgh and a BA from Kalamazoo College. She is the director of the USE Lab at the University Library, whose mission is to investigate how instructional technologies and digital media are used to innovate teaching, learning, and collaboration.
She directed the doctoral program at UMSI from 2006 to 2012. She is currently a member of the U-M Coursera Advisory Group and the Learning Analytics Task Force, and serves on the Executive Board of the Society for Learning Analytics Research (SoLAR).

5 p.m. — Closing