Susan Hautaniemi Leonard

I am faculty at ICPSR, the largest social science data archive in the world. I manage an education research pre-registration site that is focused on transparency and replicability. I also manage a site for sharing work around record linkage, including code. I am involved in the LIFE-M project, recently classifying the mortality data. That project uses cutting-edge techniques for machine-reading handwritten forms.

Mortality rates for selected causes in the total population per 1,000, 1850–1912, Holyoke and Northampton, Massachusetts

Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021


Michael Rubyan

My research focuses on the development and evaluation of novel interventions that leverage emerging technologies to train members of the healthcare workforce to adhere to guidelines. I study how to scale custom-designed teaching and learning platforms and evaluate their use to motivate effective communication and dissemination of evidence-based practice. Other emphases of my work include health policy literacy, translation and communication of health services research, and improving health system literacy in urban communities. I have developed and evaluated numerous web-based educational interventions that employ the “flipped classroom” design with an emphasis on understanding the data and analytics that guide successful implementation and promote high fidelity for members of the healthcare workforce. As an implementation scientist, I rely on the integration of data and analytics to understand what motivates successful program implementation.

In addition to the development of these platforms, I have extensive experience developing and evaluating online, hybrid residential, and residential courses, as well as MOOCs, related to healthcare management, non-profit management, healthcare finance, and health economics. These courses employ engaging lessons and modules, interactive graphics, and a blended learning format to help health professions students, and both undergraduate and graduate public health students, understand the healthcare system. My MOOC entitled “Understanding and Improving the U.S. Health Care System” has been taken by over 5,000 learners and is characterized by the use of “big data” to understand how future healthcare providers learn health policy.

Kevin Stange

Prof. Stange’s research uses population administrative education and labor market data to understand, evaluate, and improve education, employment, and economic policy. Much of the work involves analyzing millions of course-taking and transcript records for college students, whether they be at a single institution, a handful of institutions, or all institutions in several states. This data is used to richly characterize the experiences of college students and relate these experiences to outcomes such as educational attainment, employment, earnings, and career trajectories. Several projects also involve working with the text contained in the universe of all job ads posted online in the US for the past decade. This data is used to characterize the demand for different skills and education credentials in the US labor market. Classification is a task that arises frequently in this work: How to classify courses into groups based on their title and content? How to identify students with similar educational experiences based on their course-taking patterns? How to classify job ads as being more appropriate for one type of college major or another? This data science work is often paired with traditional causal inference tools of economics, including quasi-experimental methods.

Quan Nguyen

My research focuses on the application of data science in educational research, so-called learning analytics. I have experience analyzing educational data on a large scale to understand a) how course design influences students’ learning behavior and b) how students form peer networks. My work involves using multiple educational data sources such as log data from online learning environments, course information, students’ academic records, and location data gathered from campus WiFi networks. I am interested in network analysis, time-series analysis, and machine learning.

Akbar Waljee

I use machine-learning techniques to implement decision support systems and tools that facilitate more personalized care for disease management and healthcare utilization, with the ultimate aim of delivering efficient, effective, and equitable therapy for chronic diseases. To test and advance these general principles, I have built operational programs that are guiding, and improving, patient care in costly and low-resource settings, including emerging countries.

Andrew Krumm

My research examines the ways in which individuals and organizations use data to improve. Quality improvement and data-intensive research approaches are central to my work, along with forming equitable collaborations between researchers and frontline workers. Prior to joining the Department of Learning Health Sciences, I was the Director of Learning Analytics Research at Digital Promise and a Senior Education Researcher in the Center for Technology in Learning at SRI International. At both organizations, I developed data-intensive research-practice partnerships with educational organizations of all types. As a learning scientist working at the intersection of data-intensive research and quality improvement, I have developed, together with my colleagues, tools and strategies (e.g., cloud-based, open source tools for engaging in collaborative exploratory data analyses) that partnerships between researchers and practitioners can use to measure learning and improve learning environments.

This is an image that my colleagues and I, over multiple projects, developed to communicate the multiple steps involved in collaborative data-intensive improvement. The “organize” and “understand” phases are about asking the right questions before the work of data analysis begins; “co-develop” and “test” are about taking action following an analysis. Along with identifying common phases, we have also observed the importance of the following supporting conditions: a trusting partnership, the use of formal improvement methods, common data workflows, and intentional efforts to support the learning of everyone involved in the project.

Vitaliy Popov

My research focuses on understanding, designing, and evaluating learning technologies and environments that foster collaborative problem solving, spatial reasoning, engineering design thinking, and agency. I am particularly interested in applying multimodal learning analytics in the context of co-located and/or virtually distributed teams in clinical simulations. I strive to use evidence from education science, simulation-based training, and learning analytics to understand how people become expert health professionals, how they can work better in teams, and how we can support these processes to improve health care delivery and health outcomes.

Sol Bermann

I am interested in the intersection of big data, data science, privacy, security, public policy, and law. At U-M, this includes co-convening the Dissonance Event Series, a multi-disciplinary collaboration of faculty and graduate students who explore the confluence of technology, policy, privacy, security, and law. I frequently guest lecture on these subjects across campus, including at the School of Information, the Ford School of Public Policy, and the Law School.