Edgar Franco-Vivanco

Edgar Franco-Vivanco is an Assistant Professor of Political Science and a faculty associate at the Center for Political Studies. His research interests include Latin American politics, historical political economy, criminal violence, and indigenous politics.

Prof. Franco-Vivanco is interested in applying machine learning tools to improve the analysis of historical data, in particular handwritten documents. He is also working on the application of text analysis to the study of indigenous languages. In a parallel research agenda, he explores how marginalized communities interact with criminal organizations and abusive policing in Latin America. As part of this research, he is using NLP tools to identify different types of criminal behavior.

Examples of the digitization process of handwritten documents from colonial Mexico.

Yixin Wang

Yixin Wang works in the fields of Bayesian statistics, machine learning, and causal inference, with applications to recommender systems, text data, and genetics. She also works on algorithmic fairness and reinforcement learning, often via connections to causality. Her research centers on developing practical and trustworthy machine learning algorithms for large datasets that can enhance scientific understanding and inform daily decision-making. Her research interests lie at the intersection of theory and applications.

Matthew VanEseltine

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code grows in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021

Marie O’Neill

My research interests include health effects of air pollution, temperature extremes and climate change (mortality, asthma, hospital admissions, birth outcomes and cardiovascular endpoints); environmental exposure assessment; and socio-economic influences on health.
The data science tools and methods I use include geographic information systems and spatio-temporal analysis, epidemiologic study design, and data management.

Carina Gronlund

As an environmental epidemiologist and in collaboration with government and community partners, I study how social, economic, health, and built environment characteristics and/or air quality affect vulnerability to extreme heat and extreme precipitation. This research will help cities understand how to adapt to heat, heat waves, higher pollen levels, and heavy rainfall in a changing climate.

Daniel P. Keating

The primary tools currently in use are variations of linear models (regression, MLM, SEM, and so on) as we pursue the initial aims of the NICHD-funded work. We are expanding into new areas that require new tools. Our adolescent sample is diverse, selected through quota sampling of high schools close enough to UM to allow the use of neuroimaging tools, but it is not population representative. To address this, we have begun work to calibrate our sample against the nationally representative Monitoring the Future study, implementing pseudo-weighting and multilevel regression and post-stratification. To enable much more powerful analyses, we are working toward the harmonization of multiple high-quality longitudinal databases spanning adolescence through early adulthood. This would not only benefit traditional analyses by allowing high-powered cross-validation, but also provide opportunities for newer data science tools such as computational modeling and machine learning approaches.

Kevin Stange

Prof. Stange’s research uses population-level administrative education and labor market data to understand, evaluate, and improve education, employment, and economic policy. Much of the work involves analyzing millions of course-taking and transcript records for college students, whether at a single institution, a handful of institutions, or all institutions in several states. These data are used to richly characterize the experiences of college students and to relate those experiences to outcomes such as educational attainment, employment, earnings, and career trajectories. Several projects also involve working with the text of the universe of job ads posted online in the US over the past decade. These data are used to characterize the demand for different skills and educational credentials in the US labor market. Classification arises frequently in this work: How can courses be grouped based on their title and content? How can students with similar educational experiences be identified from their course-taking patterns? How can job ads be classified as more appropriate for one college major or another? This data science work is often paired with the traditional causal inference tools of economics, including quasi-experimental methods.

Joyce Penner

I am new to research on artificial intelligence applied to the atmospheric sciences. My previous experience is in comparing satellite data products with 3-D global simulations.

Nicholas Henderson

My research focuses on four main themes: 1) development of methods for risk prediction and for analyzing treatment effect heterogeneity, 2) Bayesian nonparametrics and Bayesian machine learning methods, with a particular emphasis on their use in survival analysis, 3) statistical methods for analyzing heterogeneity in risk-benefit profiles and for supporting individualized treatment decisions, and 4) development of empirical Bayes and shrinkage methods for high-dimensional statistical applications. I am also broadly interested in collaborative biomedical research, with a focus on the application of statistics in cancer research.