Briana Mezuk

By |

My research program uses epidemiologic methods to examine the interrelationships between mental and physical health over the lifespan. A core feature of my research is the integration of conceptual and analytical approaches, methods, and models from social science, including natural language processing, and clinical/health disciplines with the aim of arriving at a more nuanced and comprehensive understanding of the ways in which mental and physical health interrelate. The goal of this work is to inform interventions that reflect an integrative approach to health.

Wenbo Sun

By |

Uncertainty quantification and decision making are increasingly demanded with the development of future technology in engineering and transportation systems. Among the uncertainty quantification problems, Dr. Wenbo Sun is particularly interested in statistical modelling of engineering system responses with considering the high dimensionality and complicated correlation structure, as well as quantifying the uncertainty from a variety of sources simultaneously, such as the inexactness of large-scale computer experiments, process variations, and measurement noises. He is also interested in data-driven decision making that is robust to the uncertainty. Specifically, he delivers methodologies for anomaly detection and system design optimization, which can be applied to manufacturing process monitoring, distracted driving detection, out-of-distribution object identification, vehicle safety design optimization, etc.

J.J. Prescott

By |

Broadly, I study legal decision making, including decisions related to crime and employment. I typically use large social science data bases, but also collect my own data using technology or surveys.

Edgar Franco-Vivanco

By |

Edgar Franco-Vivanco is an Assistant Professor of Political Science and a faculty associate at the Center for Political Studies. His research interests include Latin American politics, historical political economy, criminal violence, and indigenous politics.

Prof. Franco-Vivanco is interested in implementing machine learning tools to improve the analysis of historical data, in particular handwritten documents. He is also working in the application of text analysis to study indigenous languages. In a parallel research agenda, he explores how marginalized communities interact with criminal organizations and abusive policing in Latin America. As part of this research, he is using NLP tools to identify different types of criminal behavior.

Examples of the digitization process of handwritten documents from colonial Mexico.

Matthew VanEseltine

By |

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Elle O’Brien

By |

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021


Lia Corrales

By |

My PhD research focused on identifying the size and mineralogical composition of interstellar dust through X-ray imaging of dust scattering halos to X-ray spectroscopy of bright objects to study absorption from intervening material. Over the course of my PhD I also developed an open source, object oriented approach to computing extinction properties of particles in Python that allows the user to change the scattering physics models and composition properties of dust grains very easily. In many cases, the signal I look for from interstellar dust requires evaluating the observational data on the 1-5% level. This has required me to develop a deep understanding of both the instrument and the counting statistics (because modern-day X-ray instruments are photon counting tools). My expertise led me to a postdoc at MIT, where I developed techniques to obtain high resolution X-ray spectra from low surface brightness (high background) sources imaged with the Chandra X-ray Observatory High Energy Transmission Grating Spectrometer. I pioneered these techniques in order to extract and analyze the high resolution spectrum of Sgr A*, our Galaxy’s central supermassive black hole (SMBH), producing a legacy dataset with a precision that will not be replaceable for decades. This dataset will be used to understand why Sgr A* is anomalously inactive, giving us clues to the connection between SMBH activity and galactic evolution. In order to publish the work, I developed an open source software package, pyXsis ( in order to model the low signal-to-noise spectrum of Sgr A* simultaneously with a non-physical parameteric model of the background spectrum (Corrales et al., 2020). As a result of my vocal advocacy for Python compatible software tools and a modular approach to X-ray data analysis, I became Chair for HEACIT (which stands for “High Energy Astrophysics Codes, Interfaces, and Tools”), a new self-appointed working group of X-ray software engineers and early career scientists interested in developing tools for future X-ray observatories. We are working to identify science cases that high energy astronomers find difficult to support with the current software libraries, provide a central and publicly available online forum for tutorials and discussion of current software libraries, and develop a set of best practices for X-ray data analysis. My research focus is now turning to exoplanet atmospheres, where I hope to measure absorption from molecules and aerosols in the UV. Utilizing UM access to the Neil Gehrels Swift Observatory, I work to observe the dip in a star’s brightness caused by occultation (transit) from a foreground planet. Transit depths are typically <1%, and telescopes like Swift were not originally designed with transit measurements (i.e., this level of precision) in mind. As a result, this research strongly depends on robust methods of scientific inference from noisy datasets.


As a graduate student, I attended some of the early “Python in Astronomy” workshops. While there, I wrote Jupyter Notebook tutorials that helped launch the Astropy Tutorials project (, which expanded to Learn Astropy (, for which I am a lead developer. Since then, I have also become a leader within the larger Astropy collaboration. I have helped develop the Astropy Project governance structure, hired maintainers, organized workshops, and maintained an AAS presence for the Astropy Project and NumFocus (the non-profit umbrella organization that works to sustain open source software communities in scientific computing) for the last several years. As a woman of color in a STEM field, I work to clear a path by teaching the skills I have learned along the way to other underrepresented groups in STEM. This year I piloted WoCCode (Women of Color Code), an online network and webinar series for women from minoritized backgrounds to share expertise and support each other in contributing to open source software communities.

Kevin Bakker

By |

Kevin’s research is focused on to identifying and interpreting the mechanisms responsible for the complex dynamics we observe in ecological and epidemiological systems using data science and modeling approaches. He is primarily interested in emerging and endemic pathogens, such as SARS-CoV-2, influenza, vampire bat rabies, and a host of childhood infectious diseases such as chickenpox. He uses statistical and mechanistic models to fit, forecast, and occasionally back-cast expected disease dynamics under a host of conditions, such as vaccination or other control mechanisms.

Andrew Brouwer

By |

Andrew uses mathematical and statistical modeling to address public health problems. As a mathematical epidemiologist, he works on a wide range of topics (mostly related to infectious diseases and cancer prevention and survival) using an array of computational and statistical tools, including mechanistic differential equations and multistate stochastic processes. Rigorous consideration of parameter identifiability, parameter estimation, and uncertainty quantification are underlying themes in Andrew’s work.