Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021


Lia Corrales

My PhD research focused on identifying the size and mineralogical composition of interstellar dust, using methods that range from X-ray imaging of dust scattering halos to X-ray spectroscopy of bright objects to study absorption from intervening material. Over the course of my PhD I also developed an open source, object-oriented approach to computing extinction properties of particles in Python that allows the user to change the scattering physics models and composition properties of dust grains very easily. In many cases, the signal I look for from interstellar dust requires evaluating the observational data at the 1-5% level. This has required me to develop a deep understanding of both the instrument and the counting statistics (because modern-day X-ray instruments are photon counting tools). My expertise led me to a postdoc at MIT, where I developed techniques to obtain high resolution X-ray spectra from low surface brightness (high background) sources imaged with the Chandra X-ray Observatory High Energy Transmission Grating Spectrometer. I pioneered these techniques in order to extract and analyze the high resolution spectrum of Sgr A*, our Galaxy’s central supermassive black hole (SMBH), producing a legacy dataset with a precision that will not be replaceable for decades. This dataset will be used to understand why Sgr A* is anomalously inactive, giving us clues to the connection between SMBH activity and galactic evolution. To publish the work, I developed an open source software package, pyXsis (github.com/eblur/pyxsis), to model the low signal-to-noise spectrum of Sgr A* simultaneously with a non-physical parametric model of the background spectrum (Corrales et al., 2020).
As a result of my vocal advocacy for Python-compatible software tools and a modular approach to X-ray data analysis, I became Chair for HEACIT (which stands for “High Energy Astrophysics Codes, Interfaces, and Tools”), a new self-appointed working group of X-ray software engineers and early career scientists interested in developing tools for future X-ray observatories. We are working to identify science cases that high energy astronomers find difficult to support with the current software libraries, provide a central and publicly available online forum for tutorials and discussion of current software libraries, and develop a set of best practices for X-ray data analysis. My research focus is now turning to exoplanet atmospheres, where I hope to measure absorption from molecules and aerosols in the UV. Using UM access to the Neil Gehrels Swift Observatory, I work to observe the dip in a star’s brightness caused by occultation (transit) from a foreground planet. Transit depths are typically <1%, and telescopes like Swift were not originally designed with transit measurements (i.e., this level of precision) in mind. As a result, this research strongly depends on robust methods of scientific inference from noisy datasets.


As a graduate student, I attended some of the early “Python in Astronomy” workshops. While there, I wrote Jupyter Notebook tutorials that helped launch the Astropy Tutorials project (github.com/astropy/astropy-tutorials), which expanded to Learn Astropy (learn.astropy.org), for which I am a lead developer. Since then, I have also become a leader within the larger Astropy collaboration. For the last several years, I have helped develop the Astropy Project governance structure, hired maintainers, organized workshops, and maintained an AAS presence for the Astropy Project and NumFOCUS (the non-profit umbrella organization that works to sustain open source software communities in scientific computing). As a woman of color in a STEM field, I work to clear a path by teaching the skills I have learned along the way to other underrepresented groups in STEM. This year I piloted WoCCode (Women of Color Code), an online network and webinar series for women from minoritized backgrounds to share expertise and support each other in contributing to open source software communities.

Kevin Bakker

Kevin’s research focuses on identifying and interpreting the mechanisms responsible for the complex dynamics we observe in ecological and epidemiological systems using data science and modeling approaches. He is primarily interested in emerging and endemic pathogens, such as SARS-CoV-2, influenza, vampire bat rabies, and a host of childhood infectious diseases such as chickenpox. He uses statistical and mechanistic models to fit, forecast, and occasionally back-cast expected disease dynamics under a range of conditions, such as vaccination or other control mechanisms.

Andrew Brouwer

Andrew uses mathematical and statistical modeling to address public health problems. As a mathematical epidemiologist, he works on a wide range of topics (mostly related to infectious diseases and cancer prevention and survival) using an array of computational and statistical tools, including mechanistic differential equations and multistate stochastic processes. Rigorous consideration of parameter identifiability, parameter estimation, and uncertainty quantification is an underlying theme in Andrew’s work.

Nicholas Henderson

My research focuses on four main themes: 1) development of methods for risk prediction and analyzing treatment effect heterogeneity, 2) Bayesian nonparametrics and Bayesian machine learning methods, with a particular emphasis on the use of these methods in survival analysis, 3) statistical methods for analyzing heterogeneity in risk-benefit profiles and for supporting individualized treatment decisions, and 4) development of empirical Bayes and shrinkage methods for high-dimensional statistical applications. I am also broadly interested in collaborative work in biomedical research, with a focus on the application of statistics in cancer research.

Mithun Chakraborty

My broad research interests are in multi-agent systems, computational economics and finance, and artificial intelligence. I apply techniques from algorithmic game theory, statistical machine learning, and decision theory, among others, to a variety of problems at the intersection of the computational and social sciences. A major focus of my research has been the design and analysis of market-making algorithms for financial markets and, in particular, prediction markets, which are incentive-based mechanisms for aggregating data in the form of private beliefs about uncertain events (e.g. the outcome of an election) distributed among strategic agents. I use both analytical and simulation-based methods to investigate the impact of factors such as wealth, risk attitude, and manipulative behavior on information aggregation in market ecosystems. Another line of work I am pursuing involves algorithms for allocating resources based on preference data collected from potential recipients, satisfying efficiency, fairness, and diversity criteria; my joint work on ethnicity quotas in Singapore public housing allocation deserves special mention in this vein. More recently, I have become involved in research on empirical game-theoretic analysis, a family of methods for building tractable models of complex, procedurally defined games from empirical/simulated payoff data and using them to reason about game outcomes.

Catherine Hausman

Catherine H. Hausman is an Associate Professor in the School of Public Policy and a Research Associate at the National Bureau of Economic Research. She uses causal inference, related statistical methods, and microeconomic modeling to answer questions at the intersection of energy markets, environmental quality, climate change, and public policy.

Recent projects have looked at inequality and environmental quality, the natural gas sector’s role in methane leaks, the impact of climate change on the electricity grid, and the effects of nuclear power plant closures. Her research has appeared in the American Economic Journal: Applied Economics, the American Economic Journal: Economic Policy, the Brookings Papers on Economic Activity, and the Proceedings of the National Academy of Sciences.

Ranjan Pal

Cyber-security is a complex and multi-dimensional research field. My research takes an interdisciplinary approach, rooted primarily in economics, econometrics, data science (AI/ML, Bayesian and frequentist statistics), game theory, and network science, to investigating major socially pressing issues that affect the quality of cyber-risk management in modern networked and distributed engineering systems, such as IoT-driven critical infrastructures, cloud-based service networks, and app-based systems (e.g., mobile commerce, smart homes). I take delight in proposing data-driven, rigorous, and interdisciplinary solutions both to existing fundamental challenges that pose a practical bottleneck to (cost-)effective cyber-risk management and to futuristic cyber-security and privacy issues that might plague modern (networked) engineering systems. I strive for originality, practical significance, and mathematical rigor in my solutions. One of my primary end goals is to get my arms around complex, multi-dimensional information security and privacy problems conceptually, in a way that helps, informs, and empowers practitioners and policy makers to take the right steps in making cyberspace more secure.

Maureen Sartor

My lab has two main areas of focus: molecular characteristics of head and neck cancer, and the intersection of regulatory genomics and pathway analysis. With head and neck cancer, we study tumor subtypes and biomarkers of prognosis, treatment response, and recurrence. We apply integrative omics analyses, dimension reduction methods, and prediction techniques, with the ultimate goal of identifying patient subsets who would benefit from either an additional targeted treatment or de-escalated treatment to increase quality of life. For regulatory genomics and pathway analysis, we develop statistical tests that take into account important covariates and other variables for weighting observations.