Edgar Franco-Vivanco

Edgar Franco-Vivanco is an Assistant Professor of Political Science and a faculty associate at the Center for Political Studies. His research interests include Latin American politics, historical political economy, criminal violence, and indigenous politics.

Prof. Franco-Vivanco is interested in implementing machine learning tools to improve the analysis of historical data, in particular handwritten documents. He is also working on the application of text analysis to the study of indigenous languages. In a parallel research agenda, he explores how marginalized communities interact with criminal organizations and abusive policing in Latin America. As part of this research, he is using NLP tools to identify different types of criminal behavior.

Examples of the digitization process of handwritten documents from colonial Mexico.

Benjamin Fish

My research tackles how human values can be incorporated into machine learning and other computational systems. This includes work on the translation process from human values to computational definitions and work on how to understand and encourage fairness while preventing discrimination in machine learning and data science. My research combines tools from the theory of machine learning with insights from economics, science and technology studies, and philosophy, among others, to improve our theories of the translation process and the algorithms we create. In settings like classification, social networks, and data markets, I explore the ways in which human values play a primary role in the quality of machine learning and data science.

The likelihood of receiving desirable information, such as public health information or job advertisements, depends both on your position in a social network and on who directly gets the information to start with (the seeds). This image shows how a new method for deciding who to select as the seeds, called maximin, outperforms the most popular approach in previous literature by decreasing the correlation between where you are in the social network and your likelihood of receiving the information. These figures are taken from Benjamin Fish, Ashkan Bashardoust, danah boyd, Sorelle A. Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian, “Gaps in information access in social networks,” in The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019, pages 480–490.

Yixin Wang

Yixin Wang works in the fields of Bayesian statistics, machine learning, and causal inference, with applications to recommender systems, text data, and genetics. She also works on algorithmic fairness and reinforcement learning, often via connections to causality. Her research centers on developing practical and trustworthy machine learning algorithms for large datasets that can enhance scientific understanding and inform daily decision-making. Her research interests lie at the intersection of theory and applications.

Matthew VanEseltine

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021


Lia Corrales

My PhD research focused on identifying the size and mineralogical composition of interstellar dust, using techniques ranging from X-ray imaging of dust scattering halos to X-ray spectroscopy of bright objects to study absorption from intervening material. Over the course of my PhD I also developed an open source, object-oriented approach to computing extinction properties of particles in Python that allows the user to change the scattering physics models and composition properties of dust grains very easily. In many cases, the signal I look for from interstellar dust requires evaluating the observational data at the 1-5% level. This has required me to develop a deep understanding of both the instrument and the counting statistics (because modern-day X-ray instruments are photon counting tools). My expertise led me to a postdoc at MIT, where I developed techniques to obtain high resolution X-ray spectra from low surface brightness (high background) sources imaged with the Chandra X-ray Observatory High Energy Transmission Grating Spectrometer. I pioneered these techniques in order to extract and analyze the high resolution spectrum of Sgr A*, our Galaxy’s central supermassive black hole (SMBH), producing a legacy dataset with a precision that will not be replaceable for decades. This dataset will be used to understand why Sgr A* is anomalously inactive, giving us clues to the connection between SMBH activity and galactic evolution. To publish the work, I developed an open source software package, pyXsis (github.com/eblur/pyxsis), to model the low signal-to-noise spectrum of Sgr A* simultaneously with a non-physical parametric model of the background spectrum (Corrales et al., 2020).
As a result of my vocal advocacy for Python-compatible software tools and a modular approach to X-ray data analysis, I became Chair for HEACIT (which stands for “High Energy Astrophysics Codes, Interfaces, and Tools”), a new self-appointed working group of X-ray software engineers and early career scientists interested in developing tools for future X-ray observatories. We are working to identify science cases that high energy astronomers find difficult to support with the current software libraries, provide a central and publicly available online forum for tutorials and discussion of current software libraries, and develop a set of best practices for X-ray data analysis. My research focus is now turning to exoplanet atmospheres, where I hope to measure absorption from molecules and aerosols in the UV. Utilizing UM access to the Neil Gehrels Swift Observatory, I work to observe the dip in a star’s brightness caused by occultation (transit) by a foreground planet. Transit depths are typically <1%, and telescopes like Swift were not originally designed with transit measurements (i.e., this level of precision) in mind. As a result, this research strongly depends on robust methods of scientific inference from noisy datasets.


As a graduate student, I attended some of the early “Python in Astronomy” workshops. While there, I wrote Jupyter Notebook tutorials that helped launch the Astropy Tutorials project (github.com/astropy/astropy-tutorials), which expanded to Learn Astropy (learn.astropy.org), for which I am a lead developer. Since then, I have also become a leader within the larger Astropy collaboration. I have helped develop the Astropy Project governance structure, hired maintainers, organized workshops, and maintained an AAS presence for the Astropy Project and NumFOCUS (the non-profit umbrella organization that works to sustain open source software communities in scientific computing) for the last several years. As a woman of color in a STEM field, I work to clear a path by teaching the skills I have learned along the way to other underrepresented groups in STEM. This year I piloted WoCCode (Women of Color Code), an online network and webinar series for women from minoritized backgrounds to share expertise and support each other in contributing to open source software communities.

Jodyn Platt

Our team leads research on the Ethical, Legal, and Social Implications (ELSI) of learning health systems and related enterprises. Our research uses mixed methods to understand policies and practices that make data science methods (data collection and curation, AI, computable algorithms) trustworthy for patients, providers, and the public. Our work engages multiple stakeholders including providers and health systems, as well as the general public and minoritized communities on issues such as AI-enabled clinical decision support, data sharing and privacy, and consent for data use in precision oncology.

Ben Green

Ben studies the social and political impacts of government algorithms. This work falls into three categories. The first is evaluating how people make decisions in collaboration with algorithms, which involves developing machine learning algorithms and studying how people use them in public sector prediction and decision settings. The second is studying the ethical and political implications of government algorithms; much of this work draws on STS and legal theory to interrogate topics such as algorithmic fairness, smart cities, and criminal justice risk assessments. The third is developing algorithms for public sector applications. In addition to academic research, Ben spent a year developing data analytics tools as a data scientist for the City of Boston.

Ayumi Fujisaki-Manome

Fujisaki-Manome’s research program aims to improve the predictability of hazardous weather, ice, and lake/ocean events in cold regions in order to support preparedness and resilience in coastal communities, and to improve the usability of forecast products by working with stakeholders. The main question Fujisaki-Manome’s research aims to address is: what are the impacts of interactions between ice and oceans / ice and lakes on larger scale phenomena, such as climate, weather, storm surges, and sea/lake ice melting? Fujisaki-Manome primarily uses numerical geophysical modeling and machine learning to address this question, and scientific findings from the research feed back into the models and improve their predictability. Her work has focused on applications to the Great Lakes, Alaska’s coasts, the Arctic Ocean, and the Sea of Okhotsk.

View MIDAS Faculty Research Pitch, Fall 2021

Areal fraction of ice cover in the Great Lakes in January 2018 modeled by the unstructured grid ice-hydrodynamic numerical model.

J. Alex Halderman

My research focuses on computer security and privacy, with an emphasis on problems that broadly impact society and public policy. Topics that interest me include software security, network security, data privacy, anonymity, election cybersecurity, censorship resistance, computer forensics, ethics, and cybercrime. I’m also interested in the interaction of technology with politics and international affairs.