Susan Hautaniemi Leonard

I am faculty at ICPSR, the largest social science data archive in the world. I manage an education research pre-registration site that is focused on transparency and replicability. I also manage a site for sharing work around record linkage, including code. I am involved in the LIFE-M project, most recently classifying the mortality data. That project uses cutting-edge techniques for machine-reading handwritten forms.

Mortality rates for selected causes in the total population per 1,000, 1850–1912, Holyoke and Northampton, Massachusetts

Matthew VanEseltine

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Jodyn Platt

Our team leads research on the Ethical, Legal, and Social Implications (ELSI) of learning health systems and related enterprises. Our research uses mixed methods to understand policies and practices that make data science methods (data collection and curation, AI, computable algorithms) trustworthy for patients, providers, and the public. Our work engages multiple stakeholders including providers and health systems, as well as the general public and minoritized communities on issues such as AI-enabled clinical decision support, data sharing and privacy, and consent for data use in precision oncology.

J. Alex Halderman

My research focuses on computer security and privacy, with an emphasis on problems that broadly impact society and public policy. Topics that interest me include software security, network security, data privacy, anonymity, election cybersecurity, censorship resistance, computer forensics, ethics, and cybercrime. I’m also interested in the interaction of technology with politics and international affairs.

Ranjan Pal

Cyber-security is a complex and multi-dimensional research field. My research takes an interdisciplinary approach, rooted primarily in economics, econometrics, data science (AI/ML, Bayesian and frequentist statistics), game theory, and network science, to investigate major socially pressing issues that affect the quality of cyber-risk management in modern networked and distributed engineering systems, such as IoT-driven critical infrastructures, cloud-based service networks, and app-based systems (e.g., mobile commerce, smart homes). I take delight in proposing data-driven, rigorous, and interdisciplinary solutions both to existing fundamental challenges that pose a practical bottleneck to (cost-)effective cyber-risk management and to future cyber-security and privacy issues that might plague modern networked engineering systems. I strive for originality, practical significance, and mathematical rigor in my solutions. One of my primary end goals is to get my arms around complex, multi-dimensional information security and privacy problems in a way that helps, informs, and empowers practitioners and policy makers to take the right steps toward making cyberspace more secure.

Xu Shi

My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable and automated pipelines for curation and harmonization of EHR data across healthcare systems and coding systems.

Lucia Cevidanes

We have developed and tested machine learning approaches to integrate quantitative markers for diagnosis and assessment of progression of temporomandibular joint osteoarthritis (TMJ OA), as well as extended the capabilities of 3D Slicer4 into web-based tools and disseminated open source image analysis tools. Our aims use data processing and in-depth analytics combined with learning using privileged information, integrated feature selection, and testing the performance of longitudinal risk predictors. Our long-term goals are to improve diagnosis and risk prediction of TMJ OA in future multicenter studies.

The Spectrum of Data Science for Diagnosis of Osteoarthritis of the Temporomandibular Joint

Amy Pienta

My research at ICPSR develops ingest and curation workflows for new data types (including EEG) to ensure these data are Findable, Accessible, Interoperable, and Reusable (FAIR) within data repositories.

My funded projects and programs:
National Addiction and HIV Data Archive Program (NAHDAP) funded by the National Institute on Drug Abuse (NIDA)
Health and Medical Care Archive funded by the Robert Wood Johnson Foundation (RWJF)
Archive of Data on Disability to Enhance Policy and research (ADDEP) funded by NIH