Susan Hautaniemi Leonard

I am a faculty member at ICPSR, the largest social science data archive in the world. I manage an education research pre-registration site that is focused on transparency and replicability. I also manage a site for sharing work around record linkage, including code. I am involved in the LIFE-M project, where I recently classified the mortality data. That project uses cutting-edge techniques for machine-reading handwritten forms.

Mortality rates per 1,000 total population for selected causes, 1850–1912, Holyoke and Northampton, Massachusetts

Thomas L. Chenevert

Multi-center clinical trials increasingly use quantitative diffusion-weighted imaging (DWI) to aid in patient management and treatment response assessment for translational oncology applications. A major source of systematic bias in diffusion measurements has been traced to platform-dependent gradient hardware. Left uncorrected, these biases confound the quantitative diffusion metrics used to characterize tissue pathology and treatment response, leading to inconclusive findings and increasing the required number of subjects and the cost of a trial. We have developed technology to mitigate the systematic diffusion-mapping bias that exists on MRI scanners and are in the process of deploying this technology for multi-center clinical trials. Another major source of variance, and a bottleneck in high-throughput analysis of quantitative diffusion maps, is segmentation of the tumor/tissue volume of interest (VOI) based on intensities and patterns in multi-contrast MR image datasets, along with reliable assessment of longitudinal change with disease progression or response to treatment. Our goal is to develop, trial, and apply AI algorithms for robust (semi-)automated VOI definition in the analysis of multi-dimensional MR datasets for oncology trials.

Representative apparent diffusion coefficient (ADC) histograms and map overlays are shown for oncology trials to be supported by this Academic Industrial Partnership (AIP). ADC is used to characterize tumor malignancy in breast cancer, therapeutic effect in head and neck (H&N) cancer, and cellular proliferation in the bone marrow of myelofibrosis (MF) patients. Relevant clinical outcome metrics are illustrated under the histograms: the detection sensitivity threshold (to reduce unnecessary breast biopsies (13)), Kaplan-Meier analysis of therapy response (stratified by the median SD of the H&N metastatic node (23)), and histopathologic proliferation stage (MF cellular infiltration classification).

Matthew VanEseltine

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021


Sara Lafia

I am a Research Fellow in the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. My research is currently supported by an NSF project, Developing Evidence-based Data Sharing and Archiving Policies, where I am analyzing curation activities, automatically detecting data citations, and contributing to metrics for tracking the impact of data reuse. I hold a Ph.D. in Geography from UC Santa Barbara, and I have expertise in GIScience, spatial information science, and urban planning. My interests also include the Semantic Web, innovative GIS education, and the science of science. I have experience deploying geospatial applications, designing linked data models, and developing visualizations to support data discovery.

Lana Garmire

My research interest lies in applying data science for actionable transformation of human health from bench to bedside. Current research focus areas include cutting-edge single-cell sequencing informatics and genomics; precision medicine through integration of multi-omics data types; novel modeling and computational methods for biomarker research; and public health genomics. I apply my biomedical informatics and analytical expertise to study diseases such as cancers, as well as the impact of pregnancy and early-life complications on later-life diseases.

Xu Shi

My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable, automated pipelines for curating and harmonizing EHR data across healthcare systems and coding systems.

Robert Ploutz-Snyder

My work falls into three general application areas. I am an applied (accredited) biostatistician with a strong team-science motivation, and I collaborate primarily with scientists in the biomedical sciences, contributing expertise in experimental design, statistical analysis/modeling, and data visualization. I have held faculty appointments in Schools of Medicine and Nursing, and I also worked as a senior scientist in the Human Research Program at the NASA Johnson Space Center. I currently direct an Applied Biostatistics Laboratory and Data Management Core within the UM School of Nursing, and I maintain several collaborative research programs within the School, at NASA, and with collaborators elsewhere.

Sunghee Lee

My research focuses on issues in data collection with hard-to-reach populations. In particular, I examine 1) nontraditional sampling approaches for minority or stigmatized populations and their statistical properties, and 2) measurement error and comparability issues for racial, ethnic, and linguistic minorities, which also have implications for cross-cultural research and survey methodology. Most recently, my research has been dedicated to respondent-driven sampling, which uses existing social networks to recruit participants in both face-to-face and Web data collection settings. I plan to expand my research to examine representation issues for racial/ethnic minority groups in the U.S. in the era of big data.