Matthew VanEseltine

Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.

Elle O’Brien

My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.

Research topics include:
1. Open-source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.

The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.

View MIDAS Faculty Research Pitch, Fall 2021

Daniel P. Keating

The primary tools currently in use are variations of linear models (regression, MLM, SEM, and so on) as we pursue the initial aims of the NICHD-funded work. We are expanding into new areas that require new tools. Our adolescent sample is diverse, selected through quota sampling of high schools close enough to UM to afford the use of neuroimaging tools, but it is not population representative. To overcome this, we have begun work to calibrate our sample with the nationally representative Monitoring the Future study, implementing pseudo-weighting and multilevel regression and post-stratification. To enable much more powerful analyses, we are aiming toward the harmonization of multiple high-quality longitudinal databases spanning adolescence through early adulthood. This would benefit traditional analyses by allowing high-powered cross-validation, and it would also open opportunities for newer data science tools such as computational modeling and machine learning approaches.

Ken Kollman

I have been involved in the building of data infrastructure in the study of elections, political systems, violence, geospatial units, demographics, and topography. This infrastructure will eventually lead to the integration of data across many domains in the social, health, population, and behavioral sciences. My core research interests are in elections and political organizations.

Sara Lafia

I am a Research Fellow in the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. My research is currently supported by an NSF project, Developing Evidence-based Data Sharing and Archiving Policies, where I am analyzing curation activities, automatically detecting data citations, and contributing to metrics for tracking the impact of data reuse. I hold a Ph.D. in Geography from UC Santa Barbara and I have expertise in GIScience, spatial information science, and urban planning. My interests also include the Semantic Web, innovative GIS education, and the science of science. I have experience deploying geospatial applications, designing linked data models, and developing visualizations to support data discovery.

J. Trent Alexander

J. Trent Alexander is the Associate Director and a Research Professor at ICPSR in the Institute for Social Research at the University of Michigan. Alexander is a historical demographer and builds social science data infrastructure. He is currently leading the Decennial Census Digitization and Linkage Project (joint with Raj Chetty and Katie Genadek) and ResearchDataGov (joint with Lynette Hoelter). Prior to coming to ICPSR in 2017, Alexander initiated the Census Longitudinal Infrastructure Project at the Census Bureau and managed the Integrated Public Use Microdata Series (IPUMS) at the University of Minnesota.

Ranjan Pal

Cyber-security is a complex and multi-dimensional research field. My research takes an interdisciplinary approach, rooted primarily in economics, econometrics, data science (AI/ML and Bayesian and frequentist statistics), game theory, and network science, to investigate major, socially pressing issues that affect the quality of cyber-risk management in modern networked and distributed engineering systems. Examples include IoT-driven critical infrastructures, cloud-based service networks, and app-based systems (e.g., mobile commerce, smart homes). I take delight in proposing data-driven, rigorous, and interdisciplinary solutions both to existing fundamental challenges that pose a practical bottleneck to (cost-)effective cyber-risk management and to futuristic cyber-security and privacy issues that might plague modern (networked) engineering systems. I strongly strive for originality, practical significance, and mathematical rigor in my solutions. One of my primary end goals is to conceptually get my arms around complex, multi-dimensional information security and privacy problems in a way that helps, informs, and empowers practitioners and policymakers to take the right steps in making cyberspace more secure.

Albert Shih

My research focuses on using human biometric data (such as motion) to guide the design and manufacturing of assistive and proactive devices. Embedded and external sensors generate ample data, which require scientific approaches to analyze and turn into knowledge. I have worked closely with the University of Michigan Orthotics and Prosthetics Center in the design and manufacturing of custom assistive devices using 3D printing and cyber-based design. The goal is to create a cyber-physical system that can acquire data from scanning, sensors, human motion, user feedback, and clinician diagnosis and translate it into quantitative health metrics and guidelines that improve the quality of care for people with needs.

Maureen Sartor

My lab has two main areas of focus: molecular characteristics of head and neck cancer, and the intersection of regulatory genomics and pathway analysis. With head and neck cancer, we study tumor subtypes and biomarkers of prognosis, treatment response, and recurrence. We use integrative omics analyses, dimension reduction methods, and prediction techniques, with the ultimate goal of identifying patient subsets who would benefit from either an additional targeted treatment or de-escalated treatment to improve quality of life. For regulatory genomics and pathway analysis, we develop statistical tests that take into account important covariates and use other variables to weight observations.

Xu Shi

My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable and automated pipelines for curating and harmonizing EHR data across healthcare systems and coding systems.