Dr. VanEseltine is a sociologist and data scientist working with large-scale administrative data for causal and policy analysis. His interests include studying the effects of scientific infrastructure, training, and initiatives, as well as the development of open, sustainable, and replicable systems for data construction, curation, and dissemination. As part of the Institute for Research on Innovation and Science (IRIS), he contributes to record linkage and data improvements in the research community releases of UMETRICS, a data system built from integrated records on federal award funding and spending from dozens of American universities. Dr. VanEseltine’s recent work includes studying the impacts of COVID-19 on academic research activity.
My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.
Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.
The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity, and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Both technical and cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.
I am a Research Fellow in the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. My research is currently supported by an NSF project, Developing Evidence-based Data Sharing and Archiving Policies, where I am analyzing curation activities, automatically detecting data citations, and contributing to metrics for tracking the impact of data reuse. I hold a Ph.D. in Geography from UC Santa Barbara, and I have expertise in GIScience, spatial information science, and urban planning. My interests also include the Semantic Web, innovative GIS education, and the science of science. I have experience deploying geospatial applications, designing linked data models, and developing visualizations to support data discovery.
My research interest lies in applying data science for actionable transformation of human health from the bench to the bedside. Current research focus areas include cutting-edge single-cell sequencing informatics and genomics; precision medicine through integration of multi-omics data types; novel modeling and computational methods for biomarker research; and public health genomics. I apply my biomedical informatics and analytical expertise to study diseases such as cancers, as well as the impact of pregnancy and early-life complications on later-life diseases.
My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable and automated pipelines for curation and harmonization of EHR data across healthcare systems and coding systems.
My work falls into three general application areas. I am an applied (accredited) biostatistician with a strong team science motivation, and I collaborate primarily with scientists in the biomedical sciences, contributing expertise in experimental design, statistical analysis and modeling, and data visualization. I have held faculty appointments in Schools of Medicine and Nursing, and have also worked as a senior scientist in the Human Research Program at the NASA Johnson Space Center. I currently direct an Applied Biostatistics Laboratory and Data Management Core within the UM School of Nursing, and maintain several collaborative research programs within the School, at NASA, and with collaborators elsewhere.
My research focuses on issues in data collection with hard-to-reach populations. In particular, I examine 1) nontraditional sampling approaches for minority or stigmatized populations and their statistical properties and 2) measurement error and comparability issues for racial, ethnic, and linguistic minorities, which also have implications for cross-cultural research and survey methodology. Most recently, my research has been dedicated to respondent-driven sampling, which uses existing social networks to recruit participants in both face-to-face and Web data collection settings. I plan to expand my research scope to examine representation issues for racial/ethnic minority groups in the U.S. in the era of big data.
My research involves developing novel data collection strategies and image reconstruction techniques for Magnetic Resonance Imaging. In order to accelerate data collection, we take advantage of features of MRI data, including sparsity, spatiotemporal correlations, and adherence to underlying physics; each of these properties can be leveraged to reduce the amount of data required to generate an image and thus speed up imaging time. We also seek to understand what image information is essential for radiologists in order to optimize MRI data collection and personalize the imaging protocol for each patient. We deploy machine learning algorithms and optimization techniques in each of these projects. In some of our work, we can generate the data that we need to train and test our algorithms using numerical simulations. In other portions, we seek to utilize clinical images, prospectively collected MRI data, or MRI protocol information in order to refine our techniques.
We seek to develop technologies like cardiac Magnetic Resonance Fingerprinting (cMRF), which can be used to efficiently collect multiple forms of information to distinguish healthy and diseased tissue using MRI. By using rapid methods like cMRF, quantitative data describing disease processes can be gathered quickly, enabling more and sicker patients to be assessed via MRI. These data, collected from many patients over time, can also be used to further refine MRI technologies for the assessment of specific diseases in a tailored, patient-specific manner.
The long temporal and large spatial scales of ecological systems make controlled experimentation difficult and the amassing of informative data challenging and expensive. The resulting sparsity and noise are major impediments to scientific progress in ecology, which therefore depends on efficient use of data. In this context, it has in recent years been recognized that the onetime playthings of theoretical ecologists, mathematical models of ecological processes, are no longer exclusively the stuff of thought experiments, but have great utility in the context of causal inference. Specifically, because they embody scientific questions about ecological processes in sharpest form—making precise, quantitative, testable predictions—the rigorous confrontation of process-based models with data accelerates the development of ecological understanding. This is the central premise of my research program and the common thread of the work that goes on in my laboratory.
The Schloss lab is broadly interested in beneficial and pathogenic host-microbiome interactions with the goal of improving our understanding of how the microbiome can be used to reach translational outcomes in the prevention, detection, and treatment of colorectal cancer, Crohn’s disease, and Clostridium difficile infection. To address these questions, the lab tests traditional ecological theory in the microbial context using a systems biology approach. Specifically, the laboratory specializes in studies involving human subjects and animal models to understand how biological diversity affects community function, using a variety of culture-independent genomics techniques including sequencing 16S rRNA gene fragments, metagenomics, and metatranscriptomics. In addition, they use metabolomics to understand the functional role of the gut microbiota in states of health and disease. To support these efforts, they develop and apply bioinformatic tools to facilitate their analysis. Most notable is the development of the mothur software package (https://www.mothur.org), which is one of the most widely used tools for analyzing microbiome data and has been cited more than 7,300 times since it was initially published in 2009. The Schloss lab combines cutting-edge wet-lab techniques for collecting data with computational tools for synthesizing those data to answer important biological questions.
Given the explosion in microbiome research over the past 15 years, the Schloss lab has also stood at the center of a major effort to train interdisciplinary scientists in applying computational tools to study complex biological systems. These efforts have centered on developing reproducible research skills and applying modern data visualization techniques. An outgrowth of these efforts at the University of Michigan has been the institutionalization on campus of The Carpentries organization (https://carpentries.org), which specializes in peer-to-peer instruction of programming tools and techniques to foster better reproducibility and build a community of practitioners.