I am a Research Fellow in the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. My research is currently supported by an NSF project, Developing Evidence-based Data Sharing and Archiving Policies, where I am analyzing curation activities, automatically detecting data citations, and contributing to metrics for tracking the impact of data reuse. I hold a Ph.D. in Geography from UC Santa Barbara and I have expertise in GIScience, spatial information science, and urban planning. My interests also include the Semantic Web, innovative GIS education, and the science of science. I have experience deploying geospatial applications, designing linked data models, and developing visualizations to support data discovery.
My research interest lies in applying data science for actionable transformation of human health from bench to bedside. Current research focus areas include cutting-edge single-cell sequencing informatics and genomics; precision medicine through integration of multi-omics data types; novel modeling and computational methods for biomarker research; and public health genomics. I apply my biomedical informatics and analytical expertise to study diseases such as cancers, as well as the impact of pregnancy/early life complications on later life diseases.
My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable and automated pipelines for curation and harmonization of EHR data across healthcare systems and coding systems.
My work falls into three general application areas. I am an applied (accredited) biostatistician with a strong team science motivation, and I collaborate with scientists primarily in the biomedical sciences, contributing expertise in experimental design, statistical analysis/modeling, and data visualization. I have held faculty appointments in Schools of Medicine and Nursing, and also worked as a senior scientist in the Human Research Program at the NASA Johnson Space Center. I currently direct an Applied Biostatistics Laboratory and Data Management Core within the UM School of Nursing, and maintain several collaborative research programs within the School, at NASA, and with collaborators elsewhere.
My research focuses on issues in data collection with hard-to-reach populations. In particular, I examine 1) nontraditional sampling approaches for minority or stigmatized populations and their statistical properties and 2) measurement error and comparability issues for racial, ethnic and linguistic minorities, which also have implications for cross-cultural research/survey methodology. Most recently, my research has been dedicated to respondent-driven sampling, which uses existing social networks to recruit participants in both face-to-face and Web data collection settings. I plan to expand my research scope to examine representation issues for racial/ethnic minority groups in the U.S. in the era of big data.
My research involves developing novel data collection strategies and image reconstruction techniques for Magnetic Resonance Imaging. In order to accelerate data collection, we take advantage of features of MRI data, including sparsity, spatiotemporal correlations, and adherence to underlying physics; each of these properties can be leveraged to reduce the amount of data required to generate an image and thus speed up imaging time. We also seek to understand what image information is essential for radiologists in order to optimize MRI data collection and personalize the imaging protocol for each patient. We deploy machine learning algorithms and optimization techniques in each of these projects. In some of our work, we can generate the data that we need to train and test our algorithms using numerical simulations. In other portions, we seek to utilize clinical images, prospectively collected MRI data, or MRI protocol information in order to refine our techniques.
We seek to develop technologies like cardiac Magnetic Resonance Fingerprinting (cMRF), which can be used to efficiently collect multiple forms of information to distinguish healthy and diseased tissue using MRI. By using rapid methods like cMRF, quantitative data describing disease processes can be gathered quickly, enabling more and sicker patients to be assessed via MRI. These data, collected from many patients over time, can also be used to further refine MRI technologies for the assessment of specific diseases in a tailored, patient-specific manner.
The long temporal and large spatial scales of ecological systems make controlled experimentation difficult and the amassing of informative data challenging and expensive. The resulting sparsity and noise are major impediments to scientific progress in ecology, which therefore depends on efficient use of data. In this context, it has in recent years been recognized that the onetime playthings of theoretical ecologists, mathematical models of ecological processes, are no longer exclusively the stuff of thought experiments, but have great utility in the context of causal inference. Specifically, because they embody scientific questions about ecological processes in sharpest form—making precise, quantitative, testable predictions—the rigorous confrontation of process-based models with data accelerates the development of ecological understanding. This is the central premise of my research program and the common thread of the work that goes on in my laboratory.
The Schloss lab is broadly interested in beneficial and pathogenic host-microbiome interactions with the goal of improving our understanding of how the microbiome can be used to reach translational outcomes in the prevention, detection, and treatment of colorectal cancer, Crohn’s disease, and Clostridium difficile infection. To address these questions, the lab tests traditional ecological theory in the microbial context using a systems biology approach. Specifically, the laboratory specializes in using studies involving human subjects and animal models to understand how biological diversity affects community function using a variety of culture-independent genomics techniques including sequencing 16S rRNA gene fragments, metagenomics, and metatranscriptomics. In addition, they use metabolomics to understand the functional role of the gut microbiota in states of health and disease. To support these efforts, they develop and apply bioinformatic tools to facilitate their analysis. Most notable is the development of the mothur software package (https://www.mothur.org), which is one of the most widely used tools for analyzing microbiome data and has been cited more than 7,300 times since it was initially published in 2009. The Schloss lab deftly merges cutting-edge wet-lab techniques for collecting data with computational tools for synthesizing those data to answer important biological questions.
Given the explosion in microbiome research over the past 15 years, the Schloss lab has also stood at the center of a major effort to train interdisciplinary scientists in applying computational tools to study complex biological systems. These efforts have centered around developing reproducible research skills and applying modern data visualization techniques. An outgrowth of these efforts at the University of Michigan has been the institutionalization of The Carpentries organization on campus (https://carpentries.org), which specializes in peer-to-peer instruction of programming tools and techniques to foster better reproducibility and build a community of practitioners.
Dr. Lee’s research in data science concerns biological questions in systems biology and network medicine by developing algorithms and models through a combination of statistical/machine learning, information theory, and network theory applied to multi-dimensional large-scale data. His projects have covered genomics, transcriptomics, proteomics, and metabolomics from yeast to mouse to human for integrative analysis of regulatory networks on multiple molecular levels, which also incorporates large-scale public databases such as GO for functional annotation, PDB for molecular structures, and PubChem and LINCS for drugs or small compounds. He previously carried out proteomics and metabolomics along with a computational derivation of dynamic protein complexes for IL-3 activation and cell cycle in murine pro-B cells (Lee et al., Cell Reports 2017), for which he developed integrative analytical tools using diverse approaches from machine learning and network theory. His ongoing interests in methodology include machine/deep learning and topological Kolmogorov-Sinai entropy-based network theory, which are applied to (1) multi-level dynamic regulatory networks in immune response, cell cycle, and cancer metabolism and (2) mass spectrometry-based omics data analysis.
Matthew Kay, PhD, is Assistant Professor of Information, School of Information and Assistant Professor of Electrical Engineering and Computer Science, College of Engineering, at the University of Michigan, Ann Arbor.
Prof. Kay’s research includes work on communicating uncertainty, usable statistics, and personal informatics. People are increasingly exposed to sensing and prediction in their daily lives (“how many steps did I take today?”, “how long until my bus shows up?”, “how much do I weigh?”). Uncertainty is both inherent to these systems and usually poorly communicated. To build understandable data presentations, we must study how people interpret their data and what goals they have for it, which informs the way that we should communicate results from our models, which in turn determines what models we must use in the first place. Prof. Kay tackles these problems using a multi-faceted approach, including qualitative and quantitative analysis of behavior, building and evaluating interactive systems, and designing and testing visualization techniques. His work draws on approaches from human-computer interaction, information visualization, and statistics to build information visualizations that people can more easily understand along with the models to back those visualizations.