Yixin Wang works in the fields of Bayesian statistics, machine learning, and causal inference, with applications to recommender systems, text data, and genetics. She also works on algorithmic fairness and reinforcement learning, often via connections to causality. Her research centers on developing practical and trustworthy machine learning algorithms for large datasets that can enhance scientific understanding and inform daily decision-making. Her research interests lie at the intersection of theory and applications.
My research focuses on building infrastructure for public health and health science research organizations to take advantage of cloud computing, strong software engineering practices, and MLOps (machine learning operations). By equipping biomedical research groups with tools that facilitate automation, better documentation, and portable code, we can improve the reproducibility and rigor of science while scaling up the kind of data collection and analysis possible.
Research topics include:
1. Open source software and cloud infrastructure for research,
2. Software development practices and conventions that work for academic units, like labs or research centers, and
3. The organizational factors that encourage best practices in reproducibility, data management, and transparency.
The practice of science is a tug of war between competing incentives: the drive to do a lot fast, and the need to generate reproducible work. As data grows in size, code increases in complexity and the number of collaborators and institutions involved goes up, it becomes harder to preserve all the “artifacts” needed to understand and recreate your own work. Technical AND cultural solutions will be needed to keep data-centric research rigorous, shareable, and transparent to the broader scientific community.
My research concentrates on the areas of bioinformatics, proteomics, and data integration. I am particularly interested in mass spectrometry-based proteomics, software development for proteomics, cancer proteogenomics, and transcriptomics. The computational methods and tools previously developed by my colleagues and me, such as PepExplorer, MSFragger, Philosopher, and PatternLab for Proteomics, are among the most referenced proteome informatics tools and are used by hundreds of laboratories worldwide.
I am also a member of the Proteogenomics Data Analysis Center (UM-PGDAC), which is part of the NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) initiative for processing and analyzing hundreds of cancer proteomics samples. UM-PGDAC develops advanced computational infrastructure for comprehensive and global characterization of genomics, transcriptomics, and proteomics data collected from several human tumor cohorts using NCI-provided biospecimens. Since 2019 I have been working as a bioinformatics data analyst with the University of Michigan Proteomics Resource Facility, which provides state-of-the-art proteomics capabilities to University of Michigan investigators, including Rogel Cancer Center investigators, for whom it serves as the Proteomics Shared Resource.
I build data science tools to address challenges in medicine and clinical care. Specifically, I apply signal processing, image processing and machine learning techniques, including deep convolutional and recurrent neural networks and natural language processing, to aid diagnosis, prognosis and treatment of patients with acute and chronic conditions. In addition, I conduct research on novel approaches to represent clinical data and combine supervised and unsupervised methods to improve model performance and reduce the labeling burden. Another active area of my research is design, implementation and utilization of novel wearable devices for non-invasive patient monitoring in hospital and at home. This includes integration of the information that is measured by wearables with the data available in the electronic health records, including medical codes, waveforms and images, among others. Another area of my research involves linear, non-linear and discrete optimization and queuing theory to build new solutions for healthcare logistic planning, including stochastic approximation methods to model complex systems such as dispatch policies for emergency systems with multi-server dispatches, variable server load, multiple priority levels, etc.
My research primarily focuses on the following main themes: 1) development of methods for risk prediction and analyzing treatment effect heterogeneity, 2) Bayesian nonparametrics and Bayesian machine learning methods with a particular emphasis on the use of these methods in the context of survival analysis, 3) statistical methods for analyzing heterogeneity in risk-benefit profiles and for supporting individualized treatment decisions, and 4) development of empirical Bayes and shrinkage methods for high-dimensional statistical applications. I am also broadly interested in collaborative work in biomedical research with a focus on the application of statistics in cancer research.
My research interest lies in applying data science for actionable transformation of human health from bench to bedside. Current research focus areas include cutting-edge single-cell sequencing informatics and genomics; precision medicine through integration of multi-omics data types; novel modeling and computational methods for biomarker research; and public health genomics. I apply my biomedical informatics and analytical expertise to study diseases such as cancers, as well as the impact of pregnancy and early-life complications on later-life diseases.
Biodiversity in nature can be puzzlingly high in the light of competition between species, which arguably should eventually result in a single winner. The coexistence mechanisms that allow for this biodiversity shape the dynamics of communities and ecosystems. My research focuses on understanding the mechanisms of competitive coexistence, how competition influences community structure and diversity, and what insights observed patterns of community structure might provide about competitive coexistence.
I am interested in the use and development of data science approaches to draw insights regarding coexistence mechanisms from the structural patterns of ecological communities with respect to species’ functional traits, relative abundance, spatial distribution, and phylogenetic relatedness, as community dynamics proceed. I am also interested in the use of Maximum Likelihood and Bayesian approaches for fitting demographic models to forest census data sets; these models can then be used to quantitatively assess the role of different competitive coexistence mechanisms.
The current goal of our research is to learn enough about the physiology and ecology of microbes and microbial communities in the gut that we are able to engineer the gut microbiome to improve human health. The first target of our engineering is the production of butyrate – a common fermentation product of some gut microbes that is essential for human health. Butyrate is the preferred energy source for mitochondria in the epithelial cells lining the gut and it also regulates their gene expression.
One of the most effective ways to influence the composition and metabolism of the gut microbiota is through diet. In an interventional study, we have tracked responses in the composition and fermentative metabolism of the gut microbiota in >800 healthy individuals. Emerging patterns suggest several configurations of the microbiome that can result in increased production of butyrate. We have isolated the microbes that form an anaerobic food web to convert dietary fiber to butyrate and continue to make discoveries about their physiology and interactions. Based on these results, we have initiated a clinical trial in which we are hoping to prevent the development of Graft versus Host Disease following bone marrow transplants by managing butyrate production by the gut microbiota.
We are also beginning to track hundreds of other metabolites from the gut microbiome that may influence human health. We use metagenomes and metabolomes to identify patterns that link the microbiota with their metabolites and then test those models in human organoids and gnotobiotic mice colonized with synthetic communities of microbes. This blend of wet-lab research in basic microbiology, data science, and ecology is moving us closer to engineering the gut microbiome to improve human health.
My research focuses on using human biometric data (such as motion) to guide the design and manufacturing of assistive and proactive devices. Embedded and external sensors generate ample data, which require scientific approaches to analyze and turn into knowledge. I have worked closely with the University of Michigan Orthotics and Prosthetics Center on the design and manufacturing of custom assistive devices using 3D printing and cyber-based design. The goal is to create a cyber-physical system that can turn data from scanning, sensors, human motion, user feedback, and clinician diagnosis into quantitative health metrics and guidelines to improve the quality of care for people with needs.
My lab has two main areas of focus: molecular characteristics of head and neck cancer, and the intersection of regulatory genomics and pathway analysis. With head and neck cancer, we study tumor subtypes and biomarkers of prognosis, treatment response, and recurrence. We apply integrative omics analyses, dimension reduction methods, and prediction techniques, with the ultimate goal of identifying patient subsets who would benefit from either an additional targeted treatment or de-escalated treatment to increase quality of life. For regulatory genomics and pathway analysis, we develop statistical tests that take into account important covariates and other variables for weighting observations.