Ginger Shultz

The Shultz group uses data science methods in two primary ways: 1) to investigate student placement in introductory chemistry courses and 2) to analyze student texts to provide instructors with actionable intelligence about student learning. Using regression discontinuity, we investigated the impact of taking general chemistry prior to organic chemistry on student performance and persistence in later chemistry courses. We found that students who took general chemistry first benefitted by 1/4 of a letter grade but were no more likely to persist. A continued investigation using survey and interview methods indicated that this benefit was related to academic skills rather than content preparation. Through the MWrite project, we have collected a large corpus of student texts and are developing automated text analysis methods to glean information about student learning across disciplines, with a specific focus on scientific reasoning.

Network representation of writing moves made by students in argumentative writing with relevant transition probabilities. The size of the node represents the relative frequency of operation use and the edge labels represent the transition probability with key transitions highlighted in orange.
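
As a rough illustration of the quantities named in the caption above, the following minimal sketch computes relative node frequencies and transition probabilities from coded sequences of writing moves. The move labels ("claim", "evidence", "reasoning"), the sample essays, and the function name are hypothetical and chosen only for illustration; this is not the group's actual analysis pipeline.

```python
from collections import Counter, defaultdict


def transition_network(sequences):
    """Return (node_freq, trans_prob) for a list of move sequences.

    node_freq maps each move to its share of all observed moves
    (what a node size could encode). trans_prob maps each move to the
    estimated probability of each move that directly follows it
    (what an edge label could encode).
    """
    node_counts = Counter()
    edge_counts = defaultdict(Counter)
    for seq in sequences:
        node_counts.update(seq)
        # Count each consecutive pair (source move -> next move).
        for src, dst in zip(seq, seq[1:]):
            edge_counts[src][dst] += 1

    total = sum(node_counts.values())
    node_freq = {move: n / total for move, n in node_counts.items()}
    trans_prob = {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in edge_counts.items()
    }
    return node_freq, trans_prob


# Hypothetical coded essays: each list is one student's sequence of moves.
essays = [
    ["claim", "evidence", "reasoning"],
    ["claim", "evidence", "evidence", "reasoning"],
    ["claim", "reasoning"],
]
node_freq, trans_prob = transition_network(essays)
```

Here the outgoing probabilities from each move sum to one, so each set of edge labels leaving a node can be read as a conditional distribution over the next move.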

Veera Baladandayuthapani

Dr. Veera Baladandayuthapani is currently a Professor in the Department of Biostatistics at the University of Michigan (UM), where he is also the Associate Director of the Center for Cancer Biostatistics. He joined UM in Fall 2018 after spending 13 years in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center, Houston, Texas, where he was a Professor and Institute Faculty Scholar and held adjunct appointments at Rice University, Texas A&M University and UT School of Public Health. His research interests are mainly in high-dimensional data modeling and Bayesian inference. This includes functional data analyses, Bayesian graphical models, Bayesian semi-/non-parametric models and Bayesian machine learning. These methods are motivated by large and complex datasets (a.k.a. Big Data) such as high-throughput genomics, epigenomics, transcriptomics and proteomics as well as high-resolution neuro- and cancer imaging. His work has been published in top statistical/biostatistical/bioinformatics and biomedical/oncology journals. He has also co-authored a book on Bayesian analysis of gene expression data. He currently holds multiple PI-level grants from NIH and NSF to develop innovative and advanced biostatistical and bioinformatics methods for big datasets in oncology. He has also served as the Director of the Biostatistics and Bioinformatics Cores for the Specialized Programs of Research Excellence (SPOREs) in Multiple Myeloma and Lung Cancer and as the Biostatistics & Bioinformatics platform leader for the Myeloma and Melanoma Moonshot Programs at MD Anderson. He is a fellow of the American Statistical Association and an elected member of the International Statistical Institute. He currently serves as an Associate Editor for the Journal of the American Statistical Association, Biometrics and Sankhya.


An example of horizontal (across cancers) and vertical (across multiple molecular platforms) data integration. Image from Ha et al. (Nature Scientific Reports, 2018; https://www.nature.com/articles/s41598-018-32682-x).

Oleg Gnedin

I am a theoretical astrophysicist studying the origins and structure of galaxies in the universe. My research focuses on developing more realistic gasdynamics simulations, starting with initial conditions that are well constrained by observations, and advancing them in time with high spatial resolution using adaptive mesh refinement. I use machine-learning techniques to compare simulation predictions with observational data. Such comparisons lead to insights about the underlying physics that governs the formation of stars and galaxies. I have developed a Computational Astrophysics course that teaches practical application of modern techniques for big-data analysis and model fitting.

Emergence of galaxies and star clusters in cosmological gasdynamics simulations. Left panel shows large-scale cosmic structure (density of dark matter particles), which formed by gravitational instability. In the middle panel we can resolve this structure into disk galaxies with complex morphology (density of molecular/red and atomic/blue gas). These galaxies should create massive star clusters, such as shown in the right panel (real image — to be reproduced by our future simulations!).

Xiang Zhou

My research is focused on developing efficient and effective statistical and computational methods for genetic and genomic studies. These studies often involve large-scale and high-dimensional data; examples include genome-wide association studies, epigenome-wide association studies, and various functional genomic sequencing studies such as bulk and single cell RNAseq, bisulfite sequencing, ChIPseq, and ATACseq. Our method development is often application oriented and specifically targeted at practical applications of these large-scale genetic and genomic studies, and thus is not restricted to a particular methodological area. Our previous and current methods include, but are not limited to, Bayesian methods, mixed effects models, factor analysis models, sparse regression models, deep learning algorithms, clustering algorithms, integrative methods, spatial statistics, and efficient computational algorithms. By developing novel analytic methods, I seek to extract important information from these data and to advance our understanding of the genetic basis of phenotypic variation for various human diseases and disease related quantitative traits.

A statistical method recently developed in our group aims to identify tissues that are relevant to diseases or disease related complex traits by integrating tissue specific omics studies (e.g., the ROADMAP project) with genome-wide association studies (GWASs). The heatmap displays the rank of 105 tissues (y-axis) in terms of their relevance for each of the 43 GWAS traits (x-axis) evaluated by our method. Traits are organized by hierarchical clustering. Tissues are organized into ten tissue groups.

Shan Bao

My research aims to improve the safety of motor-vehicle transportation by addressing both active safety (increased crash avoidance) and passive safety (increased crash protection) issues through the development and application of a wide range of research methodologies. These methodologies are targeted at better understanding and modeling driver behavior, including physical and cognitive attributes, driver decision-making processes and human intention prediction. I am currently interested in applying data science to study the following topics:

*Driver state detection and prediction

*Improving user interaction with automated vehicle technologies

*Communication and interaction between vehicles and vulnerable road users

*Driving style classification

*Human factors issues associated with connected and automated vehicle technologies

Where do drivers look when they are not paying attention to the road?


Patrick Schloss

The Schloss lab is broadly interested in beneficial and pathogenic host-microbiome interactions, with the goal of improving our understanding of how the microbiome can be used to reach translational outcomes in the prevention, detection, and treatment of colorectal cancer, Crohn’s disease, and Clostridium difficile infection. To pursue this goal, the lab tests traditional ecological theory in the microbial context using a systems biology approach. Specifically, the laboratory specializes in using studies involving human subjects and animal models to understand how biological diversity affects community function, using a variety of culture-independent genomics techniques including sequencing 16S rRNA gene fragments, metagenomics, and metatranscriptomics. In addition, they use metabolomics to understand the functional role of the gut microbiota in states of health and disease. To support these efforts, they develop and apply bioinformatic tools to facilitate their analysis. Most notable is the development of the mothur software package (https://www.mothur.org), which is one of the most widely used tools for analyzing microbiome data and has been cited more than 7,300 times since it was initially published in 2009. The Schloss lab merges cutting-edge wet-lab techniques for collecting data with computational tools for synthesizing those data to answer important biological questions.

Given the explosion in microbiome research over the past 15 years, the Schloss lab has also stood at the center of a major effort to train interdisciplinary scientists in applying computational tools to study complex biological systems. These efforts have centered around developing reproducible research skills and applying modern data visualization techniques. An outgrowth of these efforts at the University of Michigan has been the institutionalization of The Carpentries organization on campus (https://carpentries.org), which specializes in peer-to-peer instruction of programming tools and techniques to foster better reproducibility and build a community of practitioners.

The Schloss lab uses computational tools to integrate multi-omics data in a culture-independent approach to understand how bacteria interact with each other and their host to drive processes such as colorectal cancer and susceptibility to Clostridium difficile infections.

Mousumi Banerjee

My research is focused primarily on 1) machine learning methods for understanding healthcare delivery and outcomes in the population, 2) analyses of correlated data (e.g., longitudinal and clustered data), and 3) survival analysis and competing risks analyses. We have developed tree-based and ensemble regression methods for censored and multilevel data, combination classifiers using different types of learning methods, and methodology to identify representative trees from an ensemble. These methods have been applied to important areas of biomedicine, specifically in patient prognostication, in developing clinical decision-making tools, and in identifying complex interactions between patient, provider, and health systems for understanding variations in healthcare utilization and delivery. My substantive areas of research are cancer and pediatric cardiovascular disease.

Victoria Morckel

Dr. Morckel uses spatial and statistical methods to examine ways to improve quality of life for people living in shrinking, deindustrialized cities in the Midwestern United States. She is especially interested in the causes and consequences of population loss, including issues of vacancy, blight, and neighborhood change.

Suitability Analysis Results: Map of Potential Properties to Naturalize in the City of Flint, Michigan.

Raed Al Kontar

My research broadly focuses on developing data analytics and decision-making methodologies specifically tailored for Internet of Things (IoT) enabled smart and connected products/systems. I envision that most (if not all) engineering systems will eventually become connected systems. Therefore, my key focus is on developing next-generation data analytics, machine learning, individualized informatics, and graphical and network modeling tools to realize the competitive advantages promised by smart and connected products/systems.


Ho-Joon Lee

Dr. Lee’s research in data science addresses biological questions in systems biology and network medicine by developing algorithms and models through a combination of statistical/machine learning, information theory, and network theory applied to multi-dimensional large-scale data. His projects have covered genomics, transcriptomics, proteomics, and metabolomics from yeast to mouse to human for integrative analysis of regulatory networks on multiple molecular levels, incorporating large-scale public databases such as GO for functional annotation, PDB for molecular structures, and PubChem and LINCS for drugs or small compounds. He previously carried out proteomics and metabolomics along with a computational derivation of dynamic protein complexes for IL-3 activation and cell cycle in murine pro-B cells (Lee et al., Cell Reports 2017), for which he developed integrative analytical tools using diverse approaches from machine learning and network theory. His ongoing interests in methodology include machine/deep learning and topological Kolmogorov-Sinai entropy-based network theory, which are applied to (1) multi-level dynamic regulatory networks in immune response, cell cycle, and cancer metabolism and (2) mass spectrometry-based omics data analysis.

Figure 1. Proteomics and metabolomics analysis of IL-3 activation and cell cycle (Lee et al., Cell Reports 2017). (A) Multi-omics abundance profiles of proteins, modules/complexes, intracellular metabolites, and extracellular metabolites over one cell cycle (from left to right columns) in response to IL-3 activation. Red indicates up-regulation of proteins/modules/intracellular metabolites or release of extracellular metabolites; green indicates down-regulation of proteins/modules/intracellular metabolites or uptake of extracellular metabolites. (B) Functional module network identified from integrative analysis. Red nodes are proteins and white nodes are functional modules. Expression profile plots are shown for literature-validated functional modules. (C) Overall pathway map of IL-3 activation and cell cycle phenotypes. (D) IL-3 activation and cell cycle as a cancer model along with candidate protein and metabolite biomarkers. (E) Protein co-expression scale-free network. (F) Power-law degree distribution of the network in (E). (G) Protein entropy distribution by topological Kolmogorov-Sinai entropy calculated for the network in (E).