Ginger Shultz

The Shultz group uses data science methods in two primary ways: 1) to investigate student placement in introductory chemistry courses and 2) to analyze student texts to provide instructors with actionable intelligence about student learning. Using regression discontinuity, we investigated the impact of taking general chemistry prior to organic chemistry on student performance and persistence in later chemistry courses and found that students who took general chemistry first benefitted by 1/4 of a letter grade but were no more likely to persist. A continued investigation using survey and interview methods indicated that this was related to academic skills rather than content preparation. Through the MWrite project, we have collected a large corpus of student texts and are developing automated text analysis methods to glean information about student learning across disciplines, with a specific focus on scientific reasoning.

Network representation of writing moves made by students in argumentative writing with relevant transition probabilities. The size of the node represents the relative frequency of operation use and the edge labels represent the transition probability with key transitions highlighted in orange.
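As a rough illustration of the quantities this caption describes, the sketch below computes relative node frequencies and transition probabilities from sequences of labeled writing moves. It is a minimal example under stated assumptions, not the group's actual pipeline: the function name `transition_summary` and the move labels ("claim", "evidence", "reasoning") are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def transition_summary(sequences):
    """Return (node_freq, trans_prob) for a list of move sequences.

    node_freq[m]      : share of all observed moves that are m (node size)
    trans_prob[a][b]  : share of transitions out of a that go to b (edge label)
    """
    node_counts = Counter()
    edge_counts = defaultdict(Counter)
    for seq in sequences:
        node_counts.update(seq)
        # Count each consecutive pair of moves as one transition.
        for src, dst in zip(seq, seq[1:]):
            edge_counts[src][dst] += 1

    total = sum(node_counts.values())
    node_freq = {move: count / total for move, count in node_counts.items()}
    trans_prob = {
        src: {dst: count / sum(dsts.values()) for dst, count in dsts.items()}
        for src, dsts in edge_counts.items()
    }
    return node_freq, trans_prob


# Hypothetical move labels, invented for illustration only.
essays = [
    ["claim", "evidence", "reasoning"],
    ["claim", "evidence", "evidence", "reasoning"],
    ["claim", "reasoning"],
]
node_freq, trans_prob = transition_summary(essays)
```

In a drawing of such a network, `node_freq` would set the node sizes and `trans_prob` would supply the edge labels; which transitions count as "key" is a separate analytic choice that this sketch does not attempt.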

Hyun Min Kang

Hyun Min Kang is an Associate Professor in the Department of Biostatistics. He received his Ph.D. in Computer Science from the University of California, San Diego in 2009 and joined the University of Michigan faculty in the same year. Prior to his doctoral studies, he worked as a research fellow at the Genome Research Center for Diabetes and Endocrine Disease in the Seoul National University Hospital for a year and a half, after completing his Bachelor's and Master's degrees in Electrical Engineering at Seoul National University. His research interest lies in big data genome science. Methodologically, his primary focus is on developing statistical methods and computational tools for large-scale genetic studies. Scientifically, his research aims to understand the etiology of complex disease traits, including type 2 diabetes, bipolar disorder, cardiovascular diseases, and glomerular diseases.

Veera Baladandayuthapani

Dr. Veera Baladandayuthapani is currently a Professor in the Department of Biostatistics at the University of Michigan (UM), where he is also the Associate Director of the Center for Cancer Biostatistics. He joined UM in Fall 2018 after spending 13 years in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center, Houston, Texas, where he was a Professor and Institute Faculty Scholar and held adjunct appointments at Rice University, Texas A&M University and UT School of Public Health. His research interests are mainly in high-dimensional data modeling and Bayesian inference. This includes functional data analyses, Bayesian graphical models, Bayesian semi-/non-parametric models and Bayesian machine learning. These methods are motivated by large and complex datasets (a.k.a. Big Data) such as high-throughput genomics, epigenomics, transcriptomics and proteomics as well as high-resolution neuro- and cancer-imaging. His work has been published in top statistical/biostatistical/bioinformatics and biomedical/oncology journals. He has also co-authored a book on Bayesian analysis of gene expression data. He currently holds multiple PI-level grants from NIH and NSF to develop innovative and advanced biostatistical and bioinformatics methods for big datasets in oncology. He has also served as the Director of the Biostatistics and Bioinformatics Cores for the Specialized Programs of Research Excellence (SPOREs) in Multiple Myeloma and Lung Cancer and as the Biostatistics & Bioinformatics platform leader for the Myeloma and Melanoma Moonshot Programs at MD Anderson. He is a fellow of the American Statistical Association and an elected member of the International Statistical Institute. He currently serves as an Associate Editor for the Journal of the American Statistical Association, Biometrics and Sankhya.


An example of horizontal (across cancers) and vertical (across multiple molecular platforms) data integration. Image from Ha et al. (Scientific Reports, 2018; https://www.nature.com/articles/s41598-018-32682-x)

Shan Bao

My research interests are to improve safety associated with motor-vehicle transportation by addressing both active safety (increased crash avoidance) and passive safety (increased crash protection) issues through the development and application of a wide range of research methodologies. These methodologies are targeted at developing a better understanding and modeling of driver behavior, including physical and cognitive attributes, driver decision-making processes and human intention prediction. I am currently interested in applying data science to study the following topics:
* Driver state detection and prediction
* Improving user interaction with automated vehicle technologies
* Communication and interaction between vehicles and vulnerable road users
* Driving style classification
* Human factors issues associated with connected and automated vehicle technologies

Where do drivers look when they are not paying attention to the road


Mousumi Banerjee

My research is primarily focused around 1) machine learning methods for understanding healthcare delivery and outcomes in the population, 2) analyses of correlated data (e.g. longitudinal and clustered data), and 3) survival analysis and competing risks analyses. We have developed tree-based and ensemble regression methods for censored and multilevel data, combination classifiers using different types of learning methods, and methodology to identify representative trees from an ensemble. These methods have been applied to important areas of biomedicine, specifically in patient prognostication, in developing clinical decision-making tools, and in identifying complex interactions between patient, provider, and health systems for understanding variations in healthcare utilization and delivery. My substantive areas of research are cancer and pediatric cardiovascular disease.

Victoria Morckel

Dr. Morckel uses spatial and statistical methods to examine ways to improve quality of life for people living in shrinking, deindustrialized cities in the Midwestern United States. She is especially interested in the causes and consequences of population loss, including issues of vacancy, blight, and neighborhood change.

Suitability Analysis Results: Map of Potential Properties to Naturalize in the City of Flint, Michigan.

Raed Al Kontar

My research broadly focuses on developing data analytics and decision-making methodologies specifically tailored for Internet of Things (IoT) enabled smart and connected products/systems. I envision that most (if not all) engineering systems will eventually become connected systems in the future. Therefore, my key focus is on developing next-generation data analytics, machine learning, individualized informatics and graphical and network modeling tools to truly realize the competitive advantages that are promised by smart and connected products/systems.


Ho-Joon Lee

Dr. Lee’s research in data science concerns biological questions in systems biology and network medicine by developing algorithms and models through a combination of statistical/machine learning, information theory, and network theory applied to multi-dimensional large-scale data. His projects have covered genomics, transcriptomics, proteomics, and metabolomics from yeast to mouse to human for integrative analysis of regulatory networks on multiple molecular levels, which also incorporates large-scale public databases such as GO for functional annotation, PDB for molecular structures, and PubChem and LINCS for drugs or small compounds. He previously carried out proteomics and metabolomics along with a computational derivation of dynamic protein complexes for IL-3 activation and cell cycle in murine pro-B cells (Lee et al., Cell Reports 2017), for which he developed integrative analytical tools using diverse approaches from machine learning and network theory. His ongoing interests in methodology include machine/deep learning and topological Kolmogorov-Sinai entropy-based network theory, which are applied to (1) multi-level dynamic regulatory networks in immune response, cell cycle, and cancer metabolism and (2) mass spectrometry-based omics data analysis.

Figure 1. Proteomics and metabolomics analysis of IL-3 activation and cell cycle (Lee et al., Cell Reports 2017). (A) Multi-omics abundance profiles of proteins, modules/complexes, intracellular metabolites, and extracellular metabolites over one cell cycle (from left to right columns) in response to IL-3 activation. Red indicates up-regulation of proteins/modules/intracellular metabolites or release of extracellular metabolites; green indicates down-regulation of proteins/modules/intracellular metabolites or uptake of extracellular metabolites. (B) Functional module network identified from integrative analysis. Red nodes are proteins and white nodes are functional modules. Expression profile plots are shown for literature-validated functional modules. (C) Overall pathway map of IL-3 activation and cell cycle phenotypes. (D) IL-3 activation and cell cycle as a cancer model along with candidate protein and metabolite biomarkers. (E) Protein co-expression scale-free network. (F) Power-law degree distribution of the network in (E). (G) Protein entropy distribution by topological Kolmogorov-Sinai entropy calculated for the network in (E).


Samuel K Handelman

Samuel K Handelman, Ph.D., is a Research Assistant Professor in the Department of Internal Medicine, Gastroenterology, of Michigan Medicine at the University of Michigan, Ann Arbor. Prof. Handelman is focused on multi-omics approaches to drive precision/personalized therapy and to predict population-level differences in the effectiveness of interventions. He tends to favor regression-style and hierarchical-clustering approaches, partially because he has a background in both statistics and cladistics. His scientific monomania is for compensatory mechanisms and trade-offs in evolution, but he has a principled reason to focus on translational medicine: real understanding of these mechanisms goes all the way into the clinic. Anything less than clinical translation indicates that we don’t understand what drove the genetics of human populations.

Brian P. McCall

My interests are in the areas of labor economics, program evaluation, and the economics of education. Currently my research focuses on college student debt accumulation and the subsequent risk of default, the effect of tuition subsidies on college attendance, the influence of family wealth on college attendance and completion, the effect of financial aid packages on college attendance, completion and subsequent labor market earnings, the influence of education on job displacement and subsequent earnings, the impact of unemployment insurance rules on unemployment durations and re-employment wages, and the determinants and consequences of repeat use of the unemployment insurance system.