I study patterns in large, complex data sets, and make quantitative predictions and inferences about those patterns. Problems I’ve worked on include classification, anomaly detection, active and semi-supervised learning, transfer learning, and density estimation. I am primarily interested in developing new algorithms and proving performance guarantees for new and existing algorithms.

My research examines how people make choices in uncertain environments. The general focus is on using statistical models to explain complex decision patterns, particularly sequential choices among related items (e.g., brands in the same category) and dyads (e.g., people choosing one another in online dating), as well as a variety of applications to problems in the marketing domain (e.g., models relating advertising exposures to awareness and sales). The main tools are discrete choice models, ordinarily estimated using Bayesian methods, along with dynamic programming and nonparametrics. I’m particularly interested in extending Bayesian analysis to very large databases, especially in terms of ‘fusing’ data sets with only partly overlapping covariates to enable strong statistical identification of models across them.
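The workhorse of discrete choice modeling is the multinomial logit, which maps latent utilities to choice probabilities; a minimal sketch (the utility values below are hypothetical):

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: P(choose i) = exp(u_i) / sum_j exp(u_j).
    Subtracting the max utility first keeps exp() numerically stable."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Three brands with illustrative deterministic utilities
utils = [1.2, 0.4, -0.3]
probs = choice_probabilities(utils)
```

In a Bayesian treatment, the utilities would themselves be functions of covariates (price, advertising exposure) with priors over their coefficients; this sketch shows only the choice kernel.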

Kerby Shedden has broad interests involving applied statistics, data science and computing with data. Through his work directing the data science consulting service he has worked in a wide variety of application domains including numerous areas within health science, social science, and transportation research. A current major focus is development of software tools that exploit high performance computing infrastructure for statistical analysis of health records, and sensor data from vehicles and road networks.

Stilian Stoev’s research is in the area of applied probability and statistics for stochastic processes, with emphasis on extremes, heavy tails, self-similarity, and long-range dependence. His recent theoretical contributions are in the area of max-stable processes, which have emerged as a canonical model for dependence in extremes. This work includes the representation, characterization, ergodicity, mixing, and prediction of this class of processes. Dr. Stoev also works on applied problems in computer network traffic monitoring, analysis, and modeling. A recent joint project focuses on developing efficient statistical methods and algorithms for the visualization and analysis of fast multi-gigabit network traffic streams, which can help unveil the structure of traffic flows and detect anomalies and cyber attacks in real time. This involves advanced low-level packet capture, efficient computation, and rapid communication of summary statistics using non-relational databases. More broadly, Dr. Stoev’s research is motivated by large-scale, data-intensive applied problems arising in the areas of:

- environmental, weather and climate extremes.
- insurance and finance.
- Internet traffic monitoring, modeling and prediction.
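For heavy-tailed data of the kind arising in these applications, a standard first diagnostic is the Hill estimator of the tail index; a minimal sketch on simulated Pareto data (the sample size and choice of k are illustrative):

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimator of the tail index alpha from the k largest
    observations: alpha_hat = k / sum of log-excesses over the
    (k+1)-th order statistic. Smaller alpha means heavier tails."""
    xs = sorted(data, reverse=True)
    top = xs[:k + 1]                            # k+1 largest order statistics
    log_excess = [math.log(x / top[k]) for x in top[:k]]
    return k / sum(log_excess)

# Pareto(alpha = 1.5) sample via inverse transform; estimate should be near 1.5
random.seed(0)
sample = [(1 - random.random()) ** (-1 / 1.5) for _ in range(20000)]
alpha_hat = hill_estimator(sample, k=2000)
```

In practice the estimate is examined as a function of k (a "Hill plot"), since too small a k is noisy and too large a k leaves the tail region.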

My research group is engaged in fundamental research in the following areas:

- Statistical learning theory: We are developing theory and algorithms for prediction problems (e.g., learning to rank and multilabel learning) with complex label spaces, where the available human supervision is often weak.
- Sequential prediction in a game-theoretic framework: We are trying to understand the power and limitations of sequential prediction algorithms when no probabilistic assumptions are placed on the data-generating mechanism.
- High-dimensional and network data analysis: We are developing scalable algorithms with provable performance guarantees for learning from high-dimensional and network data.
- Optimization algorithms: We are creating incremental, distributed, and parallel algorithms for machine learning problems arising in today’s data-rich world.
- Reinforcement learning: We are synthesizing concepts and techniques from artificial intelligence, control theory, and operations research to push the frontier in sequential decision making, with a focus on delivering personalized health interventions via mobile devices.

My group continues to actively search for challenging machine learning problems that arise across disciplines, including the behavioral sciences, computational biology, computational chemistry, learning sciences, and network science.
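The game-theoretic sequential prediction setting can be illustrated with the classical exponential-weights (Hedge) forecaster, whose regret bound holds with no probabilistic assumptions on the loss sequence; a minimal sketch (the learning rate and loss sequence are hypothetical):

```python
import math

def exponential_weights(expert_losses, eta=0.5):
    """Hedge / exponential weights: maintain one weight per expert,
    predict with the normalized weights, then downweight each expert
    by exp(-eta * loss). Regret to the best expert grows like
    sqrt(T log N) for losses in [0, 1], for any loss sequence."""
    n = len(expert_losses[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in expert_losses:              # one row of losses per round
        z = sum(weights)
        probs = [w / z for w in weights]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss

# Two experts over 50 rounds: expert 0 always loses 0, expert 1 always loses 1
rounds = [[0.0, 1.0]] * 50
learner_loss = exponential_weights(rounds)
```

Here the best expert suffers zero cumulative loss, so `learner_loss` is exactly the learner's regret; it stays bounded as the horizon grows because the bad expert's weight decays geometrically.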

Roderick Joseph Little, PhD, is the Richard D. Remington Distinguished University Professor of Biostatistics, Professor of Statistics, Research Professor, Institute for Social Research, and Senior Fellow, Michigan Society of Fellows, at the University of Michigan, Ann Arbor.

My primary research interest is the analysis of data sets with missing values. Many statistical techniques are designed for complete, rectangular data sets, but in practice biostatistical data sets contain missing values, either by design or by accident. As detailed in my book with Rubin, initial statistical approaches were relatively ad hoc, such as discarding incomplete cases or substituting means, but modern methods are increasingly based on models for the data and the missing-data mechanism, using likelihood-based inferential techniques.
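The gap between ad hoc mean substitution and model-based imputation can be seen in a toy example: when missingness depends on an observed covariate, filling in the overall mean is biased, while a regression fitted to the complete cases recovers the structure. This is only an illustrative sketch, not any specific method from the book:

```python
def regression_impute(pairs):
    """Model-based imputation: fill a missing y using a least-squares
    regression of y on x fitted to the complete cases, instead of
    substituting the mean of the observed y values."""
    complete = [(x, y) for x, y in pairs if y is not None]
    n = len(complete)
    mx = sum(x for x, _ in complete) / n
    my = sum(y for _, y in complete) / n
    sxy = sum((x - mx) * (y - my) for x, y in complete)
    sxx = sum((x - mx) ** 2 for x, _ in complete)
    beta = sxy / sxx
    return [(x, y if y is not None else my + beta * (x - mx))
            for x, y in pairs]

# y = 2x exactly; y is missing whenever x > 2 (missing at random given x)
data = [(0, 0.0), (1, 2.0), (2, 4.0), (3, None), (4, None)]
filled = regression_impute(data)
```

Mean substitution would fill both missing entries with 2.0; the regression imputation fills 6.0 and 8.0, matching the true line. Likelihood-based and multiple-imputation methods generalize this idea while also propagating imputation uncertainty.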

Another interest is the analysis of data collected by complex sampling designs involving stratification and clustering of units. Since working as a statistician for the World Fertility Survey, I have been interested in the development of model-based methods for survey analysis that are robust to misspecification, reasonably efficient, and capable of implementation in applied settings. Statistics is philosophically fascinating and diverse in application. My inferential philosophy is model-based and Bayesian, although the effects of model misspecification need careful attention. My applied interests are broad, including mental health, demography, environmental statistics, biology, economics and the social sciences as well as biostatistics.

I am broadly interested in statistical inference, which is informally defined as the process of turning data into prediction and understanding. I like to work with richly structured data, such as those extracted from texts, images and other spatiotemporal signals. In recent years I have gravitated toward a field in statistics known as Bayesian nonparametrics, which provides a fertile and powerful mathematical framework for the development of many computational and statistical modeling ideas. My motivation for all this came originally from an early interest in machine learning, which continues to be a major source of research interest. A primary focus of my group’s research in machine learning is to develop more effective inference algorithms using stochastic, variational and geometric viewpoints.
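A workhorse construction in Bayesian nonparametrics is the Dirichlet process, whose random weights can be generated by Sethuraman's stick-breaking representation; a minimal sketch (the concentration parameter and truncation level are illustrative):

```python
import random

def stick_breaking(alpha, n_atoms, rng):
    """Dirichlet process weights via stick-breaking:
    w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
    Each step breaks off a Beta-distributed fraction of the
    remaining stick, so the weights sum to (almost) 1."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

rng = random.Random(1)
w = stick_breaking(alpha=2.0, n_atoms=100, rng=rng)
```

Smaller `alpha` concentrates mass on a few atoms; larger `alpha` spreads it over many, which is what lets Dirichlet process mixtures adapt the number of clusters to the data.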

Dr. Kalbfleisch is a Professor of Biostatistics and Statistics at the University of Michigan, Ann Arbor. He served as chair of the Department of Biostatistics, School of Public Health, from 2002 to 2007 and as Director of the Kidney Epidemiology and Cost Center from 2008 to 2011. He received his Ph.D. in statistics in 1969 from the University of Waterloo. He was an assistant professor of statistics at the State University of New York at Buffalo (1970-73) and on the faculty of the University of Waterloo (1973-2002). At Waterloo, he served as chair of the Department of Statistics and Actuarial Science (1984-1990) and as dean of the Faculty of Mathematics (1990-1998). He has held visiting appointments as Professor at the University of Washington, the University of California at San Francisco, the University of Auckland, the Fred Hutchinson Cancer Research Center, and the National University of Singapore. He has interests in, and has published in, various areas of statistics and biostatistics including life history and survival analysis, likelihood methods of inference, bootstrapping and estimating equations, mixture and mixed effects models, and medical applications, particularly in the area of renal disease and organ transplantation. Dr. Kalbfleisch is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics. He is also an elected member of the International Statistical Institute, a Fellow of the Royal Society of Canada and a Gold Medalist of the Statistical Society of Canada. He also received the Distinguished Research Award from the UM School of Public Health in 2011.

A primary research interest is the development of models and methods for analyzing failure time or event history data. Applications of this work arise in many areas, including epidemiology, medicine, demography, and engineering. In event history data, interest centers on the timing and occurrence of various kinds of events, such as repeated infections or recurrences of disease, or other sequences of events that may occur during a study period. I have been particularly interested in situations in which only partial data, or data subject to sampling bias, are available.
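A basic building block for failure time data of this kind is the Kaplan-Meier estimator of the survival function, which accommodates right-censored observations; a minimal sketch on toy data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t): at each distinct event time,
    multiply the running survival probability by (1 - d/n), where d
    events occur among the n subjects still at risk. Censored
    observations (event = 0) leave the estimate unchanged but shrink
    the risk set."""
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0
    curve = []                       # (time, S(t)) after each event time
    i = 0
    while i < n:
        t = data[i][0]
        deaths = 0
        j = i
        while j < n and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths:
            surv *= 1 - deaths / (n - i)   # n - i subjects at risk at time t
            curve.append((t, surv))
        i = j
    return curve

# Five subjects: events at times 1, 3, 4; censored at times 2 and 5
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

On this toy data the estimate steps down to 0.8, then 8/15, then 4/15; the censored subjects contribute to the risk sets without forcing a step, which is exactly the partial-data feature that complete-case methods would discard.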

In recent years, I have been working on statistical aspects of problems associated with End Stage Renal Disease and solid organ transplantation. The Kidney Epidemiology and Cost Center has many projects in these areas, including several funded through the Centers for Medicare and Medicaid Services. This provides a rich area of application where statistical methods and developments play a substantial role in defining public policy. I am particularly interested in the development of appropriate methods for the use of such data in profiling and/or ranking medical providers.

I have recently worked on the optimization and simulation of kidney paired donation programs. In these programs, candidates in need of a kidney transplant who have a willing but incompatible living donor are entered into a pool, and we seek exchanges of donors to overcome incompatibilities. Added to this is the potential for non-directed donors, who can give a kidney to one member of the pool and hence initiate a chain of transplants. Our methods use integer programming to create flexible allocation schemes that have the potential to provide substantial increases in the number of transplants achieved.
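Real kidney paired donation programs solve integer programs over longer cycles and non-directed donor chains; the core idea can be illustrated with a toy brute-force search restricted to two-way exchanges (the compatibility data below are hypothetical):

```python
from itertools import combinations

def max_two_way_exchanges(compat):
    """Toy kidney paired donation allocation. compat[i] is the set of
    pairs whose candidate can accept pair i's donor. A two-way swap
    (i, j) is feasible when each donor matches the other pair's
    candidate; we search for disjoint swaps maximizing transplants
    (two per swap). Real programs solve this as an integer program."""
    ids = sorted(compat)
    swaps = [(i, j) for i, j in combinations(ids, 2)
             if j in compat[i] and i in compat[j]]

    best = []
    def extend(chosen, used, k):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for m in range(k, len(swaps)):
            i, j = swaps[m]
            if i not in used and j not in used:
                chosen.append((i, j))
                extend(chosen, used | {i, j}, m + 1)
                chosen.pop()
    extend([], set(), 0)
    return best

# Four incompatible pairs; compat gives donor-to-candidate compatibility
compat = {1: {2}, 2: {1, 3}, 3: {4}, 4: {3}}
matching = max_two_way_exchanges(compat)
```

Here pairs 1 and 2 swap donors, as do pairs 3 and 4, yielding four transplants. The integer programming formulation scales this search to large pools and adds chains, priority weights, and fallback options when a planned transplant fails.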