Walter Mebane

My primary project, election forensics, concerns using statistical analysis to try to determine whether election results are accurate. Election forensics methods use data about voters and votes that are as highly disaggregated as possible. Typically this means polling station (precinct) data, sometimes ballot box data. Data can comprise hundreds of thousands or millions of observations. Geographic information is used where geographic structure is relevant. Estimation involves complex statistical models. Frontiers include: distinguishing frauds from the effects of strategic behavior; estimating fraud probabilities for individual observations (e.g., polling stations); and incorporating nonvoting data, such as from in-person election observations.

Hotspot Analysis, Extreme Fraud Probabilities, South Africa, 2014
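
As a concrete illustration of the kind of statistical screen used in election forensics, the sketch below compares the second digits of simulated precinct vote totals against the Benford reference distribution. This is a generic digit test, not necessarily the methods described above; the vote counts, sample size, and distributional choices are all hypothetical.

```python
# Illustrative sketch: a second-digit distribution screen for precinct-level
# vote counts, one simple example of an election-forensics style test.
# The vote counts below are simulated placeholders, not real election data.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
votes = rng.negative_binomial(5, 0.01, size=5000)   # hypothetical precinct totals
votes = votes[votes >= 10]                           # need at least two digits

# Expected second-digit frequencies under Benford's law
digits = np.arange(10)
expected_p = np.array([
    sum(np.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10)) for d in digits
])

# Observed second digits of each precinct's vote count
second_digits = np.array([int(str(v)[1]) for v in votes])
observed = np.bincount(second_digits, minlength=10)

# Pearson chi-square comparison; a large statistic flags counts whose digit
# pattern departs from the reference distribution and merits closer scrutiny.
stat, pval = chisquare(observed, f_exp=expected_p * observed.sum())
print(f"chi-square = {stat:.1f}, p-value = {pval:.3f}")
```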

Liza Levina

Liza Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. Our current focus is on developing rigorous and computationally efficient statistical inference on realistic models for networks. Current directions include community detection in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks, with applications to brain imaging, especially fMRI data from studies of mental health.
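
As a minimal illustration of the community detection task mentioned above (textbook spectral clustering, not the group's own estimators), the sketch below simulates a two-block stochastic block model and recovers the communities. The network size, edge probabilities, and number of communities are arbitrary choices.

```python
# A minimal, textbook-style sketch of community detection by spectral
# clustering on a small simulated network.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n, k = 200, 2
labels_true = rng.integers(0, k, size=n)

# Stochastic block model: higher edge probability within communities
p_in, p_out = 0.10, 0.02
P = np.where(labels_true[:, None] == labels_true[None, :], p_in, p_out)
A = rng.binomial(1, P)
A = np.triu(A, 1)
A = A + A.T                                  # symmetric adjacency, no self-loops

# Spectral step: eigenvectors of the adjacency matrix with largest |eigenvalue|
vals, vecs = np.linalg.eigh(A.astype(float))
X = vecs[:, np.argsort(np.abs(vals))[-k:]]

# Cluster the rows of the eigenvector matrix to recover communities
labels_hat = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
agreement = max(np.mean(labels_hat == labels_true), np.mean(labels_hat != labels_true))
print(f"fraction of nodes correctly grouped: {agreement:.2f}")
```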

Johann Gagnon-Bartsch

My research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, I am working with collaborators on developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). We are developing methodology that is able to correct for these systematic errors using “negative controls.” Negative controls are variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables that are of interest. The negative controls allow us to learn about the structure of the errors, so that we may then remove the errors from the other variables.

Microarray data from tissue samples taken from three different regions of the brain (anterior cingulate cortex, dorsolateral prefrontal cortex, and cerebellum) of ten individuals. The 30 tissue samples were separately analyzed in three different laboratories (UC Davis, UC Irvine, U of Michigan). The left plot shows the first two principal components of the data. The data cluster by laboratory, indicating that most of the variation in the data is systematic error that arises due to uncontrolled variation in laboratory conditions. The second plot shows the data after adjustment. The data now cluster by brain region (cortex vs. cerebellum). The data is from GEO (GSE2164).
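
The sketch below is a stripped-down illustration of the negative-control idea on simulated data: the leading singular vectors of the negative-control columns estimate the unwanted error factors, which are then projected out of every variable. The dimensions, the rank of the error structure, and the plain projection step are simplifying assumptions, not the actual methodology.

```python
# Stripped-down negative-control adjustment on simulated "expression" data.
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_genes, n_controls, k = 60, 500, 100, 2

# Rank-one biological signal; the first n_controls columns are negative controls
signal = rng.normal(size=(n_samples, 1)) @ rng.normal(size=(1, n_genes))
signal[:, :n_controls] = 0.0
W = rng.normal(size=(n_samples, k))          # unobserved error factors (e.g., lab effects)
alpha = rng.normal(size=(k, n_genes))
Y = signal + W @ alpha + 0.1 * rng.normal(size=(n_samples, n_genes))

# Step 1: the controls carry only error, so their leading left singular
# vectors estimate the span of the unwanted factors W.
U, s, Vt = np.linalg.svd(Y[:, :n_controls], full_matrices=False)
W_hat = U[:, :k]

# Step 2: project the estimated error factors out of every variable
# (the columns of W_hat are orthonormal, so this is a plain projection).
Y_adj = Y - W_hat @ (W_hat.T @ Y)

unwanted = W @ alpha
remaining = unwanted - W_hat @ (W_hat.T @ unwanted)
print("variance of unwanted variation before adjustment:", round(float(np.var(unwanted)), 3))
print("variance remaining after adjustment:", round(float(np.var(remaining)), 3))
```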

Naisyin Wang

My main research interests involve developing models and methodologies for complex biomedical data. I have developed approaches for extracting information from data that are imperfect due to measurement errors and incompleteness. My other methodological developments include model-based mixture modeling and non- and semiparametric modeling of longitudinal, dynamic, and high-dimensional data. I developed approaches that first gauge the effects of measurement errors on nonlinear mixed effects models and then provide statistical methods to analyze such data. Most of the methods I have developed are semiparametric. One strength of such approaches is that one does not need to make certain structural assumptions about part of the model. This modeling strategy enables integration of data collected from sources that might not be completely homogeneous. My recently developed statistical methods focus on regularized approaches and on model building, selection, and evaluation for high-dimensional, dynamic, or functional data.

Regularized time-varying ODE coefficients of the SEI dynamic equation for the Canadian measles incidence data (Li, Zhu, and Wang, 2015). Left panel: time-varying ODE coefficient curve that reflects both yearly and seasonal effects, with the regularized yearly effect (red curve) embedded; right panel: regularized (red curve), non-regularized (blue), and two-year local constant (circles) estimates of yearly effects. The new regularized method shows that the yearly effect is relatively large in the early years and decreases gradually to a constant after 1958.
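
To make the SEI dynamic equation in the caption concrete, the sketch below simulates an SEI compartment model with a time-varying transmission coefficient beta(t) combining a seasonal term and a decaying yearly effect. The functional form of beta(t), the parameter values, and the initial conditions are illustrative guesses, and the sketch only simulates the model; it does not perform the regularized coefficient estimation of Li, Zhu, and Wang (2015).

```python
# Simulate an SEI model with a time-varying transmission coefficient beta(t).
import numpy as np
from scipy.integrate import solve_ivp

def beta(t):
    # Seasonal variation plus a yearly effect that decays toward a constant (illustrative)
    return 0.8 * (1.0 + 0.3 * np.cos(2.0 * np.pi * t)) * (1.0 + 0.5 * np.exp(-0.2 * t))

def sei(t, y, sigma=26.0, mu=0.02):
    # S, E, I as population proportions; sigma = 1/latent period, mu = birth/death rate
    S, E, I = y
    dS = mu - beta(t) * S * I - mu * S
    dE = beta(t) * S * I - (sigma + mu) * E
    dI = sigma * E - mu * I
    return [dS, dE, dI]

y0 = [0.90, 0.05, 0.05]                           # initial proportions (sum to 1)
t_eval = np.linspace(0.0, 20.0, 400)              # time in years
sol = solve_ivp(sei, (0.0, 20.0), y0, t_eval=t_eval, method="LSODA", rtol=1e-8)
print("final (S, E, I) proportions:", np.round(sol.y[:, -1], 4))
```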

Clayton Scott

I study patterns in large, complex data sets, and make quantitative predictions and inferences about those patterns. Problems I’ve worked on include classification, anomaly detection, active and semi-supervised learning, transfer learning, and density estimation. I am primarily interested in developing new algorithms and proving performance guarantees for new and existing algorithms.

Examples of pulses generated from a neutron and a gamma ray interacting with an organic liquid scintillation detector used to detect and classify nuclear sources. Machine learning methods take several such examples and train a classifier to predict the label associated with future observations.
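
The sketch below illustrates the supervised-learning workflow described in the caption: given labeled example pulses, train a classifier and score it on held-out pulses. The simulated decay-curve "pulses," the two hand-crafted features, and the random-forest classifier are stand-ins chosen for brevity, not the detector's actual data or the algorithms studied in this research.

```python
# Toy pulse classification: simulate labeled pulses, extract simple features,
# train a classifier, and report held-out accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

def simulate_pulse(kind, t=np.linspace(0, 1, 50)):
    # Gamma-like pulses decay slightly faster than neutron-like ones (toy model)
    decay = 8.0 if kind == "gamma" else 6.0
    return np.exp(-decay * t) + 0.05 * rng.normal(size=t.size)

labels = rng.integers(0, 2, size=600)                     # 0 = gamma, 1 = neutron
pulses = np.array([simulate_pulse("neutron" if y else "gamma") for y in labels])

# Simple hand-crafted features: total pulse integral and the "tail" fraction
total = pulses.sum(axis=1)
tail = pulses[:, 25:].sum(axis=1)
X = np.column_stack([total, tail / total])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
```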

Mahesh Agarwal

Mahesh Agarwal is Associate Professor of Mathematics and Statistics at the University of Michigan, Dearborn.

Prof. Agarwal is primarily interested in number theory, in particular p-adic L-functions, the Bloch-Kato conjecture, and automorphic forms. His secondary research interests are polynomials, geometry, and math education.

Fred Feinberg

My research examines how people make choices in uncertain environments. The general focus is on using statistical models to explain complex decision patterns, particularly involving sequential choices among related items (e.g., brands in the same category) and dyads (e.g., people choosing one another in online dating), as well as a variety of applications to problems in the marketing domain (e.g., models relating advertising exposures to awareness and sales). The main methods are discrete choice models, ordinarily estimated using Bayesian methods, dynamic programming, and nonparametrics. I’m particularly interested in extending Bayesian analysis to very large databases, especially in terms of ‘fusing’ data sets with only partly overlapping covariates to enable strong statistical identification of models across them.

Applying Bayesian Methods to Problems in Dynamic Choice
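
As a compact illustration of Bayesian estimation for a discrete choice model, the sketch below simulates multinomial logit choices and recovers the preference coefficients with a plain random-walk Metropolis sampler. The simulated attributes, the N(0, 10I) prior, and the sampler settings are illustrative simplifications rather than the models used in this research.

```python
# Bayesian multinomial logit: simulate choices, then sample the posterior
# over preference coefficients with random-walk Metropolis.
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_alt, n_feat = 1000, 3, 2
beta_true = np.array([1.0, -0.5])

# Each observation: choose among 3 alternatives described by 2 attributes
X = rng.normal(size=(n_obs, n_alt, n_feat))
util = X @ beta_true + rng.gumbel(size=(n_obs, n_alt))   # Gumbel errors -> logit choices
choice = util.argmax(axis=1)

def log_post(beta):
    v = X @ beta                                          # deterministic utilities
    ll = (v[np.arange(n_obs), choice] - np.log(np.exp(v).sum(axis=1))).sum()
    return ll - 0.5 * (beta @ beta) / 10.0                # N(0, 10 I) prior

# Random-walk Metropolis with burn-in
draws, beta, lp = [], np.zeros(n_feat), log_post(np.zeros(n_feat))
for it in range(5000):
    prop = beta + 0.05 * rng.normal(size=n_feat)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    if it >= 1000:
        draws.append(beta.copy())

draws = np.array(draws)
print("posterior means:", np.round(draws.mean(axis=0), 2), "true:", beta_true)
```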

Shuheng Zhou

In the “Big Data” era, data sets are often very large yet incomplete, high dimensional, and complex in nature. Analyzing and deriving critically useful information from such data poses a great challenge to today’s researchers and practitioners. The overall goal of the research agenda of my group is to develop new theoretical frameworks and algorithms for analyzing such large, complex and spatio-temporal data despite the overwhelming presence of missing values and large additive errors. We propose to develop parametric and nonparametric models and methods for (i) handling challenging situations with additive and multiplicative errors, including missing values, in observed variables; (ii) estimating dynamic time-varying correlation and graphical structures; (iii) addressing fundamental challenges in “Big Data” such as data reduction, aggregation, interpretation and scale. We expect to uncover the complex structures and the associated conditional independence relationships from observational data with an ensemble of newly designed estimators. Our methods are applicable to many application domains such as neuroscience, geoscience and spatio-temporal modeling, genomics, and network data analysis.
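
As a small illustration of one step in this agenda, the sketch below estimates a sparse conditional-independence graph from data with missing values: a pairwise-complete covariance is computed from the incomplete observations and passed to an off-the-shelf graphical lasso. The simulated chain graph, the 20% missingness rate, and the eigenvalue clip used to keep the covariance positive definite are illustrative choices, not the corrected estimators developed in this research.

```python
# Graph estimation from incomplete data: pairwise-complete covariance
# followed by the graphical lasso.
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(5)
p, n = 10, 2000

# True sparse precision (inverse covariance): a chain graph, nonzero only on
# the tridiagonal, so each variable depends conditionally on its neighbors.
Theta = np.eye(p) + 0.45 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)

# Remove 20% of the entries completely at random
mask = rng.uniform(size=X.shape) < 0.2
X_obs = np.ma.masked_array(X, mask=mask)

# Pairwise-complete covariance from the incomplete data, then an eigenvalue
# clip to keep the estimate positive definite before optimization.
S = np.asarray(np.ma.cov(X_obs, rowvar=False))
vals, vecs = np.linalg.eigh(S)
S = vecs @ np.diag(np.clip(vals, 1e-3, None)) @ vecs.T

# Sparse conditional-independence (graph) estimate via the graphical lasso
cov_hat, prec_hat = graphical_lasso(S, alpha=0.05)

offdiag = ~np.eye(p, dtype=bool)
true_edges = np.abs(Theta) > 0
est_edges = np.abs(prec_hat) > 0.05
print("off-diagonal edge agreement:", round(float((true_edges == est_edges)[offdiag].mean()), 3))
```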