Liza Levina

Liza Levina and her group work on questions arising in the statistical analysis of large and complex data, especially networks and graphs. The group's current focus is on developing rigorous and computationally efficient statistical inference for realistic network models. Current directions include community detection in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks with applications to brain imaging, especially fMRI data from studies of mental health.

Johann Gagnon-Bartsch

My research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, I am working with collaborators on developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). We are developing methodology that is able to correct for these systematic errors using “negative controls.” Negative controls are variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables that are of interest. The negative controls allow us to learn about the structure of the errors, so that we may then remove the errors from the other variables.
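
The negative-control idea can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the methodology described above: it assumes the systematic errors are low-rank, estimates them from the negative-control columns with a singular value decomposition, and projects them out of every variable. The function name `remove_unwanted_variation`, the parameter `k` (the assumed number of error factors), and all simulated quantities (`lab`, `region`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np

def remove_unwanted_variation(Y, control_idx, k):
    """Illustrative negative-control adjustment (hypothetical helper).

    Y           : (samples x variables) data matrix
    control_idx : column indices of the negative controls
    k           : assumed number of systematic error factors
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)  # center each variable
    # Negative controls carry no signal of interest, so their leading
    # left singular vectors estimate the structure of the errors.
    U, _, _ = np.linalg.svd(Yc[:, control_idx], full_matrices=False)
    W_hat = U[:, :k]  # (samples x k), orthonormal columns
    # Project the estimated error factors out of all variables.
    return Yc - W_hat @ (W_hat.T @ Yc)

# Simulated data (all values are made up for illustration).
rng = np.random.default_rng(0)
n, p, n_ctl = 30, 200, 50
lab = np.repeat([0, 1, 2], 10)         # unwanted factor: laboratory
region = np.tile([0.0, 1.0], 15)       # factor of interest, balanced within lab
W_true = np.eye(3)[lab]                # one-hot laboratory indicators
alpha = 3.0 * rng.normal(size=(3, p))  # laboratory effects on every variable
beta = rng.normal(size=p)              # effects of the factor of interest
beta[:n_ctl] = 0.0                     # negative controls: no true association
noise = 0.1 * rng.normal(size=(n, p))
Y = np.outer(region, beta) + W_true @ alpha + noise

Y_adj = remove_unwanted_variation(Y, np.arange(n_ctl), k=2)
```

Note that this naive projection also removes any part of the signal of interest that happens to be correlated with the estimated error factors. The simulation avoids that by keeping `region` balanced within each laboratory; real data generally call for more careful methods.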

Microarray data from tissue samples taken from three different regions of the brain (anterior cingulate cortex, dorsolateral prefrontal cortex, and cerebellum) of ten individuals. The 30 tissue samples were separately analyzed in three different laboratories (UC Davis, UC Irvine, U of Michigan). The left plot shows the first two principal components of the data. The data cluster by laboratory, indicating that most of the variation in the data is systematic error that arises due to uncontrolled variation in laboratory conditions. The second plot shows the data after adjustment. The data now cluster by brain region (cortex vs. cerebellum). The data are from GEO (GSE2164).