My research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, I am working with collaborators on developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). We are developing methodology that is able to correct for these systematic errors using “negative controls.” Negative controls are variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables that are of interest. The negative controls allow us to learn about the structure of the errors, so that we may then remove the errors from the other variables.
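The correction idea described above can be illustrated with a minimal simulated sketch. This is not the authors' actual methodology; it is a simplified factor-model variant (in the spirit of remove-unwanted-variation approaches), with all data and dimensions invented for illustration: latent error factors are estimated from the negative-control variables and then regressed out of every variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression matrix: n samples x p genes (all values hypothetical).
n, p, k = 50, 200, 2           # k latent systematic-error factors
W = rng.normal(size=(n, k))     # unobserved error factors (e.g., lab batch)
alpha = rng.normal(size=(k, p)) # how each gene loads on the errors
signal = np.zeros((n, p))
signal[:, :100] = rng.normal(1.0, 1.0, size=(n, 100))  # true biology, first 100 genes
Y = signal + W @ alpha + 0.1 * rng.normal(size=(n, p))

# Negative controls: genes known a priori to carry no biological signal,
# but still hit by the systematic errors (last 50 columns by construction).
ctrl = np.arange(150, 200)

# Learn the error structure from the controls via SVD...
U, s, Vt = np.linalg.svd(Y[:, ctrl], full_matrices=False)
W_hat = U[:, :k] * s[:k]        # estimated error-factor scores per sample

# ...then regress the estimated factors out of every gene.
alpha_hat = np.linalg.lstsq(W_hat, Y, rcond=None)[0]
Y_clean = Y - W_hat @ alpha_hat
```

After the correction, the control genes are reduced to roughly noise level, while the biological signal in the other genes is largely preserved, which is the point of requiring controls that share the error structure but not the signal.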
My research examines how people make choices in uncertain environments. The general focus is on using statistical models to explain complex decision patterns, particularly sequential choices among related items (e.g., brands in the same category) and dyads (e.g., people choosing one another in online dating), as well as applications to problems in the marketing domain (e.g., models relating advertising exposures to awareness and sales). The methods lie primarily in discrete choice models, typically estimated using Bayesian techniques, dynamic programming, and nonparametrics. I am especially interested in extending Bayesian analysis to very large databases, in particular ‘fusing’ data sets with only partly overlapping covariates to enable strong statistical identification of models across them.
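As a concrete anchor for the discrete choice modeling mentioned above, here is a minimal sketch of the multinomial logit, the standard workhorse model for choices among related items. The attribute matrix and coefficient values are invented for illustration, and this is a plain likelihood computation rather than the Bayesian estimation procedures the statement refers to.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a consumer chooses one of J brands; the utility of
# brand j is a linear index of its d attributes plus Gumbel noise.
J, d = 4, 3
beta = np.array([1.0, -0.5, 0.25])   # illustrative taste coefficients
X = rng.normal(size=(J, d))          # attributes of each brand

def choice_probs(beta, X):
    """Multinomial logit: P(j) = exp(x_j' beta) / sum_k exp(x_k' beta)."""
    v = X @ beta
    v = v - v.max()                  # subtract max for numerical stability
    e = np.exp(v)
    return e / e.sum()

p = choice_probs(beta, X)            # probabilities over the J brands
```

The closed-form choice probabilities are what make this model tractable inside larger machinery such as dynamic programming or hierarchical Bayesian estimation.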