Jeffrey Morris, Ph.D.
Professor, Deputy Chair Ad Interim—Department of Biostatistics
The University of Texas MD Anderson Cancer Center
Bayesian Quantile Functional Regression for Biomedical Imaging Data
Abstract: In many areas of science, technological advances have led to devices that produce an enormous number of measurements per subject. Frequently, researchers deal with these data by extracting summary statistics (e.g., mean or variance) and then modeling those, but this approach can miss key insights when the summaries do not capture all of the relevant information in the raw data. One of the key challenges in modern statistics is to devise methods that can extract information from these big data while avoiding reductionist assumptions. In this talk, we will discuss methods for modeling the entire distribution of the measurements observed for each subject and relating properties of the distribution to covariates. Our approach is to represent the observed data as an empirical quantile function for each subject, and then regress these quantile functions on a set of scalar predictors, an approach we call quantile functional regression. To represent the quantile functions, we introduce custom basis functions called “quantlets” that are orthogonal and empirically defined, and therefore adaptive to the features of the given data set. After fitting the quantile functional regression, we are able to perform global tests for which covariates have an effect on any aspect of the distribution, and then follow up with local tests to characterize these differences, identifying at which quantiles the differences lie and/or assessing whether the covariate affects certain major aspects of the distribution, including location, scale, skewness, or Gaussian-ness, while accounting for multiple testing. If the differences lie in these commonly used summaries, our method can still detect them, but it will not miss effects on aspects of the distribution outside of these summaries.
We illustrate this method on biomedical imaging data, for which we relate the distribution of pixel intensities to various demographic and clinical characteristics, but the method has wide-ranging application to many areas, including climate modeling, genomics, electronic medical records, and wearable computing devices. Time allowing, I will also provide some illustrations of these methods applied in these other areas.
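The core idea in the abstract, representing each subject by an empirical quantile function and regressing those functions on scalar predictors, can be sketched in a few lines. This is only a rough illustration under stated assumptions, not the speaker's method: it uses simulated data, a hypothetical binary covariate `group`, and an ordinary least-squares fit at each point of a probability grid in place of the quantlet basis and Bayesian modeling described above. The function names are made up for this example.

```python
import numpy as np

def empirical_quantile_functions(samples, probs):
    """Evaluate each subject's empirical quantile function on a common grid."""
    return np.vstack([np.quantile(s, probs) for s in samples])

def pointwise_quantile_regression(Q, X):
    """Least-squares fit of Q(p) = X @ beta(p), one regression per grid point p.

    Assumption: plain OLS per grid point, standing in for the quantlet-based
    Bayesian model in the abstract.
    """
    beta, *_ = np.linalg.lstsq(X, Q, rcond=None)
    return beta  # shape: (n_covariates, n_probs)

# Simulated data (an assumption for illustration): 40 subjects, each with
# 2000 "pixel intensities". Group 1 has a larger spread but the same center,
# so a comparison of subject means would show little difference.
rng = np.random.default_rng(0)
n_subjects, n_pixels = 40, 2000
group = np.repeat([0.0, 1.0], n_subjects // 2)
samples = [rng.normal(0.0, 1.0 + 0.5 * g, n_pixels) for g in group]

# Common probability grid: 0.01, 0.02, ..., 0.99.
probs = np.linspace(0.01, 0.99, 99)
Q = empirical_quantile_functions(samples, probs)

# Design matrix: intercept plus the group indicator.
X = np.column_stack([np.ones(n_subjects), group])
beta = pointwise_quantile_regression(Q, X)

# beta[1] is the estimated group effect as a function of the quantile level.
median_idx = len(probs) // 2
print("group effect at lowest grid point :", beta[1, 0])
print("group effect at the median        :", beta[1, median_idx])
print("group effect at highest grid point:", beta[1, -1])
```

In this simulated setup the estimated group effect is close to zero at the median and of opposite sign in the two tails, which is the kind of scale difference that a mean-only summary would not reveal.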
Light refreshments for seminar guests will be served at 3:00 p.m. in 3755