My research focuses primarily on 1) machine learning methods for understanding healthcare delivery and outcomes in the population, 2) analyses of correlated data (e.g. longitudinal and clustered data), and 3) survival analysis and competing risks analyses. We have developed tree-based and ensemble regression methods for censored and multilevel data, combination classifiers using different types of learning methods, and methodology to identify representative trees from an ensemble. These methods have been applied to important areas of biomedicine, specifically in patient prognostication, in developing clinical decision-making tools, and in identifying complex interactions between patient, provider, and health systems for understanding variations in healthcare utilization and delivery. My substantive areas of research are cancer and pediatric cardiovascular disease.
Zhenke Wu is an Assistant Professor of Biostatistics, and a core faculty member in the Michigan Institute of Data Science (MIDAS). He received his Ph.D. in Biostatistics from the Johns Hopkins University in 2014 and then stayed at Hopkins for his postdoctoral training before joining the University of Michigan. Dr. Wu’s research focuses on the design and application of statistical methods that inform health decisions made by individuals, or precision medicine. The original methods and software developed by Dr. Wu are now used by investigators from research institutes such as CDC and Johns Hopkins, as well as site investigators from developing countries, e.g., Kenya, South Africa, Gambia, Mali, Zambia, Thailand and Bangladesh.
Profile: At a “sweet spot” of data science
By Dan Meisler
Communications Manager, ARC
If you had to name two of the more exciting, emerging fields of data science, electronic health records (EHR) and mobile health might be near the top of the list.
Zhenke Wu, one of the newest MIDAS core faculty members, has one foot firmly in each field.
“These two fields share the common goal of learning from the experience of the population in the past to advance health and clinical decisions for those to follow. I am looking forward to more work that will bring the two fields closer to continuously generate insights about human health,” Wu said. “I’m in a sweet spot.”
Wu joined U-M in Fall 2016, after earning a PhD in Biostatistics from Johns Hopkins University, and a bachelor’s in Mathematics from Fudan University. He said the multitude of large-scale studies going on at U-M and access to EHR databases were factors in his coming to Michigan.
“The University of Michigan is an exciting place that has a diversity of large-scale databases and supportive research groups in the fields I’m interested in,” he said.
Wu is collaborating with the Michigan Genomics Initiative, which is a biorepository effort at Michigan Medicine to integrate genome-wide information with EHR from approximately 40,000 patients undergoing anesthesia prior to surgery or diagnostic procedures. He’s also collaborating with Dr. Srijan Sen, Associate Professor, Department of Psychiatry and Molecular and Behavioral Neuroscience Institute, on the MIDAS-supported project “Identifying Real-Time Data Predictors of Stress and Depression Using Mobile Technology,” the preliminary results of which recently matured into an NIH-funded R01 project “Mobile Technology to Identify Mechanisms Linking Genetic Variation and Depression” that will draw broad expertise from a multi-disciplinary team of medical and data science researchers.
“One of my goals is to use an integrated and rigorous approach to predict how a person’s health status will be in the near future,” Wu said.
Wu applies hierarchical Bayesian models to these problems, which he hopes will shed light on phenomena he describes as latent constructs that are “well-known, but less quantitatively understood, e.g., intelligence quotient (IQ) in psychology.”
As another example, he cites the current challenge in active surveillance of prostate cancer patients for aggressive tumors requiring removal and/or radiation, or indolent tumors permitting continued surveillance.
“The underlying status of aggressive versus indolent cancer is not observed, which needs to be learned from the results of biopsy and other clinical measurements,” he said. “The decisions and experience of urologists and their patients will greatly benefit from more accurate understanding of the tumor status… There are lots of scientific problems in clinical, biomedical, behavioral and social sciences where you have well-known but less quantitatively understood latent constructs. These are problems that Bayesian latent variable methods can formulate and address.”
Just as Wu has a hand in two hot-button big data areas, he also sees himself as straddling the line between application and methodology.
He says the large number of data sources — sensors, mobile apps, test results, and questionnaires, to name just a few — results in richness as well as some “messiness” that needs new methodologies to adjust, integrate and translate to new scientific insights. At the same time, a valid new methodology for dealing with, for example, electronic health data, will likely find numerous different applications.
Wu says his approach was heavily influenced by his work in the Pneumonia Etiology Research for Child Health (PERCH) funded by the Gates Foundation while he was at Johns Hopkins. Pneumonia is a clinical syndrome due to lung infection that can be caused by more than 30 different species of pathogens, including bacteria, viruses and fungi. The goal of the seven-country study that enrolled more than 5,000 cases and 5,000 controls from Africa and Southeast Asia is to estimate the frequency with which each pathogen caused pneumonia in the population and the probability of each individual being infected by the list of pathogens in the lung.
“In most settings, it is extremely difficult to identify the pathogen by directly sampling from the site of infection – the child’s lung. PERCH therefore looked for other sources of evidence by standardizing and comprehensively testing biofluids collected from sites peripheral to the lung. Using hierarchical Bayesian models to infer disease etiology by integrating such a large trove of data was extremely fun and exciting,” he said.
Wu’s initial interest in math, leading to biostatistics and now data science, stems from what he called a “greedy” desire to learn the guiding principles of how the world works by rigorous data science.
“If you have new problems, you can wait for other people to ask a clean math question, or you can go work with these messy problems and figure out interesting questions and their answers,” he said.
For more on Dr. Wu, see his profile on Michigan Experts.
Brenda Gillespie, PhD, is Associate Director in Consulting for Statistics, Computing and Analytics Research (CSCAR) with a secondary appointment as Associate Research Professor in the department of Biostatistics in the School of Public Health at the University of Michigan, Ann Arbor. She provides statistical collaboration and support for numerous research projects at the University of Michigan. She teaches Biostatistics courses as well as CSCAR short courses in survival analysis, regression analysis, sample size calculation, generalized linear models, meta-analysis, and statistical ethics. Her major areas of expertise are clinical trials and survival analysis.
Prof. Gillespie’s research interests are in the area of censored data and clinical trials. One research interest concerns the application of categorical regression models to the case of censored survival data. This technique is useful in modeling the hazard function (instead of treating it as a nuisance parameter, as in Cox proportional hazards regression), or in the situation where time-related interactions (i.e., non-proportional hazards) are present. An investigation comparing various categorical modeling strategies is currently in progress.
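As a minimal illustration of modeling the hazard function directly, the sketch below estimates a piecewise-constant hazard from right-censored data: time is cut into intervals, and the hazard in each interval is the number of events divided by the person-time at risk. This is one common categorical approach, shown under stated assumptions, and is not necessarily the modeling strategy used in Prof. Gillespie’s work. The function name, cut points, and data are hypothetical.

```python
from typing import List, Sequence


def piecewise_hazard(
    times: Sequence[float], events: Sequence[int], cuts: Sequence[float]
) -> List[float]:
    """Estimate a constant hazard on each interval (cuts[j], cuts[j+1]].

    times  : follow-up time for each subject
    events : 1 if the subject had the event at `times[i]`, 0 if censored
    cuts   : increasing interval boundaries, starting at 0
    """
    n_int = len(cuts) - 1
    deaths = [0] * n_int
    exposure = [0.0] * n_int
    for t, d in zip(times, events):
        for j in range(n_int):
            lo, hi = cuts[j], cuts[j + 1]
            if t <= lo:
                break  # subject left the risk set before this interval
            exposure[j] += min(t, hi) - lo  # person-time spent in interval j
            if d and t <= hi:
                deaths[j] += 1  # event occurred inside interval j
    # Events after the last cut point are not counted (hypothetical choice).
    return [
        deaths[j] / exposure[j] if exposure[j] > 0 else float("nan")
        for j in range(n_int)
    ]


# Hypothetical data: five subjects, two of them censored.
hazards = piecewise_hazard([2, 3, 5, 7, 8], [1, 0, 1, 1, 0], [0, 4, 8])
```

Because each interval gets its own hazard estimate, the hazard is modeled explicitly rather than left unspecified, and allowing covariate effects to differ by interval is one way to represent non-proportional hazards.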
Another area of interest is the analysis of cross-over trials with censored data. Brenda has developed (with M. Feingold) a set of nonparametric methods for testing and estimation in this setting. These methods outperform previous methods in most cases.
Bhramar Mukherjee is a Professor in the Department of Biostatistics, joining the department in Fall, 2006. Bhramar is also a Professor in the Department of Epidemiology. Bhramar completed her Ph.D. in 2001 from Purdue University. Bhramar’s principal research interests lie in Bayesian methods in epidemiology and studies of gene-environment interaction. She is also interested in modeling missingness in exposure, categorical data models, Bayesian nonparametrics, and the general area of statistical inference under outcome/exposure dependent sampling schemes. Bhramar’s methodological research is funded by NSF and NIH. Bhramar is involved as a co-investigator in several R01s led by faculty in Internal Medicine, Epidemiology and Environmental Health Sciences at UM. Her collaborative interests focus on genetic and environmental epidemiology, ranging from investigating the genetic architecture of colorectal cancer in relation to environmental exposures to studies of the effects of air pollution on pediatric asthma events in Detroit. She is actively engaged in Global Health Research.
Sebastian Zöllner is a Professor of Biostatistics. He also holds an appointment in the Department of Psychiatry. Dr. Zöllner joined the University of Michigan after a postdoctoral fellowship in the Department of Human Genetics at the University of Chicago. His research effort is divided between generating new methods in statistical genetics and analyzing data. The general thrust of his work is problems from human genetics, evolutionary biology and statistical population biology.
Andrzej Galecki, MD, PhD, is Research Professor in the department of Biostatistics, School of Public Health, and Research Professor in the Institute of Gerontology at the University of Michigan, Ann Arbor.
Dr. Raghunathan’s primary research interest is in developing methods for dealing with missing data in sample surveys and in epidemiological studies. The methods are motivated from a Bayesian perspective but with desirable frequency or repeated sampling properties. The analysis of incomplete data from practical sample surveys poses additional problems due to extensive stratification, clustering of units and unequal probabilities of selection. The model-based approach provides a framework to incorporate all the relevant sampling design features in dealing with unit and item nonresponse in sample surveys. There are important computational challenges in implementing these methods in practical surveys. He has developed SAS-based software, IVEware, for performing multiple imputation analysis and the analysis of complex survey data. Raghunathan’s other research interests include Bayesian methods, methods for small area estimation, combining information from multiple surveys, measurement error models, longitudinal data analysis, privacy, confidentiality and disclosure limitations and statistical methods for epidemiological studies. His applied interests include cardiovascular epidemiology, social epidemiology, health disparity, health care utilization, and social and economic sciences. Raghunathan is also involved in the Survey Methodology Program at the Institute for Social Research, where a multidisciplinary team of sociologists, statisticians and psychologists provides an opportunity to address methodological issues in nonresponse, interviewer behavior and its impact on the results, response or measurement bias and errors, noncoverage, respondent cognition, privacy and confidentiality, and data archiving. The Survey Methodology Program has a graduate program offering masters and doctoral degrees in survey methodology.
Yi Li is a Professor of Biostatistics and Director of the Kidney Epidemiology and Cost Center. His current research interests are survival analysis, longitudinal and correlated data analysis, measurement error problems, spatial models and clinical trial designs. He is developing methodologies for analyzing large-scale and high-dimensional datasets, with direct applications in observational studies as well as in genetics/genomics. His methodologic research has been funded by various federal grants since 2003. Yi Li is actively involved in collaborative research in clinical trials and observational studies with researchers from the University of Michigan and Harvard University. The applications have included chronic kidney disease surveillance, organ transplantation, cancer preventive studies and cancer genomics.
Matthew Schipper, PhD, is Assistant Professor in the Departments of Radiation Oncology and Biostatistics. He received his Ph.D. in Biostatistics from the University of Michigan in 2006. Prior to joining the Radiation Oncology department he was a Research Investigator in the Department of Radiology at the University of Michigan and a consulting statistician at Innovative Analytics.
Prof. Schipper’s research interests include:
- Use of Biomarkers to Individualize Treatment – Selection of dose for cancer patients treated with Radiation Therapy (RT) must balance the increased efficacy with the increased toxicity associated with higher dose. Historically, a single dose has been selected for a population of patients (e.g. all stage III NSC lung cancer). However, the availability of new biologic markers for toxicity and efficacy allows the possibility of selecting a more personalized dose. I am interested in using statistical models for toxicity and efficacy as a function of RT dose and biomarkers to select an optimal dose for an individual patient. We are studying quantitative methods based on utilities to make this efficacy/toxicity tradeoff explicit and quantitative when biomarkers for one or multiple outcomes are available. We have proposed a simulation-based method for studying the likely effects of any model- or marker-based dose selection on both toxicity and efficacy outcomes for a population of patients. In related projects, we are studying the role of correlation between the sensitivity of a patient’s tumor and normal tissues to radiation. We are also studying how to utilize these techniques in combination with baseline and/or mid-treatment adaptive image-guided RT.
- Early Phase Oncology Study Design – An increasingly common feature of phase I designs is the inclusion of 1 or more dose expansion cohorts (DECs) in which the MTD is first estimated using a 3+3 or other Phase I design and then a fixed number (often 10-20 in 1-10 cohorts) of patients are treated at the dose initially estimated to be the MTD. Such an approach has not been studied statistically or compared to alternative designs. We have shown that a CRM design, in which the dose-assignment mechanism is kept active for all patients, identifies the MTD more accurately and better protects the safety of trial patients than a similarly sized DEC trial. It also meets the objective of treating 15 or more patients at the final estimated MTD. A follow-up paper evaluating the role of DECs with a focus on efficacy estimation is in press at Annals of Oncology.
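For readers unfamiliar with the 3+3 design mentioned above, the following is a minimal simulation sketch of one simplified textbook variant, followed by a fixed-size expansion cohort at the estimated MTD. It is not the design or code used in Prof. Schipper’s work, and it does not implement the CRM. The function names, the dose-limiting toxicity (DLT) probabilities, and the simplification that no extra patients are added when de-escalating are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def three_plus_three(
    dlt_probs: Sequence[float], rng: random.Random
) -> Optional[int]:
    """Simulate one simplified 3+3 trial.

    dlt_probs : assumed true DLT probability at each dose level (hypothetical)
    Returns the index of the estimated MTD, or None if even the lowest
    dose is judged too toxic.
    """
    dose = 0
    while True:
        # Treat a cohort of 3 patients at the current dose.
        dlts = sum(rng.random() < dlt_probs[dose] for _ in range(3))
        if dlts == 1:
            # Exactly 1 of 3 with a DLT: treat 3 more at the same dose.
            dlts += sum(rng.random() < dlt_probs[dose] for _ in range(3))
        if dlts >= 2:
            # 2 or more DLTs: stop and declare the next lower dose the MTD.
            return dose - 1 if dose > 0 else None
        if dose == len(dlt_probs) - 1:
            return dose  # highest dose tolerated
        dose += 1  # 0 of 3 or 1 of 6 DLTs: escalate


def expansion_cohort(
    mtd: int, dlt_probs: Sequence[float], n: int, rng: random.Random
) -> int:
    """Treat n further patients at the estimated MTD; return the DLT count."""
    return sum(rng.random() < dlt_probs[mtd] for _ in range(n))


rng = random.Random(2024)
probs = [0.05, 0.10, 0.25, 0.45]  # hypothetical true DLT rates
mtd = three_plus_three(probs, rng)
if mtd is not None:
    extra_dlts = expansion_cohort(mtd, probs, 15, rng)
```

Note that the dose used for the expansion cohort is fixed once the escalation phase ends, so toxicity observed in those additional patients does not change the dose assignment.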
My research focuses on developing statistical methods and software tools for the analysis of human genetic data and application of those methods to understand the genetic basis of human health and disease. Our methods and tools are used by statisticians and geneticists worldwide. My disease research is focused on type 2 diabetes (T2D) and related traits and on bipolar disorder and schizophrenia. Our studies are generating and analyzing genome or exome sequence data on tens of thousands of individuals, requiring the efficient handling of petabyte-scale data.