Jian Kang

Dr. Kang’s research focuses on the development of statistical methods motivated by biomedical applications, with an emphasis on neuroimaging. His recent key contributions fall into the following three areas:

Bayesian regression for complex biomedical applications
Dr. Kang and his group developed a series of Bayesian regression methods for analyzing associations between clinical outcomes of interest (disease diagnosis, survival time, psychiatric scores) and potential biomarkers in biomedical applications such as neuroimaging and genomics. In particular, they developed a new class of threshold priors as compelling alternatives to the classic continuous shrinkage priors in the Bayesian literature and the widely used penalization methods in the frequentist literature. Dr. Kang’s methods can substantially increase the power to detect weak but highly dependent signals by incorporating useful structural information about the predictors, such as spatial proximity within anatomical brain regions in neuroimaging [Zhao et al 2018; Kang et al 2018; Xue et al 2019] and gene networks in genomics [Cai et al 2017; Cai et al 2019]. The methods simultaneously select variables, quantify the uncertainty of the selection, and support inference on the effect sizes of the selected variables. This work gives biomedical researchers a set of new tools for identifying important biomarkers using different types of biological knowledge, with statistical guarantees. In addition, Dr. Kang’s work is among the first to establish rigorous theoretical justification for Bayesian spatial variable selection in imaging data analysis [Kang et al 2018] and Bayesian network marker selection in genomics [Cai et al 2019]. These theoretical contributions not only offer a deep understanding of the soft-thresholding operator on smooth functions, but also provide insight into which types of biological knowledge may help improve biomarker detection accuracy.
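The soft-thresholding operator mentioned above can be illustrated with a minimal sketch. The function names, the threshold value, and the latent values below are illustrative assumptions, not code from Dr. Kang’s group; the sketch only shows how thresholding maps small latent effects to exactly zero while shrinking larger ones toward zero.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink `value` toward zero by `threshold`; values within the threshold become 0."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def soft_threshold_all(values, threshold):
    """Apply the scalar operator elementwise to a sequence of latent values."""
    return [soft_threshold(v, threshold) for v in values]


# Hypothetical latent effects at five locations (for illustration only).
latent = [-1.5, -0.2, 0.0, 0.3, 2.0]
coefficients = soft_threshold_all(latent, 0.5)
print(coefficients)
```

In this toy input, the three latent values within 0.5 of zero are mapped to exactly zero, which is how a thresholded construction can encode sparsity directly rather than only shrinking effects.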

Prior knowledge guided variable screening for ultrahigh-dimensional data
Dr. Kang and his colleagues developed a series of variable screening methods for ultrahigh-dimensional data analysis that incorporate useful prior knowledge in biomedical applications, including imaging [Kang et al 2017; He et al 2019], survival analysis [Hong et al 2018], and genomics [He et al 2019]. As a preprocessing step for variable selection, variable screening is a computationally fast approach to dimension reduction. Traditional variable screening methods overlook useful prior knowledge, and as a result their practical performance is unsatisfactory in many biomedical applications. To fill this gap, Dr. Kang developed a partition-based ultrahigh-dimensional variable screening method under generalized linear models, which can naturally incorporate the grouping and structural information available in biomedical applications. For settings where prior knowledge is unavailable or unreliable, Dr. Kang proposed a data-driven partition screening framework based on covariate grouping and investigated its theoretical properties. Two special cases, correlation-guided partitioning and spatial-location-guided partitioning, are especially useful in practice for neuroimaging data analysis and genome-wide association analysis. For settings where multiple types of grouping information are available, Dr. Kang proposed a novel, theoretically justified strategy for combining screening statistics from different partitioning methods, which provides a very flexible framework for incorporating different types of prior knowledge.
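For context, the classical baseline that such methods refine, marginal correlation screening, can be sketched as follows. This is not the partition-based procedure itself: the function names and toy data are hypothetical, and the screening statistic is simply the absolute Pearson correlation of each predictor with the outcome, computed one predictor at a time.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def marginal_screen(columns, outcome, keep):
    """Return indices of the `keep` predictors with the largest |correlation|."""
    stats = [abs(pearson(col, outcome)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: stats[j], reverse=True)
    return ranked[:keep]


# Toy data: four predictors (one list per predictor) and one outcome.
outcome = [1, 2, 3, 4, 5, 6]
columns = [
    [2, 4, 6, 8, 10, 12],    # strongly positively related to the outcome
    [1, -1, 1, -1, 1, -1],   # mostly noise
    [3, 1, 2, 3, 1, 2],      # mostly noise
    [6, 5, 4, 3, 2, 0],      # strongly negatively related to the outcome
]
selected = marginal_screen(columns, outcome, keep=2)
print(selected)
```

Because each predictor is scored in isolation, this baseline ignores grouping and structural information, which is the limitation the partition-based approach described above is designed to address.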

Brain network modeling and inferences
Dr. Kang and his colleagues developed several new statistical methods for brain network modeling and inference using resting-state fMRI data [Kang et al 2016; Xie and Kang 2017; Chen et al 2018]. Because fMRI data are high-dimensional (over 100,000 voxels in a standard brain template) while sample sizes are small (hundreds of participants in a typical study), it is extremely challenging to model the brain functional connectivity network at the voxel level. Some existing methods model networks at the level of anatomical brain regions using region-level summary statistics computed from voxel-level data. Those methods may suffer from low power to detect signals and an inflated false positive rate, since the summary statistics may not capture the heterogeneity within the predefined brain regions well. To address these limitations, Dr. Kang proposed a novel method based on multi-attribute canonical correlation graphs [Kang et al 2016] to construct region-level brain networks from voxel-level data. The method can capture different types of nonlinear dependence between any two brain regions, each consisting of hundreds or thousands of voxels. He also developed permutation tests for assessing the significance of the estimated network. These methods can substantially increase the power to detect signals in small-sample problems. In addition, Dr. Kang and his colleague developed theoretically justified high-dimensional tests [Xie and Kang 2017] for constructing region-level brain networks from voxel-level data under a multivariate normal assumption. Their theoretical results provide useful guidance for the future development of statistical methods and theory for brain network analysis.
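The general idea of a permutation test for a single network edge can be sketched as below. This is a simplified illustration under stated assumptions: it uses the absolute Pearson correlation between two hypothetical region-level series as a stand-in statistic, whereas the method described above works with multi-attribute canonical correlations on voxel-level data, which is not reproduced here. All names, the seed, and the toy data are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for association between x and y."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)      # break any pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical summary values for two regions across eight participants.
region_a = [1, 2, 3, 4, 5, 6, 7, 8]
region_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
p_value = permutation_p_value(region_a, region_b)
print(p_value)
```

Shuffling one series destroys the pairing between regions while preserving each region's marginal distribution, so the permuted statistics approximate the null distribution without a parametric assumption.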

 

This image illustrates neuroimaging meta-analysis data (Kang et al 2014). Neuroimaging meta-analysis is an important tool for finding effects that are consistent across studies. We developed a Bayesian nonparametric model and performed a meta-analysis of five emotions from 219 studies. In addition, the model supports reverse inference: it can be used to predict the emotion type of a newly presented study. Our method outperforms other methods, with an average accuracy of 80%.

1. Cai Q, Kang J, Yu T (2020) Bayesian variable selection over large scale networks via the thresholded graph Laplacian Gaussian prior with application to genomics. Bayesian Analysis, in press. (An earlier version won a student paper award from the Biometrics Section of the ASA at JSM 2017.)
2. He K, Kang J, Hong G, Zhu J, Li Y, Lin H, Xu H, Li Y (2019) Covariance-insured screening. Computational Statistics and Data Analysis, 132:100–114.
3. He K, Xu H, Kang J† (2019) A selective overview of feature screening methods with applications to neuroimaging data. WIREs Computational Statistics, 11(2):e1454.
4. Chen S, Xing Y, Kang J, Kochunov P, Hong LE (2018) Bayesian modeling of dependence in brain connectivity. Biostatistics, in press.
5. Kang J, Reich BJ, Staicu AM (2018) Scalar-on-image regression via the soft thresholded Gaussian process. Biometrika, 105(1):165–184.
6. Xue W, Bowman D, Kang J (2018) A Bayesian spatial model to predict disease status using imaging data from various modalities. Frontiers in Neuroscience, 12:184. doi:10.3389/fnins.2018.00184
7. Jin Z*, Kang J†, Yu T (2018) Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations. Bioinformatics, 34(9):1555–1561.
8. He K, Kang J† (2018) Comments on “Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data”. Bayesian Analysis, 13(1):289–291.
9. Hong GH, Kang J†, Li Y (2018) Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Analysis, 24(1):45–71.
10. Zhao Y*, Kang J†, Long Q (2018) Bayesian multiresolution variable selection for ultra-high dimensional neuroimaging data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(2):537–550. (An earlier version won a student paper award from the ASA Section on Statistical Learning and Data Mining at JSM 2014; it was also ranked among the top two papers in the student paper award competition of the ASA Section on Statistics in Imaging at JSM 2014.)
11. Kang J, Hong GH, Li Y (2017) Partition-based ultrahigh dimensional variable screening. Biometrika, 104(4):785–800.
12. Xie J#, Kang J# (2017) High dimensional tests for functional networks of brain anatomic regions. Journal of Multivariate Analysis, 156:70–88.
13. Cai Q*, Alvarez JA, Kang J†, Yu T (2017) Network marker selection for untargeted LC/MS metabolomics data. Journal of Proteome Research, 16(3):1261–1269.
14. Kang J, Bowman FD, Mayberg H, Liu H (2016) A depression network of functionally connected regions discovered via multi-attribute canonical correlation graphs. NeuroImage, 141:431–441.

Nancy Fleischer

Dr. Fleischer’s research focuses on how the broader socioeconomic and policy environments impact health disparities and the health of vulnerable populations, in the U.S. and around the world. Through this research, her group employs various analytic techniques to examine data at multiple levels (country-level, state-level, and neighborhood-level), emphasizing the role of structural influences on individual health. Her group applies advanced epidemiologic, statistical, and econometric methods to this research, including survey methodology, longitudinal data analysis, hierarchical modeling, causal inference, systems science, and difference-in-difference analysis. Dr. Fleischer leads two NCI-funded projects focused on the impact of tobacco control policies on health equity in the U.S.

Jenny Radesky

My research focuses on the intersection between mobile technology, parenting, parent-child interaction, and child development of processes such as executive functioning, self-regulation, and social-emotional well-being. Our projects use a combination of methods including surveys, videotaped parent-child interaction tasks, time diaries, and mobile device app logging to examine how parents and young children use mobile technologies throughout their day. We have developed novel content analysis approaches to understand the experience of young children while using commercially available mobile apps – including advertising content, educational quality, and data collection. We emphasize questions that are relevant to everyday parenting experiences, and also consider what design changes would help create an optimal default environment for children and families.

Xiaoling Xiang

Xiaoling Xiang conducts community-based services research concerning the physical and mental health and service use of diverse older populations. She is particularly interested in psychosocial approaches to promoting mental health and enhancing the quality of life in older adults. Her other areas of research include the epidemiology of mental disorders in late life, comorbidity, quality of home and community-based services, and implementation of evidence-based interventions. She uses a variety of applied statistical methods to analyze data from national surveys, electronic medical records, and insurance claims.

Kean Ming Tan

I am an applied statistician working on statistical machine learning methods for analyzing complex biomedical data sets. I develop multivariate statistical methods such as probabilistic graphical models, cluster analysis, discriminant analysis, and dimension reduction to uncover patterns in massive data sets. Recently, I have also been working on topics related to robust statistics, non-convex optimization, and data integration from multiple sources.

Sunghee Lee

My research focuses on issues in data collection with hard-to-reach populations. In particular, I examine 1) nontraditional sampling approaches for minority or stigmatized populations and their statistical properties and 2) measurement error and comparability issues for racial, ethnic, and linguistic minorities, which also have implications for cross-cultural research and survey methodology. Most recently, my research has been dedicated to respondent-driven sampling, which uses existing social networks to recruit participants in both face-to-face and Web data collection settings. I plan to expand the scope of my research to examine representation issues, focusing on racial/ethnic minority groups in the U.S. in the era of big data.

Todd I Herrenkohl

Before joining the faculty at the University of Michigan in 2018 as Professor and Marion Elizabeth Blue Chair of Children and Families, I was Co-Director of the 3DL Partnership at the University of Washington, where I collaborated with academic colleagues, students, and service providers throughout the state to conduct and translate research on social emotional learning (SEL) and trauma-informed practices. I am now pursuing a similar line of research in Michigan, where I am collaborating with state partners to identify, develop, and refine new approaches to disseminating research for schools and early childhood settings engaged in SEL and trauma work. As a scholar, I am committed to increasing the visibility, application, and sustainability of evidence-based programs and practices relevant to these topics and have worked extensively in the U.S. and internationally to advance goals for prevention and the promotion of child well-being.

Jin Lu

Dr. Jin Lu is an Assistant Professor of Computer and Information Science at the University of Michigan, Dearborn.
His major research interests include machine learning, data mining, optimization, matrix analysis, biomedical informatics, and health informatics. Two main directions are being pursued:
(1) Large-scale machine learning problems with data heterogeneity. Data heterogeneity is common across many high-impact application domains, ranging from recommender systems to computer vision, bioinformatics, and health informatics. Such heterogeneity can take a variety of forms, including (a) sample heterogeneity, where multiple sources of data samples are available as side information; (b) task heterogeneity, where multiple related learning tasks can be learned jointly to improve overall performance; and (c) view heterogeneity, where complementary information is available from various sources. His research focuses on building efficient machine learning methods that exploit such heterogeneity, aiming to improve the learned model by making the best use of all data resources.
(2) Machine learning methods with provable guarantees. Machine learning has developed substantially and has demonstrated great success in various domains. Despite this practical success, many applications involve solving NP-hard problems with heuristics, and it is challenging to analyze whether a heuristic scheme has any theoretical guarantee. His research aims to exploit granular data structure, e.g., sample clusters or features describing one aspect of a sample, to design new theoretically sound models and algorithms for machine learning problems.