My work lies in the learning, control, and design of autonomous systems, with an emphasis on connected automated vehicles (CAVs). I have been committed to developing robust autonomous vehicles, augmented reality (AR) technology, and V2X systems at Mcity. The highlights include: (1) a robust self-driving algorithm/software stack enabling high-level CAVs; and (2) a data-and-AI-driven, sensor-level AR system for efficient and safe CAV testing. These systems have been deployed on the Mcity CAV fleet and the Mcity test track for daily operations. I am interested in using large-scale naturalistic human-driving data to train motion planning and control algorithms for self-driving cars, so that automated cars behave with better roadmanship and thus gain higher acceptance. I am also interested in data-driven, low-uncertainty learning algorithms for object detection, tracking, and fusion, in order to build the perception systems of safety-critical autonomous systems.
STEPHAN F. TAYLOR is a professor of psychiatry and Associate Chair for Research and Research Regulatory Affairs in the Department of Psychiatry; and an adjunct professor of psychology.
His work uses brain mapping and brain stimulation to study and treat serious mental disorders such as psychosis, refractory depression and obsessive-compulsive disorder. Data science techniques are applied in the analysis of high dimensional functional magnetic resonance imaging datasets and meso-scale brain networks, using supervised and unsupervised techniques to interrogate brain-behavior correlations relevant for psychopathological conditions. Clinical-translation work with brain stimulation, primarily with transcranial magnetic stimulation, is informed by mapping meso-scale networks to guide treatment of conditions such as depression. Future work seeks to use machine learning to identify treatment predictors and match individual patients to specific treatments.
Our laboratory focuses on (1) the biology of cancer metastasis, especially bone metastasis, including the role of the host microenvironment; and (2) mechanisms of chemoresistance. We search for genes that regulate metastasis and the interaction between the host microenvironment and cancer cells. We perform single-cell multiomics and spatial analysis to identify rare cell populations and promote precision medicine. Our research methodology combines molecular, cellular, and animal studies. The majority of our work is highly translational, to ensure its clinical relevance. In terms of data science, we collaborate on applications of both established and novel methodologies for analysis of high dimensional data; deconvolution of high dimensional data into a cellular and tissue context; spatial mapping of multiomic data; and heterogeneous data integration.
We have developed and tested machine learning approaches to integrate quantitative markers for diagnosis and assessment of progression of TMJ OA, extended the capabilities of 3D Slicer4 into web-based tools, and disseminated open-source image analysis tools. Our aims use data processing and in-depth analytics combined with learning using privileged information, integrated feature selection, and testing the performance of longitudinal risk predictors. Our long-term goal is to improve diagnosis and risk prediction of temporomandibular joint osteoarthritis in future multicenter studies.
The Spectrum of Data Science for Diagnosis of Osteoarthritis of the Temporomandibular Joint
Dr. Kang’s research focuses on the development of statistical methods motivated by biomedical applications, with a focus on neuroimaging. His recent key contributions can be summarized in the following three aspects:
Bayesian regression for complex biomedical applications
Dr. Kang and his group developed a series of Bayesian regression methods for association analysis between clinical outcomes of interest (disease diagnostics, survival time, psychiatry scores) and potential biomarkers in biomedical applications such as neuroimaging and genomics. In particular, they developed a new class of threshold priors as compelling alternatives to the classic continuous shrinkage priors in the Bayesian literature and the widely used penalization methods in the frequentist literature. Dr. Kang’s methods can substantially increase the power to detect weak but highly dependent signals by incorporating useful structural information about the predictors, such as spatial proximity within brain anatomical regions in neuroimaging [Zhao et al 2018; Kang et al 2018; Xue et al 2019] and gene networks in genomics [Cai et al 2017; Cai et al 2019]. Dr. Kang’s methods can simultaneously select variables and evaluate the uncertainty of variable selection, as well as make inference on the effect sizes of the selected variables. His work provides a set of new tools for biomedical researchers to identify important biomarkers using different types of biological knowledge with statistical guarantees. In addition, Dr. Kang’s work is among the first to establish rigorous theoretical justifications for Bayesian spatial variable selection in imaging data analysis [Kang et al 2018] and Bayesian network marker selection in genomics [Cai et al 2019]. Dr. Kang’s theoretical contributions not only offer a deep understanding of the soft-thresholding operator on smooth functions, but also provide insights into which types of biological knowledge may be useful for improving biomarker detection accuracy.
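For readers unfamiliar with the soft-thresholding operator mentioned above, its standard scalar form is given below. The symbols x (an input value) and λ (a nonnegative threshold) are notation chosen here for illustration; how the operator enters the priors in the cited papers is not described in this profile.

```latex
S_{\lambda}(x) = \operatorname{sign}(x)\,\max\bigl(|x| - \lambda,\; 0\bigr), \qquad \lambda \ge 0
```

Values with magnitude below λ are set exactly to zero, and larger values are shrunk toward zero by λ, which is why the operator is associated with sparse estimates.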
Prior knowledge guided variable screening for ultrahigh-dimensional data
Dr. Kang and his colleagues developed a series of variable screening methods for ultrahigh-dimensional data analysis by incorporating useful prior knowledge in biomedical applications, including imaging [Kang et al 2017; He et al 2019], survival analysis [Hong et al 2018] and genomics [He et al 2019]. As a preprocessing step for variable selection, variable screening is a computationally fast approach to dimension reduction. Traditional variable screening methods overlook useful prior knowledge, and thus their practical performance is unsatisfactory in many biomedical applications. To fill this gap, Dr. Kang developed a partition-based ultrahigh-dimensional variable screening method under generalized linear models, which can naturally incorporate the grouping and structural information in biomedical applications. For cases where prior knowledge is unavailable or unreliable, Dr. Kang proposed a data-driven partition screening framework based on covariate grouping and investigated its theoretical properties. The two special cases proposed by Dr. Kang, correlation-guided partitioning and spatial-location-guided partitioning, are extremely useful in practice for neuroimaging data analysis and genome-wide association analysis. For cases where multiple types of grouping information are available, Dr. Kang proposed a novel, theoretically justified strategy for combining screening statistics from various partitioning methods. It provides a very flexible framework for incorporating different types of prior knowledge.
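To make the idea of variable screening concrete, the following is a minimal sketch of plain marginal correlation screening, the kind of traditional baseline that ignores prior knowledge. It is not the partition-based method described above. The function name `marginal_screen`, the simulated data, and the choice of keeping 20 predictors are all assumptions made for illustration.

```python
import math
import random


def marginal_screen(X, y, keep):
    """Return the indices of the `keep` columns of X with the largest
    absolute Pearson correlation with y, plus all correlation scores."""
    n = len(y)
    p = len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in yc))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cc = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(v * v for v in cc))
        if c_norm == 0.0 or y_norm == 0.0:
            scores.append(0.0)  # constant column or outcome: no signal
        else:
            dot = sum(a * b for a, b in zip(cc, yc))
            scores.append(abs(dot) / (c_norm * y_norm))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


# Simulated example (hypothetical data): 100 samples, 500 predictors,
# of which only columns 5 and 42 truly affect the outcome.
rng = random.Random(0)
n, p = 100, 500
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[5] - 3.0 * row[42] + rng.gauss(0, 1) for row in X]

selected, scores = marginal_screen(X, y, keep=20)
print(selected)
```

Screening of this kind reduces 500 candidate predictors to 20 before any model is fitted; a partition-based approach would instead compute screening statistics within groups of related predictors.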
Brain network modeling and inferences
Dr. Kang and his colleagues developed several new statistical methods for brain network modeling and inference using resting-state fMRI data [Kang et al 2016; Xie and Kang 2017; Chen et al 2018]. Because of the high dimensionality of fMRI data (over 100,000 voxels in a standard brain template) combined with small sample sizes (hundreds of participants in a typical study), it is extremely challenging to model the brain functional connectivity network at the voxel level. Some existing methods model brain anatomical region-level networks using region-level summary statistics computed from voxel-level data. Those methods may suffer from low power to detect signals and have an inflated false positive rate, since the summary statistics may not capture the heterogeneity within the predefined brain regions well. To address those limitations, Dr. Kang proposed a novel method based on multi-attribute canonical correlation graphs [Kang et al 2016] to construct region-level brain networks using voxel-level data. His method can capture different types of nonlinear dependence between any two brain regions consisting of hundreds or thousands of voxels. He also developed permutation tests for assessing the significance of the estimated network. His methods can substantially increase the power to detect signals in small-sample-size problems. In addition, Dr. Kang and his colleague developed theoretically justified high-dimensional tests [Xie and Kang 2017] for constructing region-level brain networks using voxel-level data under the multivariate normal assumption. Their theoretical results provide useful guidance for the future development of statistical methods and theory for brain network analysis.
This image illustrates the neuroimaging meta-analysis data (Kang et al 2014). Neuroimaging meta-analysis is an important tool for finding consistent effects over studies. We develop a Bayesian nonparametric model and perform a meta-analysis of five emotions from 219 studies. In addition, our model can make reverse inference by using the model to predict the emotion type from a newly presented study. Our method outperforms other methods with an average of 80% accuracy.
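As a generic illustration of how a permutation test assesses significance, the sketch below tests the association between two simulated signals by shuffling one of them. It is a textbook permutation test on a Pearson correlation, not the test developed in the cited work; the function names, the simulated signals, and the number of permutations are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    ac = [v - a_mean for v in a]
    bc = [v - b_mean for v in b]
    denom = math.sqrt(sum(v * v for v in ac)) * math.sqrt(sum(v * v for v in bc))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(ac, bc)) / denom


def permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation between a and b.

    Shuffling b breaks any pairing with a, so the shuffled correlations
    approximate the null distribution of no association."""
    rng = random.Random(seed)
    observed = abs(pearson(a, b))
    b_perm = list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(b_perm)
        if abs(pearson(a, b_perm)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)


# Simulated example (hypothetical data): two strongly related signals.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(60)]
b = [v + 0.5 * rng.gauss(0, 1) for v in a]

p_value = permutation_pvalue(a, b)
print(p_value)
```

The same recipe applies to other dependence measures: replace `pearson` with the statistic of interest and recompute it on each shuffled copy.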
1. Cai Q, Kang J, Yu T (2020) Bayesian variable selection over large scale networks via the thresholded graph Laplacian Gaussian prior with application to genomics. Bayesian Analysis, In Press (Earlier version won a student paper award from Biometrics Section of the ASA in JSM 2017)
2. He K, Kang J, Hong G, Zhu J, Li Y, Lin H, Xu H, Li Y (2019) Covariance-insured screening. Computational Statistics and Data Analysis, 132:100-114.
3. He K, Xu H, Kang J† (2019) A selective overview of feature screening methods with applications to neuroimaging data. WIREs Computational Statistics, 11(2):e1454.
4. Chen S, Xing Y, Kang J, Kochunov P, Hong LE (2018). Bayesian modeling of dependence in brain connectivity, Biostatistics, In Press.
5. Kang J, Reich BJ, Staicu AM (2018) Scalar-on-image regression via the soft thresholded Gaussian process. Biometrika: 105(1) 165–184.
6. Xue W, Bowman D and Kang J (2018) A Bayesian spatial model to predict disease status using imaging data from various modalities. Frontiers in Neuroscience. 12:184. doi:10.3389/fnins.2018.00184
7. Jin Z*, Kang J†, Yu T (2018) Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations. Bioinformatics, 34(9):1555-1561.
8. He K, Kang J† (2018) Comments on “Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data”. Bayesian Analysis, 13(1):289-291.
9. Hong GH, Kang J†, Li Y (2018) Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Analysis: 24(1) 45-71.
10. Zhao Y*, Kang J†, Long Q (2018) Bayesian multiresolution variable selection for ultra-high dimensional neuroimaging data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(2):537-550. (Earlier version won student paper award from ASA section on statistical learning and data mining in JSM 2014; It was also ranked as one of the top two papers in the student paper award competition in ASA section on statistics in imaging in JSM 2014)
11. Kang J, Hong GH, Li Y (2017) Partition-based ultrahigh dimensional variable screening, Biometrika, 104(4): 785-800.
12. Xie J#, Kang J# (2017) High dimensional tests for functional networks of brain anatomic regions. Journal of Multivariate Analysis, 156:70-88.
13. Cai Q*, Alvarez JA, Kang J†, Yu T (2017) Network marker selection for untargeted LC/MS metabolomics data, Journal of Proteome Research, 16(3):1261-1269
14. Kang J, Bowman FD, Mayberg H, Liu H (2016) A depression network of functionally connected regions discovered via multi-attribute canonical correlation graphs. NeuroImage, 41:431-441.
My research covers a wide range of topics, from computational social sciences to bioinformatics, in which I do pattern recognition, perform data analysis, and build prediction models. At the core of my effort lie machine learning methods, with which I have been addressing problems related to social networks, opinion mining, biomarker discovery, pharmacovigilance, drug repositioning, security analytics, genomics, food contamination, and concussion recovery. I’m particularly interested in, and eager to collaborate on, the cyber security aspects of social media analytics, which include but are not limited to misinformation, bots, and fake news. In addition, I’m still pursuing opportunities in bioinformatics, especially in next generation sequencing analysis, which can also be leveraged for phenotype prediction using machine learning methods.
A typical pipeline for developing and evaluating prediction models to identify malicious Android mobile apps in the market
My research involves developing novel data collection strategies and image reconstruction techniques for Magnetic Resonance Imaging. In order to accelerate data collection, we take advantage of features of MRI data, including sparsity, spatiotemporal correlations, and adherence to underlying physics; each of these properties can be leveraged to reduce the amount of data required to generate an image and thus speed up imaging time. We also seek to understand what image information is essential for radiologists in order to optimize MRI data collection and personalize the imaging protocol for each patient. We deploy machine learning algorithms and optimization techniques in each of these projects. In some of our work, we can generate the data that we need to train and test our algorithms using numerical simulations. In other portions, we seek to utilize clinical images, prospectively collected MRI data, or MRI protocol information in order to refine our techniques.
We seek to develop technologies like cardiac Magnetic Resonance Fingerprinting (cMRF), which can be used to efficiently collect multiple forms of information to distinguish healthy and diseased tissue using MRI. By using rapid methods like cMRF, quantitative data describing disease processes can be gathered quickly, enabling more and sicker patients to be assessed via MRI. These data, collected from many patients over time, can also be used to further refine MRI technologies for the assessment of specific diseases in a tailored, patient-specific manner.
My research interests are in the areas of brain-inspired machine intelligence and its applications, such as mobile robots and autonomous vehicles. To achieve true machine intelligence, I have taken two different approaches: a bottom-up data-driven approach and a top-down theory-driven approach. For the bottom-up data-driven approach, I have investigated the neuronal structure of the brain to understand its function. The development of a high-throughput and high-resolution 3D tissue scanner was a keystone of this approach. This tissue scanner has a 3D virtual microscope that allows us to investigate the neuronal structure of a whole mammalian brain at high resolution. The top-down theory-driven approach is to study what true machine intelligence is and how it can be implemented. True intelligence cannot be investigated without embracing a theory-driven approach to topics such as self-awareness, embodiment, consciousness, and computational modeling. I have studied the internal dynamics of a neural system to investigate the self-awareness of a machine and to model neural signal delay compensation. These two approaches meet in the middle, where machine intelligence is implemented for mechanical systems such as mobile robots and autonomous vehicles. My strong desire to bridge the bottom-up and top-down approaches leads me to conduct research focusing on mobile robotics and autonomous vehicles to combine the data-driven and theory-driven approaches.
Dr. Soroushmehr’s research interests include the design and development of image processing methods applicable to computer-assisted clinical decision support systems, algorithm design and optimization.
Kentaro Toyama is W. K. Kellogg Professor of Community Information at the University of Michigan School of Information and a fellow of the Dalai Lama Center for Ethics and Transformative Values at MIT. He is the author of “Geek Heresy: Rescuing Social Change from the Cult of Technology.” Toyama conducts interdisciplinary research to understand how the world’s low-income communities interact with digital technology and to invent new ways for technology to support their socio-economic development, including computer simulations of complex systems for policy-making. Previously, Toyama did research in artificial intelligence, computer vision, and human-computer interaction at Microsoft and taught mathematics at Ashesi University in Ghana.