My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable and automated pipelines for curation and harmonization of EHR data across healthcare systems and coding systems.
Professor Kowalski’s recent research analyzes experiments and clinical trials with the goal of designing policies to target insurance expansions and medical treatments to individuals who stand to benefit from them the most. Her research has also explored the impact of previous Medicaid expansions, the Affordable Care Act, the Massachusetts health reform of 2006, and employer-sponsored health insurance plans. She has also used cutting-edge techniques to estimate the value of medical spending on at-risk newborns.
We are interested in resolving outstanding fundamental scientific problems that impede the computational materials design process. Our group uses high-throughput density functional theory, applied thermodynamics, and materials informatics to deepen our fundamental understanding of synthesis-structure-property relationships, while exploring new chemical spaces for functional technological materials. These research interests are driven by the practical goal of the U.S. Materials Genome Initiative to accelerate materials discovery, but resolving them requires fundamental research in synthesis science, inorganic chemistry, and materials thermodynamics.
My primary research is focused on the measurement and monitoring of risks in banks, both at the individual bank level and at the level of the financial system as a whole. In a recent paper, we developed a high-dimensional statistical approach to measure connectivity across different players in the financial sector. We implement our model using stock return data for US banks, insurance companies and hedge funds. Some of my early research developed analytical tools to measure banks’ default risk using option pricing models and other tools of financial economics. These projects often have a significant empirical component that uses large financial datasets and econometric tools. Of late, I have been working on several projects related to equity and inclusion in financial markets. These papers use large datasets from financial markets to understand differences in the quantity and quality of financial services received by minority borrowers. A common theme across these projects is causal inference using state-of-the-art tools from econometrics. Finally, some of my ongoing research projects are related to FinTech, with a focus on credit scoring and online lending.
Dr. Kang’s research focuses on the development of statistical methods motivated by biomedical applications, with a focus on neuroimaging. His recent key contributions fall into the following three areas:
Bayesian regression for complex biomedical applications
Dr. Kang and his group developed a series of Bayesian regression methods for association analysis between clinical outcomes of interest (disease diagnostics, survival time, psychiatry scores) and potential biomarkers in biomedical applications such as neuroimaging and genomics. In particular, they developed a new class of threshold priors as compelling alternatives to the classic continuous shrinkage priors in the Bayesian literature and the widely used penalization methods in the frequentist literature. Dr. Kang’s methods can substantially increase the power to detect weak but highly dependent signals by incorporating useful structural information about predictors, such as spatial proximity within brain anatomical regions in neuroimaging [Zhao et al 2018; Kang et al 2018; Xue et al 2019] and gene networks in genomics [Cai et al 2017; Cai et al 2019]. His methods can simultaneously select variables, evaluate the uncertainty of variable selection, and make inference on the effect sizes of the selected variables. His work provides a set of new tools for biomedical researchers to identify important biomarkers using different types of biological knowledge with statistical guarantees. In addition, Dr. Kang’s work is among the first to establish rigorous theoretical justifications for Bayesian spatial variable selection in imaging data analysis [Kang et al 2018] and Bayesian network marker selection in genomics [Cai et al 2019]. Dr. Kang’s theoretical contributions not only offer a deep understanding of the soft-thresholding operator on smooth functions, but also provide insights into which types of biological knowledge may be useful for improving biomarker detection accuracy.
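As a rough illustration of the soft-thresholding operator mentioned above, the following Python sketch applies it to a smooth curve on a one-dimensional grid. The function name `soft_threshold`, the sine curve, and the threshold value `lam=0.5` are illustrative assumptions; this is not the soft-thresholded Gaussian process prior itself, only the deterministic operator it builds on.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Hypothetical smooth latent curve on a 1-D grid (a stand-in for a smooth effect image).
grid = np.linspace(0.0, 1.0, 101)
latent = np.sin(2.0 * np.pi * grid)

# After thresholding, the curve is exactly zero on contiguous stretches of the grid
# and remains continuous where it is nonzero.
coef = soft_threshold(latent, lam=0.5)
n_zero = int(np.sum(coef == 0.0))
```

Applying the operator to a smooth function gives a coefficient curve that is both sparse and piecewise smooth, which is the kind of structure the paragraph above describes for spatial variable selection.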
Prior knowledge guided variable screening for ultrahigh-dimensional data
Dr. Kang and his colleagues developed a series of variable screening methods for ultrahigh-dimensional data analysis that incorporate useful prior knowledge in biomedical applications, including imaging [Kang et al 2017; He et al 2019], survival analysis [Hong et al 2018] and genomics [He et al 2019]. As a preprocessing step for variable selection, variable screening is a computationally fast approach to dimension reduction. Traditional variable screening methods overlook useful prior knowledge, and thus their practical performance is unsatisfactory in many biomedical applications. To fill this gap, Dr. Kang developed a partition-based ultrahigh-dimensional variable screening method under generalized linear models, which can naturally incorporate the grouping and structural information in biomedical applications. For cases where prior knowledge is unavailable or unreliable, Dr. Kang proposed a data-driven partition screening framework based on covariate grouping and investigated its theoretical properties. The two special cases proposed by Dr. Kang, correlation-guided partitioning and spatial-location-guided partitioning, are particularly useful in practice for neuroimaging data analysis and genome-wide association analysis. For cases where multiple types of grouping information are available, Dr. Kang proposed a novel, theoretically justified strategy for combining screening statistics from various partitioning methods. It provides a very flexible framework for incorporating different types of prior knowledge.
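To make the idea of variable screening concrete, here is a minimal Python sketch of plain marginal correlation screening, the kind of traditional baseline that ignores prior knowledge. It is not the partition-based method described above; the function name `marginal_screen`, the simulated data, and the choice `keep=20` are all illustrative assumptions.

```python
import numpy as np

def marginal_screen(X, y, keep):
    """Return indices of the `keep` predictors with the largest absolute
    marginal Pearson correlation with the outcome y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:keep]

# Simulated ultrahigh-dimensional setting (assumed for illustration): p >> n,
# with only the first two predictors truly associated with the outcome.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

selected = marginal_screen(X, y, keep=20)
```

Screening of this kind reduces 2000 candidate predictors to 20 before a more expensive variable selection step. A partition-based approach would instead compute screening statistics within groups of covariates.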
Brain network modeling and inferences
Dr. Kang and his colleagues developed several new statistical methods for brain network modeling and inference using resting-state fMRI data [Kang et al 2016; Xie and Kang 2017; Chen et al 2018]. Due to the high dimensionality of fMRI data (over 100,000 voxels in a standard brain template) and small sample sizes (hundreds of participants in a typical study), it is extremely challenging to model the brain functional connectivity network at the voxel level. Some existing methods model brain anatomical region-level networks using region-level summary statistics computed from voxel-level data. Those methods may suffer from low power to detect signals and an inflated false positive rate, since the summary statistics may not capture the heterogeneity within the predefined brain regions well. To address those limitations, Dr. Kang proposed a novel method based on multi-attribute canonical correlation graphs [Kang et al 2016] to construct region-level brain networks using voxel-level data. His method can capture different types of nonlinear dependence between any two brain regions consisting of hundreds or thousands of voxels. He also developed permutation tests for assessing the significance of the estimated network. His methods can substantially increase the power to detect signals in small-sample problems. In addition, Dr. Kang and his colleague developed theoretically justified high-dimensional tests [Xie and Kang 2017] for constructing region-level brain networks using voxel-level data under the multivariate normal assumption. Their theoretical results provide useful guidance for the future development of statistical methods and theory for brain network analysis.
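The general logic of a permutation test for dependence between two brain regions can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the multi-attribute canonical correlation graph method: the test statistic (largest absolute voxel-to-voxel correlation), the function names, and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np

def max_cross_corr(A, B):
    """Largest absolute Pearson correlation between any column (voxel) of A
    and any column (voxel) of B; rows are subjects."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return np.abs(Az.T @ Bz / A.shape[0]).max()

def permutation_pvalue(A, B, n_perm=199, seed=0):
    """Permutation p-value for dependence between regions A and B.
    Shuffling the subjects (rows) of B breaks any link between the regions
    while preserving the dependence among voxels within each region."""
    rng = np.random.default_rng(seed)
    observed = max_cross_corr(A, B)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(A.shape[0])
        if max_cross_corr(A, B[perm]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

# Simulated data (assumed): 60 subjects, two regions sharing a latent signal z.
rng = np.random.default_rng(1)
z = rng.standard_normal(60)
region_a = 2.0 * z[:, None] + rng.standard_normal((60, 30))
region_b = 2.0 * z[:, None] + rng.standard_normal((60, 40))
p_dependent = permutation_pvalue(region_a, region_b)

# An unrelated region for comparison; its p-value follows the null distribution.
region_c = rng.standard_normal((60, 40))
p_unrelated = permutation_pvalue(region_a, region_c)
```

Because the test works on voxel-level data rather than a single regional average, it does not rely on a summary statistic to represent each region.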
This image illustrates neuroimaging meta-analysis data (Kang et al 2014). Neuroimaging meta-analysis is an important tool for finding consistent effects across studies. We develop a Bayesian nonparametric model and perform a meta-analysis of five emotions from 219 studies. In addition, our model can make reverse inferences, predicting the emotion type from a newly presented study. Our method outperforms other methods, with an average accuracy of 80%.
1. Cai Q, Kang J, Yu T (2020) Bayesian variable selection over large scale networks via the thresholded graph Laplacian Gaussian prior with application to genomics. Bayesian Analysis, In Press (Earlier version won a student paper award from Biometrics Section of the ASA in JSM 2017)
2. He K, Kang J, Hong G, Zhu J, Li Y, Lin H, Xu H, Li Y (2019) Covariance-insured screening. Computational Statistics and Data Analysis, 132:100-114.
3. He K, Xu H, Kang J† (2019) A selective overview of feature screening methods with applications to neuroimaging data. WIREs Computational Statistics, 11(2):e1454.
4. Chen S, Xing Y, Kang J, Kochunov P, Hong LE (2018). Bayesian modeling of dependence in brain connectivity, Biostatistics, In Press.
5. Kang J, Reich BJ, Staicu AM (2018) Scalar-on-image regression via the soft thresholded Gaussian process. Biometrika: 105(1) 165–184.
6. Xue W, Bowman D and Kang J (2018) A Bayesian spatial model to predict disease status using imaging data from various modalities. Frontiers in Neuroscience. 12:184. doi:10.3389/fnins.2018.00184
7. Jin Z*, Kang J†, Yu T (2018) Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations. Bioinformatics, 34(9):1555-1561.
8. He K, Kang J† (2018) Comments on “Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data”. Bayesian Analysis, 13(1):289-291.
9. Hong GH, Kang J†, Li Y (2018) Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Analysis: 24(1) 45-71.
10. Zhao Y*, Kang J†, Long Q (2018) Bayesian multiresolution variable selection for ultra-high dimensional neuroimaging data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(2):537-550. (Earlier version won student paper award from ASA section on statistical learning and data mining in JSM 2014; It was also ranked as one of the top two papers in the student paper award competition in ASA section on statistics in imaging in JSM 2014)
11. Kang J, Hong GH, Li Y (2017) Partition-based ultrahigh dimensional variable screening, Biometrika, 104(4): 785-800.
12. Xie J#, Kang J# (2017) High dimensional tests for functional networks of brain anatomic regions. Journal of Multivariate Analysis, 156:70-88.
13. Cai Q*, Alvarez JA, Kang J†, Yu T (2017) Network marker selection for untargeted LC/MS metabolomics data, Journal of Proteome Research, 16(3):1261-1269
14. Kang J, Bowman FD, Mayberg H, Liu H (2016) A depression network of functionally connected regions discovered via multi-attribute canonical correlation graphs. NeuroImage, 41:431-441.
My research focuses on the causes, dynamics and outcomes of conflict, at the international and local levels. My methodological areas of interest include spatial statistics, mathematical/computational modeling and text analysis.
Map/time-series/network plot, showing the flow of information across battles in World War II. Z axis is time, X and Y axes are longitude and latitude, polygons are locations of battles, red lines are network edges linking battles involving the same combatants. Source: https://doi.org/10.1017/S0020818318000358
Dr. Eisenberg studies infectious disease epidemiology with a focus on waterborne pathogens. His expertise is in the areas of water, sanitation and hygiene (WASH), quantitative microbial risk assessment (QMRA) and disease transmission modeling. Dr. Eisenberg has a long-standing research platform in northern coastal Ecuador, examining how changes in the social and natural environments, mediated by road construction, affect the epidemiology of enteric pathogens. Specific studies focus on enteric pathogens, antimicrobial resistance, the microbiome and dengue. He is also part of the NIGMS consortium Models of Infectious Disease Agent Study (MIDAS), working to examine mechanisms of transmission and the potential for intervention and control of enteric pathogens through water and sanitation interventions.
I use machine-learning techniques to implement decision support systems and tools that facilitate more personalized care for disease management and healthcare utilization, with the ultimate aim of delivering efficient, effective, and equitable therapy for chronic diseases. To test and advance these general principles, I have built operational programs that are guiding, and improving, patient care in costly and low-resource settings, including emerging countries.
Dr. Valley’s research focuses on understanding and improving decision-making in the intensive care unit (ICU). His primary line of research seeks to identify the patients most likely to benefit from intensive care, allowing clinicians to safely triage patients between the ICU and the general ward. Ultimately, he hopes to identify ICU-based therapies that can be transferred to the general ward to improve hospital efficiency and reduce healthcare costs. Dr. Valley’s research interests also include enhancing diagnosis in critical illness, improving the ICU experience for family members of ICU patients, and reducing barriers to cost-effective pulmonary and critical care.
My main interest is theoretical statistics as applied to complex models, from semiparametric to ultrahigh-dimensional regression analysis. In particular, I am interested in the negative aspects of Bayesian and causal analysis as implemented in modern statistics.
An analysis of the position of SCOTUS judges.