My methodological research focuses on developing statistical methods for routinely collected healthcare databases such as electronic health records (EHR) and claims data. I aim to tackle the unique challenges that arise from the secondary use of real-world data for research purposes. Specifically, I develop novel causal inference methods and semiparametric efficiency theory that harness the full potential of EHR data to address comparative effectiveness and safety questions. I also develop scalable, automated pipelines for the curation and harmonization of EHR data across healthcare systems and coding systems.
Our laboratory focuses on (1) the biology of cancer metastasis, especially bone metastasis, including the role of the host microenvironment; and (2) mechanisms of chemoresistance. We search for genes that regulate metastasis and the interaction between the host microenvironment and cancer cells. We are performing single-cell multiomics and spatial analysis to identify rare cell populations and promote precision medicine. Our research methodology uses a combination of molecular, cellular, and animal studies. The majority of our work is highly translational, providing clinical relevance. In terms of data science, we collaborate on applications of both established and novel methodologies for the analysis of high-dimensional data; deconvolution of high-dimensional data into a cellular and tissue context; spatial mapping of multiomic data; and heterogeneous data integration.
My research interests are in natural language semantics and psycholinguistics, focusing on verbs. I conduct behavioral psycholinguistic experiments with methodologies such as self-paced reading and maze tasks, as well as surveys of linguistic and semantic judgments. I also study semantic variation using corpora and datasets such as the Twitter Decahose to better understand how words have developed diverging meanings across communities, age groups, and regions. I primarily use R and Python to collect, manage, and analyze data. I direct the UM WordLab in the linguistics department, working with students (especially undergraduates) on experimental and computational research focusing on lexical representations.
We are interested in resolving outstanding fundamental scientific problems that impede the computational materials design process. Our group uses high-throughput density functional theory, applied thermodynamics, and materials informatics to deepen our fundamental understanding of synthesis-structure-property relationships while exploring new chemical spaces for functional technological materials. These research interests are driven by the practical goal of the U.S. Materials Genome Initiative to accelerate materials discovery, a goal whose realization requires fundamental research in synthesis science, inorganic chemistry, and materials thermodynamics.
Study of Pandemic Publishing: How Scholarly Literature Is Affected by the COVID-19 Pandemic
This project addresses the quality of recently published COVID-19 publications. During the COVID-19 pandemic, researchers have published much of their work as preprints. While preprints are an important development in scholarly publishing, they are works in progress that need further refinement to become more rigorous final products. Scholarly publishers are also taking initiatives to accelerate the publication process, for example, by asking reviewers to curtail requests for additional experiments upon revision. Sacrificing rigor for haste inevitably increases the likelihood of article corrections and retractions, leading to the spread of false information within supposedly trustworthy sources that have a peer-review process in place to ensure proper verification. I study the quality of COVID-19-related scholarly works by using CADRE’s datasets to identify signs of incoherence, irreproducibility, and haste.
The Spectrum of Data Science for Diagnosis of Osteoarthritis of the Temporomandibular Joint
We have developed and tested machine learning approaches to integrate quantitative markers for the diagnosis and assessment of progression of temporomandibular joint osteoarthritis (TMJ OA), extended the capabilities of 3D Slicer4 into web-based tools, and disseminated open-source image analysis tools. Our aims combine data processing and in-depth analytics with learning using privileged information, integrated feature selection, and testing the performance of longitudinal risk predictors. Our long-term goal is to improve the diagnosis and risk prediction of TMJ OA in future multicenter studies.
As a board-certified ophthalmologist and glaucoma specialist, I have more than 15 years of clinical experience caring for patients with different types and complexities of glaucoma. In addition to my clinical experience, as a health services researcher I have developed expertise in several disciplines, including analyses of large health care claims databases to study utilization and outcomes of patients with ocular diseases, racial and other disparities in eye care, and associations between systemic conditions or medication use and ocular diseases. I have learned the nuances of various data sources and ways to maximize their use to answer important and timely questions. Leveraging my background in health services research with new skills in bioinformatics and precision medicine, over the past 2-3 years I have been developing and growing the Sight Outcomes Research Collaborative (SOURCE) repository, a powerful tool that researchers can tap into to study patients with ocular diseases. My team and I have spent countless hours devising ways of extracting electronic health record data from Clarity, cleaning and de-identifying the data, and making it linkable to ocular diagnostic test data (OCT, HVF, biometry) and non-clinical data. Now that we have successfully developed such a resource here at Kellogg, I am collaborating with colleagues at more than two dozen academic ophthalmology departments across the country to help them extract their data in the same format and send it to Kellogg, so that we can pool the data and make it accessible to researchers at all participating centers for research and quality improvement studies.
I am also actively exploring ways to integrate SOURCE data into deep learning and artificial intelligence algorithms; use SOURCE data for genotype-phenotype association studies and the development of polygenic risk scores for common ocular diseases; capture patient-reported outcome data for the majority of eye care recipients; enhance visualization of the data on easy-to-access dashboards to aid quality improvement initiatives; and use the data to enhance the quality, safety, and efficiency of care delivery and to improve clinical operations.
Most of my research related to data science involves decision making around clinical trials. In particular, I am interested in how databases of past clinical trial results can inform future trial design and other decisions. Some of my work has involved using machine learning and mathematical optimization to design new combination therapies for cancer based on the results of past trials. Other work has used network meta-analysis to combine the results of randomized controlled trials (RCTs) to better summarize what is currently known about a disease, to design further trials that would be maximally informative, and to study the quality of the control arms used in Phase III trials (which are used for drug approvals). Other work combines toxicity data from clinical trials with toxicity data from other data sources (claims data and adverse event reporting databases) to accelerate detection of adverse drug reactions to newly approved drugs. Lastly, some of my work uses Bayesian inference to accelerate clinical trials with multiple endpoints, learning the link between different endpoints using past clinical trial results.