MICHIGAN INSTITUTE FOR DATA SCIENCE announces 2023 Propelling Original Data Science (PODS) Grant awardees
The Michigan Institute for Data Science (MIDAS) announced the awardees of the 2023 round of Propelling Original Data Science (PODS) Grants. Nine teams will receive funding support for a wide range of exciting projects with data science and Artificial Intelligence (AI) as the common thread, including topics such as multimodal learning for disease prediction, using video data to study political discourse, text analysis to detect and reduce bias in graduate admissions, and explainable AI for building trust in AI-aided decisions.
Since 2016, MIDAS has been offering funding to U-M faculty to enable groundbreaking disciplinary and interdisciplinary research through data science and AI, making it possible for research teams to form many new collaborations, formulate groundbreaking ideas, and secure external funding to expand their work. As of 2022, a total of $12M in MIDAS funding has jump-started 63 research projects, which have expanded into 112 follow-on projects with $114M of external funding. In addition, “year after year, the applicants propose to employ increasingly more sophisticated data science and AI methods to address increasingly more profound research questions,” says Dr. H. V. Jagadish, Director of MIDAS. “This reflects the rapid advancement of data science and AI and their transformation of science and society, and U-M researchers are at the forefront of it.”
The 2023 PODS teams will present their projects at the U-M Annual Data Science and AI Summit to be held November 13-14, 2023. Read more about their projects:
From ground to air, and the traveler experiences in-between: Human-centered data-driven performance measures for multimodal transportation systems
Both ground and air transportation systems have traditionally been assessed using system-based metrics that discount human experiences. While there is growing consensus that the management of these systems should integrate human-centered performance metrics, the primary data sources for these metrics are difficult to obtain, and the challenges are only increasing. This project aims to examine the potential of applying AI-based approaches to integrate passively collected travel data with rich behavioral insights from smaller-scale passenger survey datasets, with the goal of linking across transportation modes and advancing multimodal transportation networks to be more equitable, accessible, and efficient.
A Data Science Toolkit for Examining Local Governance
We will collect a novel, large-scale dataset containing transcripts of city council meetings in Michigan. On top of this data, we will combine domain expertise and machine learning pipelines to generate a rich set of annotations that capture key political qualities of the meeting discourse. This dataset will lay the groundwork for new empirical research on local governance, political division, discourse and civic participation.
Bayesian modeling of multi-source phenology to forecast airborne allergen concentration
We aim to improve the short-term and long-term predictions of airborne allergens under climate change, an emerging public health concern. To achieve this, we propose to develop novel data science tools to effectively assimilate multiple data sources and integrate various data-driven and process-based models. Beyond innovative methodology, our project also advances the biological understanding of pollen and fungal spores, and ultimately, our work helps alleviate the impacts of airborne allergens on people’s health.
Interpretable machine learning to identify tumor spatial features from longitudinal multi-modality images for personalized progression risk prediction of poor prognosis head and neck cancer
Our research project focuses on the development of an interpretable machine learning model designed to efficiently integrate multimodal data, including images and biological information. Our model also identifies crucial tumor changes over time, enabling personalized progression risk prediction for patients with poor prognosis head and neck cancer. This innovative approach aims to enhance the efficacy and precision of radiation therapy for high-risk patients, ultimately resulting in improved treatment outcomes and quality of life.
Maria Masotti (Biostatistics)
MI-SPACE: Multiplex Imaging based Spatial Analysis for Discovery of Cellular Interactions in the Tumor Microenvironment
The tumor microenvironment is emerging as the next frontier in cancer research, where scientists are working to understand how the spatial interplay of multiple cell types surrounding the tumor affects immune response, tumor development, response to treatment, and more. Existing methods to quantify cellular interactions in the tumor microenvironment do not scale to the rapidly evolving technical landscape where researchers are now able to map over fifty cellular markers at single-cell resolution with thousands of cells per image. We will develop a statistically oriented, scalable framework and software toolkit to help researchers discover novel associations between cellular cross-talk in the tumor microenvironment and patient-level outcomes such as response to treatment or survival.
Nikola Banovic (Computer Science and Engineering)
Detecting and Countering Untrustworthy Artificial Intelligence (AI) through AI Literacy
Distinguishing trustworthy from untrustworthy Artificial Intelligence (AI) is of critical importance to broader societal adoption of AI, as AI gets deployed into high-stakes decision-making scenarios. However, end-users who are not computer-science savvy and who lack AI literacy fail to detect untrustworthy AI, despite existing approaches that attempt to promote AI trustworthiness by explaining and justifying AI decisions. Here, we propose to design and evaluate novel explanation mechanisms to help such end-users develop the AI literacy they require to detect and counter untrustworthy AI, and in turn reduce their undue reliance on such AI.
Foundations of Sequence Models for Learning, Estimation, and Control of Dynamical Systems
Powerful sequence models such as transformers have revolutionized natural language processing; however, their use in dynamic decision making remains unproven and unsafe. This project will unlock the potential of sequence models in data-driven control and enable their safe and robust use through innovative theory and algorithms.
Neural Quantum States at Scale: Applications in Sciences and Engineering
Neural networks have achieved unparalleled performance on a diversity of tasks ranging from image processing to natural language generation. This project will leverage these successes to unravel the mysteries of quantum many-body physics. The project hinges on the idea that the quantum many-body problem can be posed as a machine learning problem for a quantum many-body wave function. By drawing upon state-of-the-art machine learning techniques, this project will make possible the application of neural-network techniques to quantum many-body problems of unprecedented scale, thereby unlocking a spectrum of applications in physics, chemistry and materials science.
Machine-Processing of Graduate Student Applications for Diversity, Equity, and Inclusion
Every year the U-M College of Engineering receives tens of thousands of graduate applications, which faculty reviewers initially down-select using numerical indicators of merit such as GPA, test scores, and undergraduate school prestige. Unless an applicant meets a predefined numerical threshold, richer portions of an application, such as letters of recommendation and statements of purpose, may remain overlooked. This project aims to use Natural Language Processing methods to process graduate student applications and identify ‘hidden gem’ applicants: exceptional students from underrepresented or less-privileged backgrounds who have a strong propensity for PhD research.