U-M Annual Data Science & AI Summit 2023

November 13-14, 2023
Rackham Building, 915 E. Washington St., Ann Arbor

Propelling Original Data Science (PODS) Showcase

November 13, 2023

The MIDAS Propelling Original Data Science (PODS) grant strongly encourages work that transforms research domains through data science and AI, work that improves the reproducibility of research, and work that promises major impact and potential for significant expansion.

Detecting and Countering Untrustworthy Artificial Intelligence (AI) through AI Literacy

Nikola Banovic (Computer Science and Engineering)

Distinguishing trustworthy from untrustworthy Artificial Intelligence (AI) is of critical importance to broader societal adoption of AI, as AI gets deployed into high-stakes decision-making scenarios. However, end-users who are not computer-science savvy and who lack AI literacy fail to detect untrustworthy AI, despite existing approaches that attempt to promote AI trustworthiness by explaining and justifying AI decisions. Here, we propose to design and evaluate novel explanation mechanisms to help such end-users develop the AI literacy they need to detect and counter untrustworthy AI, and in turn reduce their undue reliance on such AI.

Machine-Processing of Graduate Student Applications for Diversity, Equity, and Inclusion

Wenhao Sun (Materials Science and Engineering), Dallas Card (School of Information)

Every year the U-M College of Engineering receives tens of thousands of graduate applications, which faculty reviewers initially down-select using numerical indicators of merit such as GPA, test scores, and undergraduate school prestige. Unless an applicant meets a predefined numerical threshold, richer portions of an application, such as letters of recommendation and statements of purpose, may remain overlooked. This project aims to use Natural Language Processing methods to process graduate student applications and identify ‘hidden gem’ applicants: exceptional students from underrepresented or less-privileged backgrounds who have a strong propensity for PhD research.

A Data Science Toolkit for Examining Local Governance

Justine Zhang (School of Information), Yanna Krupnikov (Communications and Media)

We will collect a novel, large-scale dataset containing transcripts of city council meetings in Michigan. On top of this data, we will combine domain expertise and machine learning pipelines to generate a rich set of annotations that capture key political qualities of the meeting discourse. This dataset will lay the groundwork for new empirical research on local governance, political division, discourse and civic participation.

From ground to air, and the traveler experiences in-between: Human-centered data-driven performance measures for multimodal transportation systems

Atiyya Shaw (Civil and Environmental Engineering), Max Li (Aerospace Engineering; Industrial and Operations Engineering)

Both ground and air transportation systems have traditionally been assessed using system-based metrics that discount human experiences. While there is growing consensus that the management of these systems should integrate human-centered performance metrics, the primary data sources needed to compute these metrics are difficult to obtain, and the challenges are only increasing. This project aims to examine the potential of applying AI-based approaches to integrate passively collected travel data with rich behavioral insights from smaller-scale passenger survey datasets, with the goal of linking across transportation modes and advancing multimodal transportation networks to be more equitable, accessible, and efficient.

Interpretable machine learning to identify tumor spatial features from longitudinal multi-modality images for personalized progression risk prediction of poor prognosis head and neck cancer

Lise Wei (Radiation Oncology), Liyue Shen (Computer Science and Engineering)

Our research project focuses on the development of an interpretable machine learning model designed to efficiently integrate multimodal data, including images and biological information. Our model also identifies crucial tumor changes over time, enabling personalized progression risk prediction for patients with poor prognosis head and neck cancer. This innovative approach aims to enhance the efficacy and precision of radiation therapy for high-risk patients, ultimately resulting in improved treatment outcomes and quality of life.

MI-SPACE: Multiplex Imaging based Spatial Analysis for Discovery of Cellular Interactions in the Tumor Microenvironment

Maria Masotti (Biostatistics)

The tumor microenvironment is emerging as the next frontier in cancer research, where scientists are working to understand how the spatial interplay of multiple cell types surrounding the tumor affects immune response, tumor development, response to treatment, and more. Existing methods to quantify cellular interactions in the tumor microenvironment do not scale to the rapidly evolving technical landscape where researchers are now able to map over fifty cellular markers at single-cell resolution with thousands of cells per image. We will develop a statistically oriented, scalable framework and software toolkit to help researchers discover novel associations between cellular cross-talk in the tumor microenvironment and patient-level outcomes such as response to treatment or survival.

Foundations of Sequence Models for Learning, Estimation, and Control of Dynamical Systems

Samet Oymak (Electrical and Computer Engineering), Necmiye Ozay (Electrical Engineering and Computer Science; Robotics)

Powerful sequence models such as transformers have revolutionized natural language processing; however, their use in dynamic decision making remains unproven and unsafe. This project will unlock the potential of sequence models in data-driven control and enable their safe and robust use through innovative theory and algorithms.

Neural Quantum States at Scale: Applications in Sciences and Engineering

Shravan Veerapaneni (Mathematics), James Stokes (Mathematics)

Neural networks have achieved unparalleled performance on a diversity of tasks ranging from image processing to natural language generation. This project will leverage these successes to unravel the mysteries of quantum many-body physics. The project hinges on the idea that the quantum many-body problem can be posed as a machine learning problem for a quantum many-body wave function. By drawing upon state-of-the-art machine learning techniques, this project will make possible the application of neural-network techniques to quantum many-body problems of unprecedented scale, thereby unlocking a spectrum of applications in physics, chemistry, and materials science.

Bayesian modeling of multi-source phenology to forecast airborne allergen concentration

Kai Zhu (School for Environment and Sustainability), Kerby Shedden (Statistics)

We aim to improve the short-term and long-term predictions of airborne allergens under climate change, an emerging public health concern. To achieve this, we propose to develop novel data science tools to effectively assimilate multiple data sources and integrate various data-driven and process-based models. Beyond innovative methodology, our project also advances the biological understanding of pollen and fungal spores, and ultimately, our work helps alleviate the impacts of airborne allergens on people’s health.