U-M Annual Data Science & AI Summit 2022

Research Talks

Session Chairs:

  • 11/14: Walter Dempsey, Assistant Professor of Biostatistics
  • 11/15: Lu Wang, Assistant Professor of Computer Science and Engineering

Research Talks: Monday, November 14

Interdisciplinary Research on Suicide Risk: Bridging the Distance Between Data and Inference

Briana Mezuk, Department of Epidemiology, School of Public Health, University of Michigan
Viktoryia Kalesnikava, Department of Epidemiology, School of Public Health, University of Michigan
Linh Dang, Department of Epidemiology, School of Public Health, University of Michigan
Eskira Kahsay, Department of Epidemiology, School of Public Health, University of Michigan
Lily Johns, Department of Epidemiology, School of Public Health, University of Michigan
David Jurgens, School of Information, University of Michigan
Aparna Ananthasubramaniam, School of Information, University of Michigan

Suicide remains the 10th leading cause of death in the US. Despite decades of research, persistent gaps remain in our understanding of modifiable predictors of suicide that could inform prevention efforts. In response to this challenge, we aim to foster cross-disciplinary research and dialogue on suicidal behavior.

In 2003, the CDC launched the National Violent Death Reporting System (NVDRS), a comprehensive mortality surveillance system that collects salient information on suicide and other violent deaths across the US; this registry now includes over 350,000 deaths from suicide or undetermined intent. A distinct feature of the NVDRS is the inclusion of rich textual data that describe the circumstances (e.g., recent events, ongoing stressors) for most cases in the registry. These textual narratives (median length: 545 characters; range: 1 to 11,936) are abstracted from official source documents from law enforcement and coroners/medical examiners, and contain case details that are only partially captured by other variables in the registry.

In this talk, we will discuss our ongoing research with the NVDRS data, which aims to 1) leverage narrative texts using data science and natural language processing tools, 2) identify novel suicide-related contextual features on a population scale, and 3) investigate how salient life transitions (e.g., changes in employment, relationships, or housing) may relate to suicide at various life stages. We will present findings that speak to each of these elements and share the methodological challenges we have encountered around narrative sparseness and systematic variation in narrative length (e.g., by age, sex, educational attainment, and race of the decedent). The overall goal of this talk is to foster research dialogue and partnerships around best practices for applying emergent data science tools to identify novel correlates of suicide risk in an equitable manner that accounts for potential biases in data collection and measurement.

Personalized Treatment Assignment Rules for Vaccine Uptake in Behavioral Science Field Experiments with Large Multi-Arm Trials

Rahul Ladhania, School of Public Health, University of Michigan
Lyle Ungar, University of Pennsylvania
Wenbo Wu, New York University
Nina Mazar, Boston University

Behavioral science offers some inexpensive, scalable strategies that can increase vaccination, yet most studies focus on identifying interventions that, on average, have the highest treatment effects. Without meaningful consideration of heterogeneity in treatment effects, however, there is a risk of finding policies that perpetuate or exacerbate disparities. Recent developments in machine learning and econometrics have brought data-driven heterogeneity estimation and personalization to the forefront. How effective is data-driven personalization of behavioral text messaging interventions to increase flu shot uptake and, from a health equity perspective, what role, if any, do race and racial bias play?

We use data from two mega-studies (Milkman et al. 2021, 2022) in three different settings (Walmart Pharmacy, a large multinational retail corporation with over 4,700 pharmacy locations across the US, with ~680,000 participants; The University of Pennsylvania and Geisinger Health Systems, two large health systems in the Northeastern United States, with ~50,000 participants), which tested the efficacy of an array of text messaging nudges encouraging actual flu shot uptake. First, we find that ML-driven personalization can make a substantial difference (up to 3X) in the effectiveness of behavioral messaging interventions for increasing flu shot uptake, compared with assigning all participants to the arm that performs best on average. Second, we find that gains from personalization in both settings are largely similar across racial groups. We are extending our models to data from other behavioral studies aimed at increasing COVID-19 vaccine uptake (Dai et al., 2021), assessing the transferability of inference across the two settings.
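The comparison behind the first finding, personalized assignment versus giving everyone the arm that performs best on average, can be illustrated with a minimal sketch. It is not the authors' method: it assumes that per-group uptake rates for each arm have already been estimated, and the group labels, arm names, and numbers are hypothetical values chosen only for illustration.

```python
def expected_uptake(rates, shares, policy):
    """Expected uptake when each group g is assigned arm policy[g].

    rates[g][arm] is an estimated uptake probability (assumed given);
    shares[g] is the fraction of the population in group g.
    """
    return sum(shares[g] * rates[g][policy[g]] for g in shares)


def uniform_policy(rates, shares):
    """Assign everyone the single arm with the best population-average uptake."""
    arms = list(next(iter(rates.values())))
    best = max(arms, key=lambda a: sum(shares[g] * rates[g][a] for g in shares))
    return {g: best for g in shares}


def personalized_policy(rates):
    """Assign each group the arm with its own highest estimated uptake."""
    return {g: max(rates[g], key=rates[g].get) for g in rates}


# Hypothetical estimates: two groups, two message arms.
rates = {
    "group_1": {"arm_A": 0.30, "arm_B": 0.25},
    "group_2": {"arm_A": 0.20, "arm_B": 0.28},
}
shares = {"group_1": 0.5, "group_2": 0.5}

uniform = uniform_policy(rates, shares)
personal = personalized_policy(rates)
gain = expected_uptake(rates, shares, personal) - expected_uptake(rates, shares, uniform)
```

In practice the rates would themselves be model estimates, so a policy like this would need to be evaluated on held-out data rather than on the same data used to fit it.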

Development of Understandable Artificial Intelligence (UAI) Methods in Physical Sciences

Y Z (Yang Zhang), Department of Nuclear Engineering and Radiological Sciences, University of Michigan

Despite the booming applications of AI/ML/DS methods in almost every field, one enduring challenge is the lack of explainability in present approaches. Not being able to interpret black-box computer models in terms of human-understandable knowledge greatly hinders our trust in them and their deployment. Therefore, the development of Understandable/eXplainable/interpretable Artificial Intelligence (UAI/XAI) is considered one of the main challenges. Physics and the broader physical sciences provide established ground truths and thus can serve as testbeds for the development of new UAI methods.

To stimulate discussion, I will briefly describe one example of our research, where we used algebraic geometry tools, namely the Morse-Smale complex and sublevel-set persistent homology, to produce human-understandable interpretations of autoencoder-learned collective variables in atomistic trajectories. The goal of this talk is to brainstorm and foster collaboration opportunities.

Data Science Without Data Collection Using FedScale

Mosharaf Chowdhury, Department of Computer Science and Engineering, University of Michigan

Although cloud computing has so far successfully accommodated the volume, velocity, and variety of Big Data, collecting everything into the cloud is becoming increasingly infeasible. Today, we face a new set of challenges. A growing awareness of privacy among individual users and governing bodies is forcing platform providers to restrict the variety of data we can collect. Often, we cannot transfer data to the cloud at the velocity of its generation. Many cloud users suffer from sticker shock, buyer's remorse, or both as they try to keep up with the volume of data they must process. Making sense of data closer to its home is more appealing than ever.

In this talk, I will briefly introduce FedScale, a scalable and extensible open-source federated data science platform that we are building in Michigan to tackle these new challenges. FedScale provides high-level APIs for data scientists to implement their tasks, a modular design to customize implementations for diverse hardware and software backends, and the ease of deploying the same code at many scales. FedScale also includes a comprehensive benchmark that allows data scientists to evaluate their ideas in realistic, large-scale settings.

FedScale is available here.
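As a rough illustration of what "making sense of data closer to its home" can mean, the sketch below shows federated averaging, one common aggregation rule in federated learning: each client trains on its own data and only model parameters, not raw data, are sent to be combined. This is an assumption-laden toy, not FedScale's API; the function name and the data layout are hypothetical.

```python
def federated_average(client_updates):
    """Combine client model parameters by a data-size-weighted average.

    client_updates is a list of (num_examples, weights) pairs, where
    weights is a list of floats with the same length for every client.
    Only these parameter lists leave the clients; raw data stays local.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, weights in client_updates:
        for j, w in enumerate(weights):
            # Clients with more local examples contribute proportionally more.
            averaged[j] += (n / total) * w
    return averaged


# Two hypothetical clients holding 1 and 3 local examples respectively.
global_weights = federated_average([(1, [0.0, 2.0]), (3, [4.0, 2.0])])
```

A full federated round would repeat this step many times, sending the averaged parameters back to a sample of clients for further local training.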

Apply AI to Ionosphere Space Weather Specification and Forecast

Shasha Zou, Department of Climate and Space Sciences and Engineering, University of Michigan
Zihan Wang, Department of Climate and Space Sciences and Engineering, University of Michigan
Yang Chen, Department of Statistics, University of Michigan
Hu Sun, Department of Statistics, University of Michigan

In recent years, there has been growing awareness of space weather impacts on critical infrastructure in the civilian, commercial, and military sectors. To protect critical assets on the ground and in space, multiple federal agencies joined forces and developed the National Space Weather Strategy and Action Plan (NSWSAP). Understanding the underlying physical processes of space weather and improving its specification and forecasting are major objectives of the space community. Ionospheric disturbance is highlighted as one of the five major space weather threats in the NSWSAP report.

In this presentation, I will discuss integrating modern ionospheric total electron content (TEC) datasets derived from multiple Global Navigation Satellite Systems (GNSS) with state-of-the-art machine learning (ML) algorithms to resolve outstanding fundamental questions in the specification and forecasting of local and global ionospheric TEC and its variability.

Research Talks: Tuesday, November 15

Using Machine Learning to Construct Hedonic Price Indices

Matthew D. Shapiro, Survey Research Center and Economics Department, University of Michigan

We demonstrate a machine learning (ML) procedure to estimate hedonic price indices at scale from item-level transaction and product characteristics data. Our procedure incorporates state-of-the-art approaches from hedonic econometrics into an ML framework. Applying our methodology to the Nielsen Retail Scanner data set, we estimate a large hedonic adjustment to the Törnqvist index for food product groups, which reduces cumulative inflation over the period 2006Q4 to 2015Q4 by more than half. These results suggest that quality improvement via product turnover is important even in product groups that are not normally considered to feature rapid technological progress.
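For readers unfamiliar with the baseline being adjusted, the standard Törnqvist index is a geometric mean of price relatives, weighted by each item's average expenditure share in the two periods. The sketch below computes that unadjusted, matched-model index only; it does not implement the hedonic ML procedure described in the talk, and the item names and numbers are hypothetical.

```python
import math


def tornqvist_index(p0, p1, q0, q1):
    """Törnqvist price index from period 0 to period 1.

    p0, p1 are prices and q0, q1 are quantities, as dicts keyed by item.
    Only items present in both periods are used, so items that enter or
    exit the market are ignored by this unadjusted version.
    """
    items = sorted(set(p0) & set(p1) & set(q0) & set(q1))
    # Expenditure on each matched item in each period.
    e0 = {i: p0[i] * q0[i] for i in items}
    e1 = {i: p1[i] * q1[i] for i in items}
    total0, total1 = sum(e0.values()), sum(e1.values())
    log_index = 0.0
    for i in items:
        # Weight: the item's expenditure share averaged over both periods.
        avg_share = 0.5 * (e0[i] / total0 + e1[i] / total1)
        log_index += avg_share * math.log(p1[i] / p0[i])
    return math.exp(log_index)


# Hypothetical data: every price doubles, so the index should be close to 2.
p0 = {"item_a": 1.0, "item_b": 2.0}
p1 = {"item_a": 2.0, "item_b": 4.0}
q0 = {"item_a": 10, "item_b": 5}
q1 = {"item_a": 8, "item_b": 6}
index = tornqvist_index(p0, p1, q0, q1)
```

Because only matched items enter the calculation, quality change that arrives through product turnover is invisible to this version, which is the gap a hedonic adjustment is meant to address.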

Use AI to Facilitate Emotional Intelligence in Remote Work

Xuan Lu, School of Information, University of Michigan
Wei Ai, College of Information Studies, University of Maryland
Zhenpeng Chen, Peking University
Yanbin Cao, Peking University
Qiaozhu Mei, School of Information, University of Michigan

Emotions at work have long been identified as critical signals of work motivations, status, and attitudes, and as predictors of various work-related outcomes. As more and more employees work remotely, these emotional and mental health signals become harder to observe through daily, face-to-face communication.

The use of online platforms to communicate and collaborate at work provides an alternative channel to monitor the emotions of workers. This paper studies how emojis, as non-verbal cues in online communications, can be used for such purposes. In particular, we study how developers on GitHub use emojis in their work-related activities. We show that developers have diverse patterns of emoji usage, which correlate strongly with their working status, including activity levels, types of work, types of communications, time management, and other behavioral patterns. Developers who use emojis in their posts are significantly less likely to drop out of the online work platform. Surprisingly, using emoji usage alone as features, standard machine learning models can predict future dropouts of developers with satisfactory accuracy.

Understanding the mechanism behind these correlations and the predictive power of emojis requires a comprehensive understanding of emoji usage in multiple remote work contexts, which calls for theories and methodologies from disciplines such as organizational behavior and psychology. This work can also be generalized to studies of mental health issues in remote work and online education. What are the purposes of using emojis in different scenarios? What kinds of effects do emojis have in work-related communications? What is the relation between emoji usage and workers' mental status, and how can it be verified? More generally, what kinds of research questions can emojis help to answer in different research domains? Cross-disciplinary collaborations would help address such questions.

Societal Biases in Fairy Tales Across Cultures

Winston Wu, Department of Electrical Engineering and Computer Science, University of Michigan
Lu Wang, Department of Electrical Engineering and Computer Science, University of Michigan
Rada Mihalcea, Department of Electrical Engineering and Computer Science, University of Michigan

Fairy tales are one of the most important cultural and social influences on children's lives. Stereotypes contained in these fairy tales have the potential to influence the rest of our lives. The study of biases in children's stories and fairy tales has largely been limited to a handful of languages around the world. In this study, we investigate over 850 fairy tales across 22 different cultures, identifying and characterizing differences in stereotypes, such as gender bias and agency in events, that may have been instilled at a young age.

Improving Students' Ability to Engage with Scholarly STEM Literature via Effective Reading Practices and Novel Machine Learning-Based Tools

Kevyn Collins-Thompson, School of Information, University of Michigan
Yulia Sevryugina, University of Michigan Library

Our project's overall research aim is to explore effective methods for addressing the readability difficulties that students in science, technology, engineering, and math (STEM) fields experience when reading scholarly sources. Toward that goal, we are investigating pedagogical practices and resources for effective STEM reading together with novel machine learning-based technologies that support better reading comprehension and retention. Examples of the latter include personalized measures of reading difficulty for advanced STEM content, deep learning approaches for finding text passages that are most helpful for learning the meaning of a target concept, and eye-tracking-based predictors that can analyze word-level reading patterns across a document to gain insight into the ease or difficulty of a text passage.

The field we specifically focus on is biochemistry, the branch of science that explores the chemical processes within and related to living organisms. It brings together biology and chemistry and is at the core of many engineering solutions, not to mention its particular importance during the current coronavirus pandemic. We are actively working with undergraduate- and graduate-level courses in the Department of Chemistry, where we gather new datasets from classroom reading assignments and user interaction studies in order to comprehend the complex taxonomy of biochemical knowledge and to understand and support student engagement with scientific literature. Our presentation will summarize our recent work in progress and early results.