Framework for Integrative Data Equity Systems

Lead Investigators

H V Jagadish

University of Michigan

Maggie Levenstein

University of Michigan

Robert Hampshire

University of Michigan

Bill Howe

University of Washington

Julia Stoyanovich

New York University

Abstract

This institute builds on our team’s work on Foundations of Responsible Data Science under NSF Awards 1740996 and 1902959

Data science continues to have a transformative impact on Science and Engineering, and on society at large, by enabling evidence-based decision making, reducing costs and errors, and improving objectivity. The techniques and technologies of data science also have enormous potential for harm if they reinforce inequity or leak private information. As a result, sensitive datasets in the public and private sector are restricted from research use, slowing progress in those areas that have the most to gain: human services in the public sector. Furthermore, the misuse of data science techniques and technologies will disproportionately harm underrepresented groups across race, gender, physical ability, sexual orientation, education, and more. These data equity issues are pervasive, and represent an existential risk for the use of data-driven methods in science and engineering. This project will establish a Framework for Integrative Data Equity Systems (FIDES): a National Institute for the study of systems that enable research on sensitive data while preventing misuse and misinterpretation.

Learn More

The first workshop on Frameworks for Integrative Data Systems (FIDES) and Foundations of Responsible Data Science (FORDS) was held remotely via Zoom on March 25 and 26, 2020

NSF Workshop on Frameworks for Integrative Data Equity Systems (FIDES) and Foundations of Responsible Data Science (FORDS)

(FIDES Award #1934565) (FORDS Award #1902959)

ALL TIMES ARE EDT (Eastern Daylight Time)

March 25th (Wednesday)

  • 11:00am Julia Stoyanovich: Welcome View Slides
  • 11:15am Solon Barocas: Keynote
  • 12:00pm Breakout overview and charge
  • 12:15pm Breakout Session 1: Warmup, Framing, and Examples
  • 1:00pm Break (Organizers will synthesize/cluster ideas)
  • 1:30pm Organizer’s Recap of Breakout 1
  • 1:45pm Breakout Session 2: Sources and Impacts of Inequity
  • 3:15pm Break
  • 3:45pm Breakout Session 3: Integrative Data Systems
  • 5:15pm Recap
  • 5:30pm Adjourn

March 26th (Thursday)

  • 11:00am Welcome back; H.V. Jagadish: Keynote View Keynote Slides
  • 11:30am Breakout Session 2: Report and Discussion(30min) 5 minute report-back per group, co-leads speak
    (20min) Plenary discussion, moderated by organizers
  • 12:20pm Breakout Session 3: Report and Discussion(30min) 5 minute report-bback per group, co-leads speak
    (20min) Plenary Discussion, moderated by organizers
  • 1:10pm Break
  • 1:40pm Panel: Ashley Casovan (AI Global), Stefaan Verhulst (GovLab), Jenny Yang (The Urban Institute), Robert Cheetham (Azavea), Maggie Levenstein (UMich / ICPSR):What has been said that you can support with examples from your work?
    What important angles are missing?
  • 3:00pm Break
  • 3:30pm Breakout Session 4: Application Domain Exercises in breakout groups(5 minutes of Breakout charge, followed by 55 minutes of discussion)
  • 4:30pm Recap(30min) Report back from Breakout Session 4 application exercises
    (15min) Plenary discussion, moderated by organizers
  • 5:15pm Adjourn
View Workshop Report

FIDES will enable interdisciplinary community convergence around data equity systems, with an initial study in critical domains such as mobility, housing, education, economic indicators, and government transparency, leading to the development of a novel data analytics infrastructure that supports responsibility in integrative data science. Towards this goal, the project will address several technically challenging problems:

  1. To be able to use data from multiple sources, risks related to privacy, bias, and the potential for misuse must be addressed. This project will develop principled methods for dataset processing to overcome these concerns.
  2. Individual datasets are difficult to integrate for use in advanced multi-layer network models. This project considers methods to create pre-trained tensors over large collections of spatially and temporally coherent datasets, making them easier to incorporate while controlling for fairness and equity.
  3. Any dataset or model must be equipped with sufficient information to determine fitness for use, communicate limitations, and describe underlying assumptions. This project will develop tools and techniques to produce “nutritional labels” for data and models, formalizing and standardizing ad hoc metadata approaches to provenance, specialized for equity issues. In addition to supporting methodological innovation in data science, the Institute will become a focal point for sharing expertise in data equity systems. It will do so by establishing interfaces for interaction between data science and domain experts to promote expertise development and sharing of best practices, and by consistently supporting efforts on diversity and equity.

An Example “Nutritional Label” for a Ranking of CS Departments in USA.

The first (green) panel shows that this ranking heavily depends on publication counts.

The second (orange) panel shows that a diverse set of departments is considered, big and small, and from all parts of the country.

The remaining (blue) panels examine properties of this ranking. The ranking is stable, but may not be fair to small departments.