The data science institutes at seven universities invite you to join the 2021 Data Science Coast-to-Coast seminar series. These data science institutes have been planning ways to foster a broad-reaching data science community that can collaborate extensively to advance our missions for research, education, and data science for social good. Using Zoom to eliminate geographical constraints, this seminar series is one of our first steps toward the goal.
In the first half of 2021, we will host five seminars, each featuring one faculty member and one postdoctoral fellow from two universities. Each speaker will give a 20-minute talk about ongoing projects and motivating issues, followed by 20 minutes of discussion with the audience. These seminars will be the launching point for follow-on research discussion meetings, which we hope will lead to fruitful collaborative research. We strongly encourage you to sign up ahead of time and indicate your area of research and your interest in follow-up discussions.
April 21st Presenters:
H. V. Jagadish, Director, Michigan Institute for Data Science; Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, University of Michigan
Data Equity: A Core Requirement for Responsible Data Science
Until recently, we regularly heard statements like “Let the data speak for themselves.” Today, we instead hear worries about the fairness of data-driven systems and AI. Nevertheless, a focus on a specific formulation of fairness in one data science step is far too narrow to be the whole story. We need to address inequitable representation in the data record, inequities due to the data scientist’s world view being reflected in the model, inequities in the resulting outcomes, and inequities in access to the fruits of the analysis. In this talk, I will lay out a research agenda in this direction, and invite you to join me.
Ciera Martinez, Biodiversity and Environmental Sciences Lead, Berkeley Institute for Data Science, University of California – Berkeley
Open science in the wild: principles to build reproducible and collaborative data analysis workflows
The academic research system is not built to incentivize open science practices, but transparency and reproducible methodology allow researchers to critically assess and build upon results to fuel scientific discovery, and they support a more collaborative and equitable research community. Open science and data practices are often presented as ideals, but rarely do we train for how to handle the intricacies that emerge from every unique research project life cycle. In this talk, I will present the ERP (Explore, Refine, and Produce) workflow, a three-phase data analysis workflow that guides researchers to create reproducible and responsible data analysis workflows. Each phase is centered on how to make decisions based on the audience to whom the research is communicated, the research products created, and the career aspirations of the researchers involved. We hope this work helps create a community of practice for how we design and train for reproducible data-intensive research and helps demystify data analysis for both students new to research and current researchers who are new to data-intensive work.