3:45pm – Opening Remarks: Announcing MIDAS 3.0 Strategic Focus (Virtual)
Dr. H.V. Jagadish
Director, Michigan Institute for Data Science
Dr. Fiebrink’s research focuses on human-computer interaction, machine learning, and signal processing, all aimed at enabling people to apply machine learning to new areas, such as designing new musical instruments or gestural interfaces for accessibility. She is also involved in digital humanities scholarship and machine learning education.
Mini Workshops (In-person, Michigan League)
9:00am – 11:00am
Hussey Room – Introduction to data visualization on the web with D3.js
Kalamazoo Room – Diversity and equity in data science – providing technical solutions and empowering the workforce
Michigan Room – Developing best practices for reproducible data science
Vandenberg Room – Using text as data: Introduction to machine learning for natural language processing
Introduction to data visualization on the web with D3.js (Hussey Room)
Organizer and Instructor:
Fred Feng, Assistant Professor, Industrial and Manufacturing Systems Engineering
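Diversity and equity in data science – providing technical solutions and empowering the workforce (Kalamazoo Room)
Presenters and panelists: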
Lia Corrales, Assistant Professor, Astronomy; Founder of Women of Color Coders
Tayo Fabusuyi, Assistant Research Scientist, U-M Transportation Research Institute; Leading the project “Towards a more representative Public Interest Technology (PIT) field”
H.V. Jagadish, Director, MIDAS; Professor, Computer Science and Engineering; Leading the project “Framework for Integrative Data Equity Systems”
Rada Mihalcea, Director, U-M AI Lab; Professor, Computer Science and Engineering; Developing technical solutions to detect and correct bias in data and algorithms; supporting women in technology
Target audience: People who are involved or interested in promoting equity and diversity, from both a technical and a community perspective.
9:00 – 9:45 am: Panelist presentations on (1) increasing diversity and inclusion in the data science and AI research community, as exemplified by Women of Color Code and Public Interest Technology, and (2) developing technical solutions to make the data we work with, and the decisions made with these data, more equitable and inclusive.
9:55 – 11:00 am: Community forum in which participants share their work and thoughts on these topics, moderated by the workshop panelists.
The goal of the session is to bring together like-minded people and spark ideas for collaborative efforts. Attendees will learn about related activities on campus, share their own ideas and activities, and meet potential collaborators. The ideal outcome is a concrete plan to intensify current efforts through collaboration.
Developing best practices for reproducible data science (Michigan Room)
Jing Liu, Managing Director, MIDAS
Brandon Butler, Sharon Glotzer research group, Chemical Engineering: Flexible and reproducible workflows through the signac framework
Johann Gagnon-Bartsch, Assistant Professor, Statistics: Building reproducible workflows across multiple platforms
Thomas Valley, Assistant Professor, Pulmonary and Critical Care Medicine: Creating a culture of code review in health care research
Target audience: Researchers who would like to learn best practices for code review, code sharing, and reproducible workflows, including those interested in participating in the MIDAS Reproducibility Challenge.
The 2020 MIDAS Reproducibility Challenge highlighted important conceptual issues in reproducible data science across multiple dimensions, as well as the creative, practical approaches U-M researchers have used to address them. In the first half of the workshop, three winners of the Challenge will present their approaches. The 2021 Challenge focuses on actionable solutions that can be shared with other researchers to improve reproducibility. In the second half, the presenters will discuss with the audience the conceptual issues and practical solutions for making data science research transparent, traceable, and trustworthy, and will answer questions from researchers interested in participating in the 2021 Reproducibility Challenge.
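Using text as data: Introduction to machine learning for natural language processing (Vandenberg Room)
Organizers and Instructors: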
Meghan Dailey, Machine Learning Specialist, Advanced Research Computing
Jule Krüger, Program Manager for Big Data and Data Science, Center for Political Studies, and Advanced Research Computing
Target audience: Anyone interested in the topic. Basic familiarity with Python or R is expected for the second half of the workshop.
In this workshop, we will analyze a text corpus to demonstrate the use of machine learning for natural language processing. In the first half, we will give a basic overview of machine learning, introduce the main concepts and logic of using text as data, and walk through a typical workflow for processing, managing, and analyzing a text corpus. We will also discuss how to choose between Python and R for text analysis and how to interpret the results of a topic model. In the second half, instructors will run two concurrent hands-on tutorials, one in Python and one in R, showing how the topic modeling example from the first half was implemented. Participants who attend only the first half will leave with a basic overview of the capabilities and methods for using text as data; those who attend the entire workshop will be equipped with basic programming tools to apply natural language processing in their own research. The workshop will also cover helpful University of Michigan resources for machine learning implementations, such as data sets, storage space, high-performance computing, and consultation services.