Data Pillar: Measuring and Improving Society
Societal transformations have complicated traditional survey methods for data collection, while a plethora of new data sources are creating opportunities to measure human behaviors and the human social condition. MIDAS supports the development and use of data science and AI methods to better understand society through new data types such as text, video, sensor and digital trace data. Current activities include:
Natural Language Processing (NLP) / Text as Data
Overview: MIDAS is strengthening the campus NLP community through a two-pronged approach: training and research incubation. For training, MIDAS will organize a series of workshops that introduce NLP to domain researchers with little or no prior experience. We will also develop a learning roadmap that gives researchers a systematic approach to learning NLP. To enable innovative research, MIDAS is planning research incubation sessions that connect NLP methodology experts with domain researchers whose significant research questions could benefit from NLP.
Who Will Benefit: Any researcher who wants to learn the fundamentals of NLP, researchers who have text data and want to connect with NLP experts, and NLP experts who are looking for domain collaborators and ways to apply NLP to significant research questions.
Coordinators: Danai Koutra (Associate Director, MIDAS | Associate Professor, Computer Science and Engineering), Josh Pasek (Associate Director, MIDAS | Associate Professor, Communications and Media), Shane Redman (Senior Scientist, MIDAS)
Access to Twitter Data
Overview: MIDAS provides the U-M research community with free access to a large repository of raw Twitter data. The Twitter Decahose, which falls under the data pillar, is an ongoing collection of a 10% sample of tweets, with data going back as far as 2009. There are also special-purpose subsets covering COVID-19 and US and Indian politicians.
Who Will Benefit: Researchers working on a broad spectrum of applications in network analysis, behavior analysis, sociolinguistics, natural language processing, and information diffusion. The data is provided through a shared effort among MIDAS, CSCAR, and ARC.
Coordinator: Sean Meyer (Senior Scientist, MIDAS)