Data Pillar: Measuring and Improving Society

Societal transformations have complicated traditional survey methods for data collection, while a plethora of new data sources are creating opportunities to measure human behaviors and the human social condition. MIDAS supports the development and use of data science and AI methods to better understand society through new data types such as text, video, sensor and digital trace data.

Want to participate in pillar activities or have questions? Please email

Natural Language Processing (NLP) / Text as Data

Overview: MIDAS is strengthening the campus community around NLP through a two-pronged approach: training and research incubation. For training, MIDAS organizes a series of tutorials designed to introduce NLP to domain researchers who have little or no prior experience. We will also develop a learning roadmap that gives researchers a systematic approach to learning NLP. To enable innovative research, MIDAS also runs research connections meetings that connect NLP methodology experts with domain researchers whose significant research questions could benefit from NLP.

Who Will Benefit: Any researcher who wants to learn the fundamentals of NLP, researchers who have text data and want to connect with NLP experts, and NLP experts who are looking for domain collaborators and ways to apply NLP to significant research questions.

Coordinators: Danai Koutra (Associate Director, MIDAS | Associate Professor, Computer Science and Engineering), Josh Pasek (Associate Director, MIDAS | Associate Professor, Communications and Media), Elyse Thulin (Michigan Data Science Fellow)

Access to Twitter Data

Overview: MIDAS provides the U-M research community with access to a large repository of raw Twitter data free of charge. The Twitter Decahose, which falls under the data pillar, is an ongoing collection of a 10% sample of tweets, with data going back as far as 2009. There are also subsets for specific purposes, related to COVID-19 and to US and Indian politicians.

Who Will Benefit: This is a shared effort between MIDAS, CSCAR, and ARC to provide data to researchers for a broad spectrum of applications in network analysis, behavior analysis, sociolinguistics, natural language processing, and information diffusion.

Coordinator: Sean Meyer (Senior Scientist, MIDAS)

Supporting the Development of New Data and Their Access

Overview: MIDAS supports and collaborates with faculty and campus units who develop new data sources and data infrastructure for social science research and who enable the wide adoption of these resources. To name just a few examples: 1) The Research Data Ecosystem is a major NSF-funded effort by ICPSR to modernize its existing software platform so that researchers can more safely and securely access, connect, store, and manipulate data. MIDAS is a collaborator on the grant application and the project implementation. 2) MIDAS provided initial funding and data access for Libby Hemphill (faculty member in the School of Information and ICPSR) to develop the Social Media Archive. 3) MIDAS pilot funding supports faculty efforts to develop new data for social science, such as large-scale data on romantic relationships and the digitization of G.I. Bill record data.

Who Will Benefit: Researchers and units that develop new data sources and infrastructure.

Coordinator: Jing Liu (Managing Director, MIDAS)