University of Michigan researchers can access a collection of tweets known as the “Decahose” (a 10% sample of all tweets) free of charge. MIDAS, CSCAR and ARC jointly manage and support the use of this data repository, which includes both the historical archive of Decahose tweets and the ongoing collection from the Decahose stream.
U-M researchers can use this dataset for research in five areas: information diffusion, natural language processing, network analysis, behavior analysis, and sociolinguistics. Please note: each project must have a faculty PI who is eligible, under ORSP guidelines, to serve as PI on federal grants.
For questions, please email Kristin Burgard, MIDAS Outreach and Partnership Manager, email@example.com.
In addition to the Decahose, a COVID-CORE Twitter Dataset is also available. This dataset, one of the COVID-19 Social Media datasets created by the U-M School of Information (UMSI) and MIDAS, contains a sequential sample of tweets that explicitly mention synonyms, aliases, or hashtags of the COVID-19 disease, the SARS-CoV-2 virus, or the pandemic. The team curated a list of keywords to generate filtering queries; applying these queries to the Decahose stream retrieves millions of tweets per month. The extracted tweets begin on January 1, 2020. COVID-19 datasets filtered for medical/health and social/economic impact will be available soon.

For technical questions or special extraction needs, please contact the creators, Dr. Xuan Lu (firstname.lastname@example.org) and Dr. Qiaozhu Mei (email@example.com). If you already have access to the Twitter Decahose, contact Kristin Burgard (firstname.lastname@example.org) to access COVID-CORE. If you do not yet have access to the Twitter Decahose, you will first need to request it.