The Twitter Enahose is a compilation of a 1% daily sample of tweets dating back to August 2021. Any U-M affiliate (faculty, staff, or student) can request access to this dataset. It is stored on the Turbo drive and can be accessed through the Great Lakes High-Performance Computing environment at U-M. Accessing the data outside of U-M resources is currently not possible, and moving it offsite is strictly forbidden. MIDAS and ARC jointly manage and support the use of this data repository. The data are stored in the same format as the Twitter Decahose but are collected at a lower volume. This makes the Enahose a good candidate dataset for course projects or exploratory research in use cases that would not be eligible for Decahose use.


Enahose Data Dictionary

Great Lakes Course Credits – Instructors can request free compute credits and cloud storage for student course projects

Coderspaces Office Hours – Free analytical consulting

HPC Training Videos – Training videos on how to use the Great Lakes platform as well as other resources

Decahose with Great Lakes (GitHub) – Tutorial for using Twitter Decahose data with PySpark

Decahose Filter (GitHub) – Tutorial for using the command-line interface with batch jobs

Request Access