The Twitter Enahose is a compilation of a 1% daily sample of tweets dating back to August 2021. Any U-M affiliate (faculty, staff, or student) can request access to this dataset. It is stored on the Turbo drive and is accessed through the Great Lakes High-Performance Computing environment at U-M. The data cannot currently be accessed from outside U-M resources, and moving it offsite is strictly forbidden. MIDAS and ARC jointly manage and support the use of this data repository. The data are stored in the same format as the Twitter Decahose but are collected at a lower volume. This makes the Enahose a good candidate dataset for course projects or exploratory research that would not be eligible for Decahose use.
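As a rough illustration of exploratory filtering on data like this, the sketch below scans tweet records for a keyword. It rests on assumptions not stated on this page: that each record is one JSON object per line, and that the tweet body sits in a `text` field, as in standard Twitter API tweet objects. The function name `filter_tweets` and the sample records are hypothetical, and no real file paths are shown; the tutorials listed under Resources describe the actual workflow.

```python
import json
from typing import Iterable, Iterator


def filter_tweets(lines: Iterable[str], keyword: str) -> Iterator[dict]:
    """Yield parsed tweets whose text contains `keyword` (case-insensitive).

    Assumption: each line holds one JSON tweet object with a "text" field.
    """
    needle = keyword.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the run
        if not isinstance(tweet, dict):
            continue  # skip JSON values that are not objects
        text = tweet.get("text")
        if isinstance(text, str) and needle in text.lower():
            yield tweet


# Made-up in-memory records standing in for one data file.
sample = [
    '{"id": 1, "text": "Go Blue!"}',
    '{"id": 2, "text": "Rainy day in Ann Arbor"}',
    'not json',
    '{"id": 3, "text": "go blue forever"}',
]
matches = [t["id"] for t in filter_tweets(sample, "go blue")]
print(matches)
```

Because the function takes any iterable of lines, the same logic could be pointed at an open file handle inside a Great Lakes batch job; for larger scans, the PySpark tutorial listed under Resources is likely the better fit.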
Resources:
Great Lakes Course Credits – Instructors can request free compute credits and cloud storage for student course projects
Coderspaces Office Hours – Free analytical consulting
HPC Training Videos – Training videos on how to use the Great Lakes platform and other resources
Decahose with Great Lakes (GitHub) – Tutorial for using Twitter Decahose data with PySpark
Decahose Filter (GitHub) – Tutorial for using the command-line interface with batch jobs