Twitter Decahose Dataset

University of Michigan researchers can access a compilation of tweets known as the “Decahose” (a 10% sample of all tweets) without charge.  MIDAS, CSCAR and ARC-TS together manage and support the use of this data repository, including the historical archive of Decahose tweets and ongoing collection from the Decahose.
U-M researchers can use this dataset for five areas of research: information diffusion, natural language processing, network analysis, behavior analysis, and sociolinguistics.
To inquire, please use the following form:

Request Form

If you have any questions about accessing the dataset, please email midas-research@umich.edu.

MIDAS procedures:

  1. Applicants submit a short application form indicating their interest.

  2. MIDAS will send them an MOU that includes the types of research allowed and the major components of the terms of use from Twitter’s Master Agreement.

  3. The applicants will then submit their research plan (a page or so) to MIDAS for approval.

  4. Once approved, the applicants will obtain and show any other necessary approvals (such as IRB approval).

  5. MIDAS will then inform ARC-TS and CSCAR of its approval and the applicants will gain access to the data.

The Master Agreement between Twitter and U-M allows the use of this dataset for the following five areas of research without having to seek approval from Twitter:

  1. Information diffusion

  2. Natural language processing

  3. Network analysis

  4. Behavior analysis

  5. Sociolinguistics

If you need help working with large datasets or need statistical consulting, please make an appointment with CSCAR’s consultants.

Mcity Datasets

Mcity is a public-private partnership at U-M working to transform global mobility by dramatically improving transportation safety, sustainability, and accessibility. Many research datasets from Mcity-sponsored projects, along with other public mobility-related datasets, are documented, aggregated, and made available to the campus community through Mcity’s Data Garage.

Mcity’s datasets cover subject areas such as vehicle telemetry, naturalistic driving data, raw sensor data, simulation environments, labeled image datasets, human/vehicle interaction, and connected vehicle/infrastructure systems.

Register for an account using the link below:

Register Here

If you have any questions about Mcity or how to access the datasets, please contact tworman@umich.edu.

CoreLogic

CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.

These records are publicly available and gathered from county record offices across the country. Coverage dates vary by county; some county records go back 50 years. Coverage is more comprehensive from the 1990s to the present. The Tax data file contains only one year of data (2016 for most counties).

The dataset consists of multiple pipe-delimited text files organized into three groups: Tax, Deed, and Foreclosure. Each file covers the whole US.
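Because each file covers the whole country, it may be too large to load into memory at once. The following is a minimal Python sketch of streaming a pipe-delimited file one record at a time using only the standard library. It assumes the file has a header row, is UTF-8 encoded, and does not use quote characters to wrap fields; none of these details are confirmed by the description above. The function names and the column name `FIPS_CODE` in the usage note are hypothetical and should be replaced with the actual field names from the file layout documentation.

```python
import csv
from collections import Counter
from typing import Dict, Iterator


def iter_pipe_rows(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Yield one record at a time from a pipe-delimited text file.

    Assumes the first line is a header row (an assumption, not confirmed).
    """
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE treats quote characters as ordinary data; this is an
        # assumption about the files and may need adjusting.
        reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


def count_by_column(path: str, column: str) -> Counter:
    """Count records per distinct value of one column without loading the whole file."""
    counts: Counter = Counter()
    for row in iter_pipe_rows(path):
        counts[row[column]] += 1
    return counts
```

For example, `count_by_column("deed_file.txt", "FIPS_CODE")` would tally records per county code, where both the file name and the column name are placeholders. Streaming row by row keeps memory use roughly constant regardless of file size, at the cost of being slower than a vectorized library for repeated analyses.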

To access this data, please visit the link below:

Request Form

If you have any questions about the datasets, please contact a librarian.