This collection includes datasets that MIDAS manages for the campus, as well as other U-M and external datasets of interest to the MIDAS research community. If you have questions, or if you have datasets that you would like to share with the data science community, please email midas-research@umich.edu.

Academic Data Science Alliance

COVID-19

The Academic Data Science Alliance is working with partners to pull together data and data science resources related to the COVID-19 pandemic. This is a living list of resources and we welcome additions, suggestions, and collaborations. Please send additions, corrections, comments, and suggestions to us using this feedback form.

City of Detroit Open Data Portal

Social Science

In March 2021, the City of Detroit launched new tools and datasets to help researchers work with data about addresses in the city. This launch was part of the Base Units project, which will make it easier to connect and analyze the datasets available on the City of Detroit’s Open Data Portal. Links to the Base Units Hub Site and the Base Unit Tools are now featured on the front page of the Open Data Portal.

CoreLogic

Social Science

CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.

The dataset consists of multiple pipe-delimited text files organized into Tax, Deed and Foreclosure. Each file covers the whole US.
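Because each file covers the whole US, reading it record by record is usually more practical than loading it all at once. The following is a minimal sketch in Python, assuming each file begins with a header row. The column names (`FIPS_CODE`, `PARCEL_ID`, `SALE_AMOUNT`) and the sample rows are hypothetical placeholders, not the vendor's actual layout.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator


def read_pipe_delimited(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per record from pipe-delimited text with a header row."""
    # QUOTE_NONE treats quote characters as ordinary data rather than
    # as field delimiters; this is an assumption about the file format.
    reader = csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Count records per distinct value of `column`, one row at a time."""
    counts = Counter()
    for row in read_pipe_delimited(lines):
        counts[row[column]] += 1
    return counts


# Hypothetical sample data; real column names come from the file documentation.
SAMPLE = (
    "FIPS_CODE|PARCEL_ID|SALE_AMOUNT\n"
    "26161|A-001|250000\n"
    "26161|A-002|310000\n"
    "26163|B-101|95000\n"
)

sample_counts = count_by_column(io.StringIO(SAMPLE), "FIPS_CODE")
print(sample_counts)
```

For a real file, the same functions accept an open file handle in place of `io.StringIO`, for example one opened with `open(path, newline="")`. The correct text encoding depends on the delivered files and is not assumed here.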

If you have any questions about the datasets, please contact a librarian.

CrowdTangle

Social Science

CrowdTangle is a public insights tool from Facebook that makes it easy to follow, analyze, and report on what’s happening across social media. CrowdTangle started a pilot program in 2019 to partner with researchers and academics and help them study critical topics such as racial justice, misinformation, and elections. In addition to launching an online application, CrowdTangle has built a new hub with information about all Facebook data sets that are available for independent research.

Facebook COVID-19 Symptom Surveys

COVID-19

Offering the symptom survey datasets to academic and nonprofit researchers with a privacy-minded approach enables experts to generate more impactful insights to aid public health responses. Facebook and partner universities created a centralized webpage for researchers with more information about the symptom surveys and how they can use the data for their research.

Facebook Data for Good

COVID-19

Facebook Data for Good has a number of tools and initiatives that can help organizations respond to the COVID-19 pandemic.

GAIA Dataset

Transportation

Didi Chuxing shares some of its anonymized data with the academic community. The open datasets include trajectory data, large-scale driving video data, and traffic travel index data.

Health Insurance Dataset

Healthcare

The Institute for Healthcare Policy and Innovation (IHPI) has more than 20 terabytes of data, from more than 113 million Americans, for researchers to study how healthcare works and how to make it better. IHPI’s data is provided primarily by large insurance companies in the form of administrative claims. These are proprietary datasets that cover both the commercial and private payer insurance sectors, and also give researchers a longitudinal accounting of millions of US patients’ healthcare utilization patterns.
For questions, please email ihpi-data@umich.edu.

ICPSR COVID-19 Data Repository

COVID-19

ICPSR has created a new archive for data examining the social, behavioral, public health, and economic impact of the novel coronavirus global pandemic. The COVID-19 Data Repository is a free, self-publishing option for any researcher or journalist who wants to share data related to COVID-19. The data will be available to any interested user for secondary analysis.

Lyft Dataset

Transportation

The open datasets include: 1) Logs of the movement of traffic agents (cars, cyclists, and pedestrians) that Lyft's autonomous fleet encountered on Palo Alto routes. 2) Raw camera and LiDAR sensor inputs as perceived by autonomous vehicles in a bounded geographic area.

Mcity Data Garage

Transportation

Data Garage is an Mcity-maintained dataset catalog. The data is primarily vehicle-level sensor (LIDAR and camera) data covering multiple geographical areas, intersections with a high volume of road users, and multiple weather conditions. U-M credentials are required to access the datasets.

Precision Health Analytics Platform

Healthcare

A collaborative research effort among physicians and researchers at the University of Michigan with the goal of harmonizing patient electronic medical records with genetic data to gain novel biomedical insights.

Twitter Decahose

Social Science

Ten percent sample of tweets.

U-M COVID-CORE Twitter Dataset

COVID-19

The COVID-CORE dataset, one of the COVID-19 Social Media datasets created by the U-M School of Information (UMSI) and MIDAS, contains a sequential sample of Tweets that have explicitly mentioned various synonyms, aliases, or hashtags of the COVID-19 disease, the SARS-CoV-2 virus, or the pandemic.

U.S. and India Politicians Dataset

Social Science

Politics in the U.S. and India

For U-M Faculty, Staff, and Students

UNIZIN Data Platform Dataset

Education

The UDP dataset includes most of the Canvas data found in the Unizin Data Warehouse, paired with extensive student demographic data from the U-M Student Information System. The UDP is now available in production; however, additional Canvas and demographics data are still being added to the schema.

UNIZIN Data Warehouse Dataset

Education

The UDW dataset consists of teaching and learning data created through the use of the Canvas LMS by U-M faculty, staff, and students from 2014 until the present. Researchers can use this data to help answer questions on student learning and learning outcomes. Administrators can use the data to track program outcomes. For teaching faculty, this data is useful in providing insights into teaching methodologies and instructional resource utilization.

Waymo Open Dataset

Transportation

The Waymo Open Dataset consists of high-resolution sensor (LIDAR and camera) data collected by Waymo self-driving cars in a wide variety of conditions. The company is releasing this dataset publicly to aid the research community in making advancements in machine perception and self-driving technology.

Waze For Cities Dataset

Transportation

Waze for Cities Data provides access to Waze's anonymized user data and traffic data.