Academic Data Science Alliance

COVID-19

The Academic Data Science Alliance is working with partners to pull together data and data science resources related to the COVID-19 pandemic. This is a living list of resources and we welcome additions, suggestions, and collaborations. Please send additions, corrections, comments, and suggestions to us using this feedback form.

CoreLogic

Social Science

CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.

The dataset consists of multiple pipe-delimited text files organized into Tax, Deed and Foreclosure. Each file covers the whole US.
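Because each file covers the whole US, it is usually more practical to stream the pipe-delimited rows than to load a file into memory at once. The following is a minimal sketch using Python's standard `csv` module; the column names (`fips`, `parcel_id`, `assessed_value`) and the sample values are hypothetical placeholders, not the actual CoreLogic schema, and it assumes the first line of each file is a header row.

```python
import csv
import io


def read_pipe_delimited(handle):
    """Yield each row of a pipe-delimited file as a dict keyed by the header row."""
    # QUOTE_NONE treats quote characters as ordinary text, so only "|" splits fields.
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical in-memory sample standing in for one of the licensed files.
sample = io.StringIO(
    "fips|parcel_id|assessed_value\n"
    "26161|A-001|250000\n"
    "26161|A-002|310000\n"
)

rows = list(read_pipe_delimited(sample))
for row in rows:
    print(row["parcel_id"], row["assessed_value"])
```

For a real file, pass an open file object (for example `open(path, newline="", encoding="utf-8")`) in place of `sample`; the encoding is also an assumption to verify against the file documentation.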

If you have any questions about the datasets, please contact a librarian.

CrowdTangle

Social Science

CrowdTangle is a public insights tool from Facebook that makes it easy to follow, analyze, and report on what’s happening across social media. CrowdTangle started a pilot program in 2019 to partner with researchers and academics and help them study critical topics such as racial justice, misinformation, and elections. In addition to launching an online application, we’ve built a new hub with information about all Facebook data sets that are available for independent research.

Facebook COVID-19 Symptom Surveys

COVID-19

Offering the symptom survey datasets to academic and nonprofit researchers with a privacy-minded approach enables experts to generate more impactful insights to aid public health responses. Facebook and partner universities created a centralized webpage for researchers with more information about the symptom surveys and how they can use the data for their research.

Facebook Data for Good

COVID-19

Facebook Data for Good has a number of tools and initiatives that can help organizations respond to the COVID-19 pandemic.

GAIA Dataset

Transportation

Didi Chuxing shares some of its anonymized data with the academic community. The open datasets include trajectory data, large-scale driving video data, and traffic travel index data.

Health Insurance Dataset

Healthcare

The Institute for Healthcare Policy and Innovation (IHPI) has more than 20 terabytes of data, from more than 113 million Americans, for researchers to study how healthcare works and how to make it better. IHPI’s data is provided primarily by large insurance companies in the form of administrative claims. These are proprietary datasets that cover both the commercial and private payer insurance sectors, and also give researchers a longitudinal accounting of millions of US patients’ healthcare utilization patterns.
For questions, please email ihpi-data@umich.edu.

ICPSR COVID-19 Data Repository

COVID-19

ICPSR has created a new archive for data examining the social, behavioral, public health, and economic impact of the novel coronavirus global pandemic. The COVID-19 Data Repository is a free, self-publishing option for any researcher or journalist who wants to share data related to COVID-19. The data will be available to any interested user for secondary analysis.

Lyft Dataset

Transportation

The open datasets include: 1) Logs of the movement of traffic agents (cars, cyclists, and pedestrians) that Lyft's autonomous fleet encountered on Palo Alto routes. 2) Raw camera and LiDAR sensor inputs as perceived by autonomous vehicles in a bounded geographic area.

Mcity Data Garage

Transportation

Data Garage is an Mcity-maintained dataset catalog. The data is primarily vehicle-level sensor (LiDAR and camera) data that covers multiple geographical areas, high-volume road user intersections, and multiple weather conditions. U-M credentials are required to access the datasets.

Precision Health Analytics Platform

Healthcare

A collaborative research effort among physicians and researchers at the University of Michigan with the goal of harmonizing patient electronic medical records with genetic data to gain novel biomedical insights.

Twitter Decahose

Social Science

Ten percent sample of tweets.

U-M COVID-CORE Twitter Dataset

COVID-19

The COVID-CORE dataset, one of the COVID-19 Social Media datasets created by the U-M School of Information (UMSI) and MIDAS, contains a sequential sample of Tweets that explicitly mention various synonyms, aliases, or hashtags of the COVID-19 disease, the SARS-CoV-2 virus, or the pandemic. The team curated a list of keywords to generate filtering queries. Applying these queries to the Decahose stream (~10% sequential sample) of Tweets retrieves millions of Tweets per month. The extracted Tweets begin on January 1, 2020. COVID-19 datasets filtered for medical/health and social/economic impact will be available soon. Please contact the creators, Dr. Xuan Lu (luxuan@umich.edu) and Dr. Qiaozhu Mei (qmei@umich.edu), for technical questions or for special extraction needs. If you currently have access to the Twitter Decahose, contact Kristin Burgard (burgardk@umich.edu) to access the COVID-CORE. If you do not already have access to the Twitter Decahose, you will need to first request access.
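To illustrate the general idea of keyword-based filtering described above, here is a minimal sketch. It is not the team's actual query set or pipeline: the keyword list, the sample Tweets, and the function name `matches_keywords` are hypothetical, and the real filtering is applied to the Decahose stream rather than to an in-memory list.

```python
import re

# Hypothetical keywords; the curated COVID-CORE list is not reproduced here.
KEYWORDS = ["covid-19", "covid19", "coronavirus", "sars-cov-2"]

# One case-insensitive pattern that matches any keyword as a substring.
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def matches_keywords(text):
    """Return True if the text mentions at least one keyword."""
    return PATTERN.search(text) is not None


# Hypothetical Tweet texts used only to exercise the filter.
tweets = [
    "New #COVID19 testing site opens downtown",
    "Great game last night!",
    "Researchers sequence the SARS-CoV-2 genome",
]

matched = [t for t in tweets if matches_keywords(t)]
print(len(matched), "of", len(tweets), "tweets matched")
```

Substring matching like this is deliberately simple; a production filter would likely also need to handle hashtags, multilingual aliases, and false positives.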

UNIZIN Data Platform Dataset

Education

The UDP dataset includes most of the Canvas data found in the Unizin Data Warehouse, paired with extensive student demographic data from the U-M Student Information System. The UDP is now available in production; however, additional Canvas and demographic data are still being added to the schema.

UNIZIN Data Warehouse Dataset

Education

The UDW dataset comprises teaching and learning data created through the use of the Canvas LMS by U-M faculty, staff, and students from 2014 until the present. Researchers can use this data to help answer questions on student learning and learning outcomes. Administrators can use the data to track program outcomes. For teaching faculty, this data is useful in providing insights into teaching methodologies and instructional resource utilization.

Waymo Open Dataset

Transportation

The Waymo Open Dataset comprises high-resolution sensor (LiDAR and camera) data collected by Waymo self-driving cars in a wide variety of conditions. The company is releasing this dataset publicly to aid the research community in making advancements in machine perception and self-driving technology.

Waze For Cities Dataset

Transportation

Waze for Cities Data provides access to Waze's anonymized user data and traffic data.