- COVID-19
- Health Insurance
- ITS Teaching & Learning Data
- Michigan Genomics Initiative
- Real Estate Data
- Transportation Data
- Twitter Decahose
Health Insurance Datasets
The Institute for Healthcare Policy and Innovation (IHPI) has more than 20 terabytes of data, from more than 113 million Americans, for researchers to study how healthcare works and how to make it better. IHPI’s data is provided primarily by large insurance companies in the form of administrative claims. These are proprietary datasets that cover both the commercial and private payer insurance sectors, and they give researchers a longitudinal accounting of the healthcare utilization patterns of millions of US patients.
For questions, please email ihpi-data@umich.edu.
ITS Teaching & Learning datasets contain student learning data from our Canvas Learning Management System as well as student demographic data from the U-M Student Information System.
Unizin Data Warehouse Dataset
The UDW dataset comprises teaching and learning data created through the use of the Canvas LMS by U-M faculty, staff, and students from 2014 until the present. Researchers can use this data to help answer questions on student learning and learning outcomes. Administrators can use the data to track program outcomes. For teaching faculty, this data is useful in providing insights into teaching methodologies and instructional resource utilization.
Unizin Data Platform Dataset
The UDP dataset includes most of the Canvas data found in the Unizin Data Warehouse but is paired with extensive student demographic data from the U-M Student Information System. The UDP is now available in production; however, additional Canvas and demographics data are still being added to the schema.
Michigan Genomics Initiative (MGI)
The Michigan Genomics Initiative is a collaborative research effort among physicians and researchers at the University of Michigan with the goal of harmonizing patient electronic medical records with genetic data to gain novel biomedical insights.
CoreLogic
CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.
These records are publicly available and gathered from county record offices across the country. Coverage dates vary by county; some county records go back 50 years. Coverage is more comprehensive from the 1990s to the present. The Tax data file contains only one year of data (for most counties, 2016).
The dataset consists of multiple pipe-delimited text files organized into Tax, Deed and Foreclosure. Each file covers the whole US.
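Because the files are pipe-delimited text, they can be read with standard CSV tooling by setting the delimiter to `|`. The following is a minimal sketch in Python, not the documented way to load this data: it assumes each file begins with a header row and does not use quote characters, and the column names in the sample (`FIPS_CODE`, `PARCEL_ID`, `SALE_AMOUNT`) are hypothetical placeholders rather than the actual CoreLogic layout.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def iter_pipe_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record from pipe-delimited text with a header row.

    Assumption: fields are not quoted, so quoting is disabled and any
    quote characters are kept as literal text.
    """
    reader = csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical sample; real files use their own column names.
SAMPLE = (
    "FIPS_CODE|PARCEL_ID|SALE_AMOUNT\n"
    "26161|A-001|250000\n"
    "26161|A-002|\n"
)

rows = list(iter_pipe_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["PARCEL_ID"], row["SALE_AMOUNT"] or "(missing)")
```

Since each file covers the whole US, iterating row by row as above, rather than loading a file into memory at once, is likely the safer choice. For a file on disk, the same function can be given a file object opened with `newline=""`.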
To access this data, please visit the link below:
If you have any questions about the datasets, please contact a librarian.
Lyft Dataset
Lyft’s comprehensive, large-scale dataset features raw camera and LiDAR sensor inputs as perceived by a fleet of multiple high-end autonomous vehicles in a bounded geographic area.
This dataset also includes high-quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map.
Mcity Data Garage
Data Garage is an Mcity-maintained dataset catalog. We aggregate metadata about datasets produced by Mcity, Mcity members, U-M faculty and staff, and industry, and make it searchable in Data Garage. U-M credentials are required to access the datasets.
Waymo Open Dataset
The Waymo Open Dataset comprises high-resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. The company is releasing this dataset publicly to aid the research community in making advancements in machine perception and self-driving technology.
Waze Dataset
Waze for Cities Data is available to Waze’s data-sharing partners around the world and includes access to valuable traffic data.
Twitter Decahose Dataset
University of Michigan researchers can access a compilation of tweets known as the “Decahose” (a 10% sample of all tweets) without charge. MIDAS, CSCAR and ARC together manage and support the use of this data repository, including the historical archive of Decahose tweets and ongoing collection from the Decahose.
U-M researchers can use this data for five areas of research: information diffusion, natural language processing, network analysis, behavior analysis, and sociolinguistics.
For questions, please email Kristin Burgard, MIDAS Outreach and Partnership Manager, burgardk@umich.edu.