CoreLogic

CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.

These records are publicly available and gathered from county record offices across the country. Coverage dates vary by county: some county records go back 50 years, but coverage is most comprehensive from the 1990s to the present. The Tax data file contains only one year of data (for most counties, 2016).

The dataset consists of multiple pipe-delimited text files organized into Tax, Deed, and Foreclosure files. Each file covers the entire United States.
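Because each file covers the whole country, the files are likely too large to load into memory comfortably, so reading them row by row is a sensible default. The following is a minimal Python sketch using the standard `csv` module with `|` as the delimiter. It assumes the first line of a file is a header row of column names; the column names in the sample (`FIPS_CODE`, `PARCEL_ID`, `SALE_AMOUNT`) are hypothetical placeholders, not CoreLogic's actual schema.

```python
import csv
import io
from typing import Iterable, Iterator


def read_pipe_delimited(stream: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per record from a pipe-delimited text stream.

    Assumes (for illustration only) that the first line is a header row.
    Rows are yielded lazily, so the whole file is never held in memory.
    """
    # QUOTE_NONE treats quote characters as ordinary text, which is safer for
    # free-text fields (e.g. addresses) in files that do not use CSV quoting.
    reader = csv.DictReader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    yield from reader


# Hypothetical sample: these column names are placeholders, not CoreLogic's schema.
sample = io.StringIO(
    "FIPS_CODE|PARCEL_ID|SALE_AMOUNT\n"
    "26161|0001-23|250000\n"
    "26161|0001-24|\n"
)

rows = list(read_pipe_delimited(sample))
for row in rows:
    # Empty fields come back as "" rather than None.
    print(row["PARCEL_ID"], row["SALE_AMOUNT"] or "(missing)")
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the sample; the UTF-8 encoding is an assumption and should be checked against the file documentation provided with the data.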

To access this data, please visit the link below:

Visit Page

If you have any questions about the datasets, please contact a librarian.

Health Insurance Datasets

The Institute for Healthcare Policy and Innovation (IHPI) has more than 20 terabytes of data, from more than 113 million Americans, for researchers to study how healthcare works and how to make it better. IHPI’s data is provided primarily by large insurance companies in the form of administrative claims. These are proprietary datasets that cover both the commercial and private payer insurance sectors, and they give researchers a longitudinal account of the healthcare utilization patterns of millions of US patients.

Visit Page

For questions, please email ihpi-data@umich.edu.

GAIA Dataset

The large amount of transportation data Didi Chuxing has collected can be used to understand urban traffic conditions and predict the future of urban transportation. Didi Chuxing aims to improve the efficiency of the urban transportation network by sharing some of its anonymized data with the academic community.

Visit Page

Lyft Dataset

Lyft’s comprehensive, large-scale dataset features raw camera and LiDAR sensor inputs as perceived by a fleet of high-end autonomous vehicles in a bounded geographic area. To download the dataset:

Click Here

This dataset also includes high-quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map.

Mcity Data Garage

Data Garage is a dataset catalog maintained by Mcity. We aggregate metadata about datasets produced by Mcity, Mcity members, U-M faculty and staff, and industry, and make it searchable in Data Garage. U-M credentials are required to access the datasets.

Visit Page

Waymo Open Dataset

The Waymo Open Dataset consists of high-resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. The company is releasing this dataset publicly to help the research community advance machine perception and self-driving technology.

Visit Page

Waze Dataset

Waze for Cities Data is available to Waze’s data-sharing partners around the world and includes access to valuable traffic data.

Visit Page

ITS Teaching & Learning datasets contain student learning data from our Canvas Learning Management System as well as student demographic data from the U-M Student Information System.

Unizin Data Warehouse Dataset

The UDW dataset consists of teaching and learning data created through the use of the Canvas LMS by U-M faculty, staff, and students from 2014 to the present. Researchers can use this data to help answer questions about student learning and learning outcomes. Administrators can use the data to track program outcomes. For teaching faculty, this data can provide insights into teaching methodologies and the use of instructional resources.

More Information

Unizin Data Platform Dataset

The UDP dataset includes most of the Canvas data found in the Unizin Data Warehouse, paired with extensive student demographic data from the U-M Student Information System. The UDP is now available in production; however, additional Canvas and demographic data are still being added to the schema.

More Information

Michigan Genomics Initiative (MGI)

The Michigan Genomics Initiative is a collaborative research effort among physicians and researchers at the University of Michigan with the goal of harmonizing patient electronic medical records with genetic data to gain novel biomedical insights.

Visit Page

Twitter Decahose Dataset

University of Michigan researchers can access a compilation of tweets known as the “Decahose” (a 10% sample of all tweets) at no charge. MIDAS, CSCAR, and ARC-TS together manage and support the use of this data repository, including the historical archive of Decahose tweets and ongoing collection from the Decahose.

U-M researchers can use this dataset in five areas of research: information diffusion, natural language processing, network analysis, behavior analysis, and sociolinguistics.

Send Inquiry

For questions, please email Kristin Burgard, MIDAS Outreach and Partnership Manager, burgardk@umich.edu.