CoreLogic

CoreLogic aggregates data from individual, parcel-level real estate transactions and financial records. We have licensed access to Tax, Deed, and Foreclosure data at the parcel level for every county in the United States.

These records are publicly available and gathered from county record offices across the country. Coverage dates vary by county; some county records go back 50 years. Coverage is more comprehensive from the 1990s to the present. The Tax data file contains only one year of data (for most counties, that is 2016).

The dataset consists of multiple pipe-delimited text files organized into three groups: Tax, Deed, and Foreclosure. Each file covers the entire United States.
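Because each file covers the whole country, it may be too large to load into memory at once. The following is a minimal sketch of streaming a pipe-delimited text file one record at a time using only the Python standard library. It assumes the file is UTF-8 encoded and begins with a header row; the function names and the column names in the usage note are hypothetical placeholders, not the actual CoreLogic layout, so check the documentation supplied with the data before relying on them.

```python
import csv
from collections import Counter
from typing import Dict, Iterator


def iter_pipe_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record from a pipe-delimited text file.

    Assumes the first line is a header row naming the columns.
    Rows are streamed, so the whole file is never held in memory.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary data, an
        # assumption that suits files where "|" is the only delimiter.
        reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


def count_by_column(path: str, column: str) -> Counter:
    """Count how many records share each value of the given column."""
    counts: Counter = Counter()
    for row in iter_pipe_rows(path):
        counts[row[column]] += 1
    return counts
```

For example, `count_by_column("deed.txt", "FIPS_CODE")` would tally records per county code, assuming a file named `deed.txt` with a column called `FIPS_CODE` (both names are illustrative).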

To access this data, please visit the link below:

Visit Page

If you have any questions about the datasets, please contact a librarian.

Healthcare Datasets

The Institute for Healthcare Policy and Innovation (IHPI) has more than 20 terabytes of data, from more than 113 million Americans, for researchers to study how healthcare works and how to make it better. IHPI’s data is provided primarily by large insurance companies in the form of administrative claims. These are proprietary datasets that cover both the commercial and private payer insurance sectors, and they give researchers a longitudinal accounting of the healthcare utilization patterns of millions of US patients.

Visit Page

For questions, please email ihpi-data@umich.edu.

Lyft Dataset

Lyft’s comprehensive, large-scale dataset features raw camera and LiDAR sensor inputs as perceived by a fleet of multiple high-end autonomous vehicles in a bounded geographic area. The dataset also includes high-quality, human-labelled 3D bounding boxes of traffic agents and an underlying HD spatial semantic map.

To download the dataset:

Click Here

Michigan Genomics Initiative (MGI)

The Michigan Genomics Initiative is a collaborative research effort among physicians and researchers at the University of Michigan with the goal of harmonizing patient electronic medical records with genetic data to gain novel biomedical insights.

Visit Page

Twitter Decahose Dataset

University of Michigan researchers can access a compilation of tweets known as the “Decahose” (a 10% sample of all tweets) without charge. MIDAS, CSCAR, and ARC-TS together manage and support the use of this data repository, including the historical archive of Decahose tweets and ongoing collection from the Decahose.

U-M researchers can use this data for five areas of research: information diffusion, natural language processing, network analysis, behavior analysis, and sociolinguistics.

Send Inquiry

For questions, please email Kristin Burgard, MIDAS Outreach and Partnership Manager, burgardk@umich.edu.

Waymo Open Dataset

The Waymo Open Dataset consists of high-resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. The company is releasing this dataset publicly to aid the research community in making advancements in machine perception and self-driving technology.

Visit Page

Awardees of the Library Data Grants Program can be found here.