This page features resources submitted by U-M data science researchers. Ensuring reproducible data science is no small task: computational environments vary drastically and change over time; specialized workflows may require infrastructure that is not easily available; sensitive projects may involve restricted data; the robustness of algorithmic decisions and parameter selections varies widely; and crucial steps where choices are made (e.g., wrangling, cleaning, handling missing data, preprocessing) may not be well documented. Our resource collection helps researchers tackle some of these challenges. To submit tools, publications, or other resources for inclusion on this page, please email email@example.com.
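One concrete way to address environment drift is to record a snapshot of the computational environment alongside each result. The sketch below is a minimal illustration, not a tool from this collection; the function name `environment_snapshot` is an assumption, and it captures only the interpreter, OS, and installed-package versions using the Python standard library.

```python
# Minimal sketch (hypothetical helper): record the computational environment
# alongside results so a later re-run can detect what has changed.
import json
import platform
import sys
from importlib import metadata

def environment_snapshot():
    """Return a dict describing the interpreter, OS, and installed packages."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        # Package name -> version for everything importlib can see.
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }

snapshot = environment_snapshot()
# In practice, write this JSON next to the analysis outputs it describes.
print(json.dumps({"python": snapshot["python"].split()[0],
                  "platform": snapshot["platform"]}, indent=2))
```

Saving such a snapshot with every published result gives collaborators a baseline to compare against when a re-run diverges.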
> What can and should be reproduced, and to what extent a result can be reproduced
> Guidelines and tools for recording and sharing data, code and documentation to reproduce the findings of a project, even with variations in data, computational hardware and software, and statistical and algorithmic decisions
> How to ensure reproducible results when the original data cannot be shared
> Guidelines and tools for documentation, coding, and running analyses that standardize the methods for reproducible results across studies
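One small piece of the last point, standardizing analyses so results reproduce across studies, is pinning and documenting random seeds for stochastic steps. The sketch below is an assumed illustration (the names `SEED` and `reproducible_sample` are hypothetical), using only the Python standard library.

```python
# Hedged sketch: pin a documented seed so stochastic steps (sampling,
# splits, initialization) give identical results on every re-run.
import random

SEED = 20240101  # hypothetical value; record the actual seed in project docs

def reproducible_sample(data, k, seed=SEED):
    """Draw the same k-item sample every time for a fixed seed."""
    rng = random.Random(seed)  # a local RNG avoids global-state surprises
    return rng.sample(data, k)

run_a = reproducible_sample(range(100), 5)
run_b = reproducible_sample(range(100), 5)
assert run_a == run_b  # identical across runs for the same Python version
print(run_a)
```

Using a dedicated `random.Random` instance, rather than the module-level global RNG, keeps the seeded behavior isolated from other libraries that may also call `random`.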