This page features resources submitted by U-M data science researchers. Ensuring reproducible data science is no small task: computational environments may vary drastically and can change over time; specialized workflows might require specialized infrastructure that is not easily available; sensitive projects might involve restricted data; the robustness of algorithmic decisions and parameter selections varies widely; and crucial steps where choices are made (e.g., wrangling, cleaning, mitigating missing data issues, preprocessing) might not be well documented. Our resource collection is intended to help researchers tackle some of these challenges. If you would like to submit tools, publications, or other resources to be included on this page, please email firstname.lastname@example.org.
> What can and should be reproduced, and to what extent a result can be reproduced
> Guidelines and tools for recording and sharing data, code and documentation to reproduce the findings of a project, even with variations in data, computational hardware and software, and statistical and algorithmic decisions
> How to ensure reproducible results when the original data cannot be shared
> Guidelines and tools for documentation, coding, and running analyses that standardize the methods for reproducible results across studies