2020 REPRODUCIBILITY CHALLENGE
A significant challenge across scientific fields is the reproducibility of research results and its assessment by third parties. Ensuring that results can be reliably reproduced is no small task: computational environments may vary drastically and can change over time, rendering code unable to run; specialized workflows might require specialized infrastructure not easily available; sensitive projects might involve data that cannot be directly shared; the robustness of algorithmic decisions and parameter selections varies widely; and data collection methods may include crucial steps (e.g., wrangling, cleaning, missingness mitigation strategies, preprocessing) where choices are made but not well documented. Yet a cornerstone of science remains the ability to verify and validate research findings, so it is important to find ways to overcome these challenges.
The Michigan Institute for Data Science (MIDAS) is pleased to announce the 2020 Reproducibility Challenge. Our goal is to highlight high-quality, reproducible work at the University of Michigan by collecting examples of best practices across diverse fields. Besides incentivizing reproducible workflows and enabling a deeper understanding of issues of reproducibility, we hope the results of the challenge will provide templates that others can follow if they wish to adopt more reproducible approaches to disseminating their work.
The MIDAS Reproducibility Challenge is open to researchers from any field who make use of data, broadly construed. We seek projects and corresponding artifacts and/or publications that contribute to reproducibility in a noteworthy manner. Some examples could include:
- An illustration of a definition of reproducibility for at least one application of data science;
- Metadata with sufficient transparency to allow full understanding of how the data collection, processing and computational workflows or code resulted in a study’s findings;
- An analysis workflow that can be reproduced by others, even with different hardware or software;
- A thorough description of key assumptions, parameter and algorithmic choices in the experimental or computational methods, so that others can test the robustness and generalizability of such choices;
- Procedures or tools that other researchers can adopt to improve data transparency and analysis workflows, and to test the sensitivity of research findings to variations in data and in human decisions.
There will be a prize pool of up to $15,000 in cash awards for the winning teams. Depending on the submissions, the entire amount may be awarded to a single winning team. Alternatively, we may award prizes to winners in multiple categories, as described above.
In addition, teams with effective approaches for reproducibility (as reflected in the submissions) will get preferential consideration for the next round of Propelling Original Data Science (PODS) grants in the fall of 2020.
All selected projects will be collected on a public webpage highlighting reproducible work at Michigan and providing best practice examples for other researchers.
- Submissions due by 11:59 pm, March 15, 2020.
- Winners will be announced on Sept. 14, 2020.
- Reproducibility Day: 2 pm – 5 pm, Sept. 14, 2020, Virtual
Submissions will be judged based on the following factors:
- The clarity and thoroughness of the report;
- Its potential as an example for others to follow;
- The ease and accuracy with which the results described in the report could be reproduced;
- The broader impact of the work towards addressing reproducibility challenges.
Work that attempts to overcome significant barriers to reproducibility (such as proprietary data or “black box” analytics) will be recognized, even if it represents an imperfect solution.
- Jake Carlson: Manager, Deep Blue Repositories and Research Data Services, U-M Libraries
- H.V. Jagadish: Director, MIDAS, and Professor, Computer Science and Engineering, CoE
- Matthew Kay: Assistant Professor, School of Information
- Jing Liu: Managing Director, MIDAS
- Josh Pasek: Assistant Professor, Communication and Media, LSA
- Brian Puchala: Assistant Research Scientist, Materials Science and Engineering, CoE
- Arvind Rao: Associate Professor, Computational Medicine and Bioinformatics, and Radiation Oncology, Med. School
The following are examples of tools and resources that may be of assistance:
- The Open Science Framework and the Center for Open Science
- The Dataverse
- Google Colab
- Archiving Code with Zenodo on GitHub
- The CRAN Time Machine for R Reproducibility
- RStudio Cloud
- Pipeline workflow environment
All questions should be sent to: firstname.lastname@example.org.