MIDAS Reproducibility Challenge Showcase: Jared Lyle & Jacob Fisher

June 9, 2020 1:00 PM - 2:00 PM


Jared Lyle | Jacob Fisher

Archivist, ICPSR, University of Michigan | Research Investigator, Survey Research Center, University of Michigan

View Full Recording

View Jared Lyle Recording

View Jacob Fisher Recording

Jared Lyle

Co-Presenters: Lars Vilhuber (Executive Director, Labor Dynamics Institute, Cornell University), Maggie Levenstein (Director, ICPSR, University of Michigan), Chelsea Goforth (Data Project Manager, ICPSR, University of Michigan)

Title: American Economic Association (AEA) Data and Code Repository at open ICPSR

In 2019, the American Economic Association (AEA) adopted a Data and Code Availability Policy “to improve the reproducibility and transparency of materials supporting research published in the AEA journals by providing improved guidance on the types of materials required, increased quality control, and more review earlier in the publication process.” The AEA initiative is one of the most comprehensive reproducibility and data/code sharing initiatives in the social sciences. In this presentation, we review the AEA workflow, including how the AEA assesses compliance with the policy and verifies the accuracy of reported results by running the deposited code. We also demonstrate the newly established AEA Data and Code Repository at the Inter-university Consortium for Political and Social Research (ICPSR), which facilitates the AEA’s workflow and review. Each data collection in the repository receives a persistent digital object identifier (DOI), as well as descriptive metadata to increase findability, including JEL codes and subject terms, and each collection is linked back to the associated journal article. Additionally, the AEA migrated its entire back archive of more than 3,000 data and code supplements to the AEA Data and Code Repository at ICPSR, representing almost two decades of required data sharing associated with AEA journal publications.

Jacob Fisher

Title: Data-specific functions

In this study, I recommend that data collection efforts distribute an open-source set of tools for working with a particular data set, which I call data-specific functions. The goal of these functions is to codify best practices for working with the data in a set of functions for commonly used statistical software. These functions would be jointly developed by the users and distributors of the data. Building such functions would both shorten the learning curve for new users and improve the quality of the data by making tacit knowledge about problems with the data explicit and easy to act on. The presentation will cover some considerations for designing data-specific functions, particularly with respect to survey data collection, and will show an example.
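To make the idea concrete, here is a minimal sketch of what data-specific functions might look like in Python with pandas. The file name, column names, sentinel codes, and the wave-3 unit correction are all hypothetical illustrations, not details from the presentation; a real package would encode the actual codebook and the documented fixes for its data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sentinel codes for an imaginary survey; a real
# data-specific package would encode the actual codebook values.
MISSING_CODES = [-9, -8, -7]  # refused, don't know, not applicable


def load_survey(path: str) -> pd.DataFrame:
    """Load the (hypothetical) survey file and apply known fixes.

    This is the point of a data-specific function: corrections that
    would otherwise circulate as tacit knowledge are codified once.
    """
    df = pd.read_csv(path)
    # Turn numeric sentinel codes into real missing values so they
    # cannot silently enter means or regressions.
    df = df.replace(MISSING_CODES, np.nan)
    # Illustrative documented data problem: suppose income was
    # recorded in hundreds of dollars before wave 3.
    early = df["wave"] < 3  # NaN compares False, so missing waves are untouched
    df.loc[early, "income"] *= 100
    return df


def weighted_mean(df: pd.DataFrame, column: str, weight: str = "weight") -> float:
    """Survey-weighted mean of a column, dropping rows with missing values."""
    sub = df[[column, weight]].dropna()
    return float((sub[column] * sub[weight]).sum() / sub[weight].sum())


# Example use (file name is hypothetical):
# df = load_survey("acme_survey_2020.csv")
# print(weighted_mean(df, "income"))
```

The design point is that such corrections live in one shared, versioned place rather than being rediscovered in each user's scripts, so a fix propagates to every analysis that loads the data through these functions.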

———————————————————————

The Reproducibility Showcase features a series of online presentations and tutorials from May through August 2020. Presenters are selected from the MIDAS Reproducibility Challenge 2020.

A significant challenge across scientific fields is the reproducibility of research results, and the third-party assessment of that reproducibility. The goal of the MIDAS Reproducibility Challenge is to highlight high-quality, reproducible work at the University of Michigan by collecting examples of best practices across diverse fields. We received a large number of entries that illustrate wonderful work in the following areas:

  1. Theory – A definition of reproducibility and what aspects of reproducibility are critical in a particular domain or in general.
  2. Reproducing a Particular Study – A comprehensive record of parameters and code that allows others to reproduce the results of a particular project.
  3. Generalizable Tools – A general platform for coding or running analyses that standardizes the methods for reproducible results across studies.
  4. Robustness – Metadata, tools, and processes to improve the robustness of results to variations in data, computational hardware and software, and human decisions.
  5. Assessments of Reproducibility – Methods to test the consistency of results from multiple projects, such as meta-analysis or the provision of parameters that can be compared across studies.
  6. Reproducibility under Constraints – Sharing code and/or data to reproduce results without violating privacy or other restrictions.

On Sept. 14, 2020, MIDAS will also host a Reproducibility Day, a workshop on concepts and best practices of research reproducibility. Please save the date.