Fully Reproducible Projects

American Economic Association (AEA) Data & Code Repository at openICPSR

The American Economic Association (AEA) shares replication packages (data and code) through a newly established AEA Data and Code Repository at the Inter-university Consortium for Political and Social Research (ICPSR). In 2019, the AEA adopted a revised Data and Code Availability Policy “to improve the reproducibility and transparency of materials supporting research published in the ...

Multi-informatic Cellular Visualization

MiCV is a Multi-informatic Cellular Visualization tool that provides a uniform web interface to a set of essential analytical tools for high-dimensional datasets. Biologists looking to scRNA-seq as a high-throughput exploratory research method are often bogged down by the multitude of bioinformatics tools and pipelines available today. MiCV provides a point-and-click interface for a variety of tools that make up ...

Example: Complete documentation and sharing of data and analysis with the example of a micro-randomized trial

An example of pre-registering study protocols and sharing open-source documents and code that clearly describe the key assumptions and decisions made in the data curation and analysis of a micro-randomized trial. The documentation also includes sensitivity analyses showing how the results change under alternative decisions. The workflow provides a template for other scholars to use.
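As a rough illustration of what such a sensitivity analysis can look like, the minimal Python sketch below recomputes a simple effect estimate under several alternative data-curation decisions. It is not taken from the project itself: the records, the `max_missing` exclusion rule, and the function names are all hypothetical, and a real micro-randomized trial would use a more appropriate estimator than a difference in means.

```python
from statistics import mean

# Hypothetical records: (treated, outcome, fraction_of_data_missing).
# These values are made up for illustration only.
records = [
    (1, 5.0, 0.0),
    (0, 3.0, 0.1),
    (1, 6.0, 0.4),
    (0, 4.0, 0.0),
    (1, 9.0, 0.7),
    (0, 2.0, 0.6),
]


def effect_estimate(rows, max_missing):
    """Difference in mean outcome (treated minus control) after dropping
    rows whose missing-data fraction exceeds max_missing."""
    kept = [r for r in rows if r[2] <= max_missing]
    treated = [y for t, y, _ in kept if t == 1]
    control = [y for t, y, _ in kept if t == 0]
    return mean(treated) - mean(control)


def sensitivity_table(rows, thresholds):
    """Re-run the same estimate once per alternative curation decision."""
    return {th: effect_estimate(rows, th) for th in thresholds}


# Each threshold stands for one defensible curation decision.
table = sensitivity_table(records, [0.2, 0.5, 1.0])
for th, est in table.items():
    print(f"max_missing={th}: estimate={est:.2f}")
```

Reporting the whole table, rather than only the estimate from the preferred rule, lets readers see how much the conclusion depends on each curation choice.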

Automatic capture of data transformations to improve metadata

The C2Metadata Project has created software that automatically captures data transformations in common statistical packages in a simple yet expressive representation regardless of the original languages. It encourages data sharing and re-use by reducing the cost of documenting data management and preparation programs. The system first translates statistical transformation scripts (in SPSS, Stata, SAS, R, ...

Rigorous code review for better code release

A systematic approach to code review and code release. For code review, the team conducted a “blind” experiment in which a data analyst had to re-create the study design based solely on the code that was made available to them. For code release, the team shares their platform and their considerations on how to make ...

Example: Effective communication for reproducible research

This example highlights the importance of open communication and teamwork for reproducible research, both for making one’s own work reproducible by others and for reproducing other people’s work. The example emphasizes three aspects of code sharing: whether the code is accessible, whether it is thoroughly and clearly documented, and whether it is generalizable.

Codifying tacit knowledge in functions using R

Data collection efforts often come with a lot of documentation, of varying degrees of completeness.  Having large amounts of documentation or incomplete documentation can make it hard for analysts to use the data correctly.  In this presentation, Dr. Fisher describes an approach that data distributors can use to make it easier for analysts to use ...

Complete reproduction of a study through the use of GitHub, Docker and an R package

A multi-pronged approach to make code and data easy to access, make the entire analysis available, ensure the computational environment is archived, and make the code useful to a wide audience. The tools include making all code available on GitHub; creating a fully documented R package on CRAN to allow the primary algorithms from the ...

Transparent, reproducible and extensible data generation and analysis for materials simulations

Our approach to community software development and an introduction to data and workflow management tools. Reproducible workflows are achieved with the open-source signac software framework for managing large and heterogeneous data spaces (signac itself is agnostic to the data source). The generation of simulation data is performed with the HOOMD-blue simulation package, and analysis is ...