
Reproducibility Resources

Example: embedding the computational pipeline in the publication

One approach to embedding the complete computational pipeline (hosted on GitHub) in a publication.

Assessing the reproducibility of high-throughput experiments with a Bayesian hierarchical model

A Bayesian hierarchical model framework and a set of computational toolkits to evaluate the overall reproducibility of high-throughput biological experiments and to identify reproducible and irreproducible signals via rigorous false discovery rate control procedures.
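
To make the false discovery rate control concrete, here is a minimal R sketch of the generic Bayesian FDR selection rule: rank signals by their posterior probability of being irreproducible, then keep the largest set whose average posterior probability stays below the target rate. The variable names and threshold are illustrative assumptions, not the toolkit's actual interface.

    # Sketch: selecting reproducible signals under a target Bayesian FDR.
    # Assumes `post_irr` holds each signal's posterior probability of being
    # irreproducible, as estimated by some hierarchical model (illustrative,
    # not this toolkit's API).
    bayes_fdr_select <- function(post_irr, alpha = 0.05) {
      ord <- order(post_irr)                             # most reproducible first
      cum_fdr <- cumsum(post_irr[ord]) / seq_along(ord)  # expected FDR of each cutoff
      n_keep <- sum(cum_fdr <= alpha)                    # largest set meeting the target
      selected <- rep(FALSE, length(post_irr))
      selected[ord[seq_len(n_keep)]] <- TRUE
      selected
    }

    # Example: ten signals with simulated posterior probabilities
    set.seed(1)
    post <- sort(runif(10, 0, 0.5))
    which(bayes_fdr_select(post, alpha = 0.10))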

Replicating predictive models at scale for MOOC research

A program of reproducibility research in the domain of learning analytics, with a specific focus on predictive models of student success. It includes an open-source software infrastructure, the MOOC Replication Framework (MORF), which can be used to reproduce results on larger-scale datasets. It also includes a report on reproductions of work from other scholars ...

Example: unifying initial conditions of galaxy formation simulations for research replication

This project demonstrates the importance of controlling the initial conditions of a numerical simulation of galaxy formation so that research findings can be replicated. Different groups use different initial conditions as the starting point for their numerical modeling, complicating the comparison of results between groups. Are discrepant predictions for galaxy properties due to choices in ...
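
The core idea can be illustrated with a toy R sketch: when two groups draw their starting field from the same seeded random number generator, their initial conditions match exactly, so any downstream disagreement must come from modeling choices rather than the starting point. This is purely illustrative, not the project's actual simulation setup.

    # Sketch: why sharing the seed (and generator) pins down initial conditions.
    # A toy "initial density field" drawn as Gaussian noise; any group using
    # the same seed reproduces it exactly.
    make_initial_field <- function(n = 64, seed = 42) {
      set.seed(seed)
      matrix(rnorm(n * n), nrow = n)   # toy Gaussian random field
    }

    field_a <- make_initial_field(seed = 42)
    field_b <- make_initial_field(seed = 42)
    identical(field_a, field_b)        # TRUE: both "groups" start from the same field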

BioContainers: an open-source and community-driven framework for software standardization

The BioContainers initiative is a free, open-source, community-driven project dedicated to helping life science researchers and data analysts improve software standardization and reproducibility. It facilitates the request and maintenance of bioinformatics containers and the interaction between users and the community. The project is based on lightweight software container technology such as Docker, and ...
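
As a rough illustration of the container-based workflow, the R sketch below shells out to the Docker CLI to fetch and run a BioContainers image. BioContainers does publish versioned images under quay.io/biocontainers, but the samtools tag used here is an assumption; check the registry for a current tag.

    # Illustrative only: running a BioContainers image from R via the Docker
    # CLI. The tag below is an assumption -- BioContainers images are not
    # tagged "latest", so consult the registry for the current version.
    image <- "quay.io/biocontainers/samtools:1.9--h8571acd_11"
    system2("docker", c("pull", image))
    system2("docker", c("run", "--rm", image, "samtools", "--version"))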

Codifying tacit knowledge in functions using R

Data collection efforts often come with extensive documentation of varying completeness. Both voluminous and incomplete documentation can make it hard for analysts to use the data correctly. In this presentation, Dr. Fisher describes an approach that data distributors can use to make it easier for analysts to use ...
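
As a hedged sketch of what codifying tacit knowledge in an R function can look like, the example below bakes a codebook rule into code instead of leaving it in prose. The variable name and sentinel codes are hypothetical, not from the presentation.

    # Sketch of the idea: ship the documentation's rules as a function.
    # Suppose a codebook says income is recorded in dollars, with 9999998
    # meaning "refused" and 9999999 meaning "don't know". Rather than hoping
    # every analyst reads that page, the distributor codifies the rule.
    # (Hypothetical variable and codes, for illustration only.)
    clean_income <- function(income_raw) {
      ifelse(income_raw %in% c(9999998, 9999999), NA_real_, income_raw)
    }

    clean_income(c(52000, 9999999, 76000, 9999998))
    #> [1] 52000    NA 76000    NA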

Complete reproduction of a study through the use of GitHub, Docker, and an R package

A multi-pronged approach to making code and data easy to access, making the entire analysis available, ensuring the computational environment is archived, and making the code useful to a wide audience. The tools include making all code available on GitHub; creating a fully documented R package on CRAN to allow the primary algorithms from the ...
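
From an analyst's point of view, such a release might be consumed roughly as follows; the package and repository names are placeholders, not the study's actual ones.

    # Sketch of how such a release looks from the analyst's side; names are
    # placeholders for illustration.

    # Stable, documented release from CRAN:
    install.packages("examplepkg")

    # Development version, alongside the full analysis code, from GitHub:
    # install.packages("remotes")   # if not already installed
    remotes::install_github("example-lab/examplepkg")

    library(examplepkg)
    help(package = "examplepkg")    # browse the package documentation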

Transparent, reproducible and extensible data generation and analysis for materials simulations

Our approach to community software development and an introduction to data and workflow management tools. Reproducible workflows are achieved with the open-source signac software framework for managing large and heterogeneous data spaces (signac itself is agnostic to the data source). The generation of simulation data is performed with the HOOMD-blue simulation package, and analysis is ...

Principles and tools for developing standardized and interoperable ontologies

In the informatics field, a formal ontology is a human- and computer-interpretable set of terms and relations that represent entities in a specific domain and how they relate to one another. An ontology provides standardization and computer interpretability of data, metadata, and knowledge. Hundreds of ontologies have been reported and are widely used to support ...
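
To make "terms and relations" concrete, here is a minimal R illustration: an ontology fragment reduced to subject-predicate-object statements, which is what makes the knowledge machine-queryable. The terms loosely follow the style of widely used biomedical ontologies but are simplified for illustration.

    # Illustrative only: a formal ontology boils down to typed terms and
    # relations. A few statements represented as a triple table make the
    # "computer-interpretable" part concrete.
    triples <- data.frame(
      subject   = c("mitochondrion", "mitochondrion", "organelle"),
      predicate = c("is_a",          "part_of",       "is_a"),
      object    = c("organelle",     "cell",          "cellular_component")
    )

    # Machine-actionable queries follow directly from the structure:
    subset(triples, predicate == "is_a")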