
Generalizable Tools

Replicating predictive models at scale for research on MOOCs

A program of reproducibility research in the domain of learning analytics, with a specific focus on predictive models of student success. It includes an open-source software infrastructure, the MOOC Replication Framework (MORF), which can be used to reproduce results on larger-scale datasets. It also includes a report on reproductions of work from other scholars ...

Codifying tacit knowledge in functions using R

Data collection efforts often come with extensive documentation of varying completeness. Documentation that is voluminous, incomplete, or both can make it hard for analysts to use the data correctly. In this presentation, Dr. Fisher describes an approach that data distributors can use to make it easier for analysts to use ...
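The talk presents this pattern in R; as a language-neutral illustration, here is a minimal Python sketch of the same idea, with the variable name and sentinel codes invented for the example. The tacit knowledge from the codebook lives in one cleaning function, so every analyst applies it the same way instead of rediscovering it in the documentation:

```python
import pandas as pd

def clean_income(raw: pd.Series) -> pd.Series:
    """Recode the hypothetical INCOME column per its (invented) codebook.

    Tacit knowledge encoded here instead of in prose documentation:
      * -9 ("refused") and -8 ("don't know") are sentinel codes -> missing
      * reported values are top-coded at 999_998
    """
    income = raw.astype("float64")
    income = income.mask(income.isin([-9, -8]))  # sentinels become NaN
    return income.clip(upper=999_998)            # enforce documented top-code

df = pd.DataFrame({"INCOME": [52_000, -9, 1_200_000, -8, 75_500]})
df["income_clean"] = clean_income(df["INCOME"])
print(df)
```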

Complete reproduction of a study through the use of GitHub, Docker, and an R package

A multi-pronged approach to making code and data easy to access, making the entire analysis available, ensuring the computational environment is archived, and making the code useful to a wide audience. The tools include making all code available on GitHub; creating a fully documented R package on CRAN to allow the primary algorithms from the ...
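The talk's tooling is GitHub, Docker, and a documented R package on CRAN; the sketch below illustrates just the packaging prong, as a Python analogue with an invented stand-in algorithm, to show the general pattern of exposing a paper's primary method as one documented, installable entry point:

```python
from __future__ import annotations

__all__ = ["weighted_mean"]  # the package's one public, documented entry point

def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted arithmetic mean (invented stand-in for the paper's algorithm).

    The function and its parameters are illustrative only; the point is the
    pattern of shipping the method as a documented, installable unit.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total

print(weighted_mean([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # 2.75
```

In the talk's actual setup, a fully documented CRAN package plays this role, while Docker archives the computational environment the analysis ran in.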

Transparent, reproducible and extensible data generation and analysis for materials simulations

Our approach to community software development and an introduction to data and workflow management tools. Reproducible workflows are achieved with the open-source signac software framework for managing large and heterogeneous data spaces (signac itself is agnostic to the data source). The generation of simulation data is performed with the HOOMD-blue simulation package, and analysis is ...
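As a concrete, minimal sketch of what a signac-managed data space looks like: the state-point parameters and the stand-in computation below are invented, a real workflow would launch HOOMD-blue instead, and the call signatures assume signac >= 2.0.

```python
import signac  # assumes signac >= 2.0, where init_project() needs no name

project = signac.init_project()  # creates ./workspace plus project metadata

for kT in (0.5, 1.0, 1.5):               # hypothetical temperature sweep
    job = project.open_job({"kT": kT})   # one workspace directory per state point
    job.init()
    # Stand-in result; a real workflow would run HOOMD-blue here and write
    # trajectories and logs into the job's directory instead.
    job.doc["potential_energy"] = -3.0 * job.sp.kT

# Analysis scripts can later iterate over the managed data space:
for job in project:
    print(job.sp.kT, job.doc["potential_energy"])
```

Because each state point maps to its own directory with its own metadata, the generation and analysis steps stay decoupled and the whole data space remains inspectable after the fact.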

Principles and tools for developing standardized and interoperable ontologies

In the informatics field, a formal ontology is a human- and computer-interpretable set of terms and relations that represent entities in a specific domain and how they relate to each other. An ontology provides standardization and computer interpretability of data, metadata, and knowledge. Hundreds of ontologies have been reported and are widely used to support ...
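As a toy illustration of the definition above (the domain, terms, and relations are invented for this sketch, and real ontologies use standards such as OWL rather than ad hoc structures), a Python fragment showing how typed term-to-term relations become machine-queryable:

```python
# (subject, relation, object) triples over invented terms in a toy domain
relations = {
    ("Neuron", "is_a", "Cell"),
    ("Axon", "is_a", "CellPart"),
    ("Axon", "part_of", "Neuron"),
}

def parents(term: str) -> set[str]:
    """Terms the given term is directly related to by 'is_a'."""
    return {o for (s, r, o) in relations if s == term and r == "is_a"}

print(parents("Neuron"))  # {'Cell'} -- queryable by a machine, unlike prose
```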

Multi-informatic Cellular Visualization

MiCV is a Multi-informatic Cellular Visualization tool that provides a uniform web interface to a set of essential analytical tools for high-dimensional datasets. Biologists who turn to scRNA-seq as a high-throughput exploratory research method are often bogged down by the multitude of bioinformatics tools and pipelines available today. MiCV provides a point-and-click interface for a variety of tools that make up ...

Large-Scale, Reproducible Implementation and Evaluation of Heuristics for Optimization Problems

Research developing new heuristics for optimization problems is often not reproducible; for instance, only 4% of papers on two famous optimization problems published their source code. This limits the impact of the research, both within the heuristics community and more broadly among practitioners. In this work, the authors built a large-scale open-source codebase of heuristics. ...

Automatic capture of data transformations to improve metadata

The C2Metadata Project has created software that automatically captures data transformations in common statistical packages in a simple yet expressive representation, regardless of the original language. It encourages data sharing and reuse by reducing the cost of documenting data management and preparation programs. The system first translates statistical transformation scripts (in SPSS, Stata, SAS, R, ...
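As an invented illustration of the general idea (the field names below are not the project's actual representation), here is what a language-independent record of a single Stata recode command might capture, sketched in Python:

```python
# Field names are invented for illustration; they show what a
# language-independent record of one Stata command,
# `recode age (99 = .)`, might capture.
transformation = {
    "source_language": "Stata",
    "source_text": "recode age (99 = .)",
    "command": "Recode",
    "variable": "age",
    "rules": [{"from": 99, "to": None}],  # sentinel 99 becomes missing
}

# The same record can be rendered as codebook text no matter which
# statistical package produced the original script:
rule = transformation["rules"][0]
print(f"{transformation['variable']}: value {rule['from']} set to missing "
      f"(captured from {transformation['source_language']}).")
```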

Rigorous code review for better code release

A systematic approach to code review and code release. For code review, this team conducted a “blind” experiment, in which a data analyst had to re-create the study design based solely on the code made available to them. For code release, the team shares its platform and its considerations on how to make ...