Replicating predictive models at scale for MOOC research

What you will learn

A program of reproducibility research in learning analytics, with a specific focus on predictive models of student success. It includes an open-source software infrastructure, the MOOC Replication Framework (MORF), which can be used to reproduce published results on larger-scale datasets. It also includes a report on reproductions of other scholars' work, which uncovers a lack of generalizability in reported findings and illustrates the nuances of using predictive models in learning analytics.


Christopher Brooks

Josh Gardner

Ryan S. Baker

Replications with Source Code

Fei and Yeung (2015), “Temporal Models for Predicting Student Dropout in Massive Open Online Courses.” The replication code is on GitHub, and the results were presented in Gardner, Yang, Baker, and Brooks (2018), “Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining.”

Xing, Chen, Stein, and Marcinowski (2016), “Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization.” The replication code is on GitHub, and the results were presented in Gardner, Brooks, Andres, and Baker (2018), “Replicating MOOC Predictive Models at Scale,” Proceedings of the Fifth Annual ACM Conference on Learning@Scale, June 2018, London, UK.
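The stacking generalization approach used in the replicated work combines the predictions of several base classifiers through a meta-learner. The sketch below is an illustrative, hedged example of that general technique only; the synthetic features, model choices, and hyperparameters are assumptions for demonstration and are not drawn from the replicated papers or MORF.

```python
# Illustrative sketch of stacked generalization ("stacking") for a binary
# dropout-style prediction task. All data here are synthetic stand-ins for
# weekly student-activity features (an assumption, not the authors' data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-student features (e.g., forum posts,
# video views, assignment submissions aggregated by week).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Level-0 base learners make out-of-fold predictions; a level-1
# meta-learner (logistic regression) combines them.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

In a real replication, the synthetic features would be replaced by the course-specific clickstream features extracted within the MORF pipeline.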

MOOC Replication Framework

The source code and an overview of the project are available on GitHub; the preferred citation for the platform is available on arXiv.

Big Data Ignite Talk

Assessments of Reproducibility, Fully Reproducible Projects, Generalizable Tools, Reproducibility under Constraints, Reproducible research with restricted data, Theory and Definition