Automatic capture of data transformations to improve metadata

The C2Metadata Project has created software that automatically captures data transformations made in common statistical packages and records them in a simple yet expressive representation that is independent of the original language. It encourages data sharing and re-use by reducing the cost of documenting data management and preparation programs. The system first translates statistical transformation scripts (in SPSS, Stata, SAS, R, or Python) into a software-independent data transformation language, and then updates the original metadata to match the transformed data. The updated metadata can be displayed and queried to track how the dataset has been changed. The current focus is on applications in the social, behavioral, and ecological sciences, but the approach generalizes to other domains that use statistical software to manage data.
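The second stage of the workflow described above, updating metadata from software-independent transformation commands, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the command vocabulary (`rename`, `compute`, `drop`), the dictionary layout of the metadata, and the function name `update_metadata` are assumptions, and they do not reproduce the project's actual transformation language or metadata schema.

```python
import copy


def update_metadata(metadata, commands):
    """Return a copy of `metadata` updated to reflect `commands`.

    Both the metadata layout and the command vocabulary are hypothetical
    and serve only to illustrate the idea.
    """
    updated = copy.deepcopy(metadata)  # leave the original metadata untouched
    variables = updated["variables"]
    for step, cmd in enumerate(commands, start=1):
        op = cmd["op"]
        if op == "rename":
            # Move the entry to its new name and record where it came from.
            entry = variables.pop(cmd["old"])
            entry["history"].append(f"step {step}: renamed from {cmd['old']}")
            variables[cmd["new"]] = entry
        elif op == "compute":
            # A derived variable starts with a history describing its formula.
            variables[cmd["target"]] = {
                "label": cmd.get("label", ""),
                "history": [f"step {step}: computed as {cmd['expression']}"],
            }
        elif op == "drop":
            for name in cmd["variables"]:
                del variables[name]
        else:
            raise ValueError(f"unsupported operation: {op}")
    return updated


# Hypothetical original metadata for a two-variable dataset.
original = {
    "variables": {
        "v1": {"label": "Age in years", "history": []},
        "v2": {"label": "Interviewer ID", "history": []},
    }
}

# Hypothetical software-independent commands, as a translator from
# SPSS, Stata, SAS, R, or Python might emit them.
commands = [
    {"op": "rename", "old": "v1", "new": "age"},
    {"op": "compute", "target": "age_months",
     "expression": "age * 12", "label": "Age in months"},
    {"op": "drop", "variables": ["v2"]},
]

result = update_metadata(original, commands)
for name, entry in result["variables"].items():
    print(name, entry["history"])
```

Because each command appends to a per-variable history, the updated metadata can be queried to see how any variable in the transformed dataset came about, which is the kind of tracking the abstract describes.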

