Papers and/or Publications
C2Metadata: Automating the Capture of Data Transformations from Statistical Scripts in Data Documentation
Automating the Capture of Data Transformation Metadata from Statistical Analysis Software
Provenance Metadata for Statistical Data: An Introduction to Structured Data Transformation Language (SDTL)
What you will learn
The C2Metadata Project has created software that automatically captures the data transformations performed in common statistical packages and records them in a simple yet expressive representation that does not depend on the original language. It encourages data sharing and reuse by reducing the cost of documenting data management and preparation programs. The system first translates statistical transformation scripts (in SPSS, Stata, SAS, R, or Python) into a software-independent data transformation language, the Structured Data Transformation Language (SDTL), and then updates the original metadata to match the transformed data. The updated metadata can be displayed and queried to track how the dataset has changed. The current focus is on applications in the social, behavioral, and ecological sciences, but the approach generalizes to other domains that use statistical software to manage data.
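The two-step idea described above (translate a script into a language-neutral record, then update the metadata from that record) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the field names (`command`, `target`, `expression`, `source_text`), the metadata layout, and the helper functions `update_metadata` and `history` are hypothetical and do not reproduce the real SDTL schema or any C2Metadata tool.

```python
from copy import deepcopy

# Hypothetical, simplified stand-in for language-neutral transformation records.
# The field names are illustrative only; they are NOT the real SDTL schema.
transformations = [
    {
        "command": "compute",
        "target": "bmi",
        "expression": "weight / (height * height)",
        "source_language": "Stata",
        "source_text": "gen bmi = weight / (height * height)",
    },
    {
        "command": "drop",
        "target": "height",
        "source_language": "Stata",
        "source_text": "drop height",
    },
]

# Hypothetical variable-level metadata for the dataset before the script runs.
original_metadata = {
    "weight": {"label": "Weight in kg", "provenance": []},
    "height": {"label": "Height in m", "provenance": []},
}


def update_metadata(metadata, steps):
    """Return a copy of the metadata that reflects the dataset after each step."""
    updated = deepcopy(metadata)  # leave the original metadata untouched
    for step in steps:
        name = step["target"]
        if step["command"] == "compute":
            # A computed variable is created if missing and keeps the step as provenance.
            entry = updated.setdefault(name, {"label": "", "provenance": []})
            entry["provenance"].append(step)
        elif step["command"] == "drop":
            # A dropped variable no longer exists in the transformed dataset.
            updated.pop(name, None)
        else:
            raise ValueError(f"unsupported command: {step['command']}")
    return updated


def history(metadata, name):
    """List the original script lines that produced or modified a variable."""
    return [step["source_text"] for step in metadata[name]["provenance"]]


updated_metadata = update_metadata(original_metadata, transformations)
print(sorted(updated_metadata))          # variables present after the script
print(history(updated_metadata, "bmi"))  # how bmi was derived
```

Because each record keeps the original command text next to its neutral description, a query such as `history(updated_metadata, "bmi")` can show how a variable was derived without rerunning the script in its original package.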