Publication: Coding BetteR: Assessing and Improving the Reproducibility of R-Based Research With containR
Abstract
Reproducibility is the cornerstone of science, and we are in the midst of a reproducibility crisis. Simply sharing the code and data used for obtaining results is often insufficient for reproducibility; in fact, we show that 85.6% of the thousands of R programs published on Dataverse since 2015 cannot be run. Moreover, our finding that the failure rate of these published R programs holds constant regardless of their age implies that errors are caused by code incorrectness, not age-related incompatibility. We contribute to the reproducibility of R-based research by building tools to both automatically correct common errors found in published code/data archives and package the archives to guarantee future reproducibility. We motivate developing these tools with analyses showing that only three types of mistakes caused more than 70% of all the errors we observed, and that automatically correcting these mistakes frequently revealed a more fundamental error: many datasets were simply missing the data used for analysis, highlighting the need for a better system of documenting and including research-code dependencies. We provide an example of such a system by building containR, a web application which combines our automatic error-correcting code and existing dependency detection tools to create easily-executable and platform-agnostic archives of R-based research.