Coding Be eR: Assessing and Improving the Reproducibility of R-Based Research With containR
Abstract
Reproducibility is the cornerstone of science, and we are in the midst of a reproducibility crisis. Simply sharing the code and data used for obtaining results is o en insu cient for reproducibility; in fact, we show that 85.6% of the thousands of R programs published on Dataverse 1 since 2015 cannot be run. Moreover, our nding that the failure rate of these published R programs holds constant regardless of their age implies that errors are caused by code incorrectness, not age-related incompatibility. We contribute to the reproducibility of R-based research by building tools to both automatically correct common errors found in published code/data archives and package the archives to guarantee future reproducibility. We motivate developing these tools with analyses showing that only three types of mistakes caused more than 70% of all the errors we observed, and that automatically correcting these mistakes frequently revealed a more fundamental error: many datasets were simply missing the data used for analysis, highlighting the need for a be er system of documenting and including research-code dependencies. We provide an example of such a system by building containR, a web application which combines our automatic error-correcting code and existing dependency detection tools to create easily-executable and platform-agnostic archives of R-based research.Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAACitable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811561
Collections
- FAS Theses and Dissertations [5848]
Contact administrator regarding this item (to report mistakes or request changes)