Coding BetteR: Assessing and Improving the Reproducibility of R-Based Research With containR
Abstract: Reproducibility is the cornerstone of science, and we are in the midst of a reproducibility crisis. Simply sharing the code and data used for obtaining results is often insufficient for reproducibility; in fact, we show that 85.6% of the thousands of R programs published on Dataverse since 2015 cannot be run. Moreover, our finding that the failure rate of these published R programs holds constant regardless of their age implies that errors are caused by code incorrectness, not age-related incompatibility. We contribute to the reproducibility of R-based research by building tools to both automatically correct common errors found in published code/data archives and package the archives to guarantee future reproducibility. We motivate developing these tools with analyses showing that only three types of mistakes caused more than 70% of all the errors we observed, and that automatically correcting these mistakes frequently revealed a more fundamental error: many datasets were simply missing the data used for analysis, highlighting the need for a better system of documenting and including research-code dependencies. We provide an example of such a system by building containR, a web application which combines our automatic error-correcting code and existing dependency detection tools to create easily-executable and platform-agnostic archives of R-based research.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811561
- FAS Theses and Dissertations