
dc.contributor.author: Chen, Christopher
dc.date.accessioned: 2019-03-26T11:07:53Z
dc.date.created: 2018-05
dc.date.issued: 2018-06-29
dc.date.submitted: 2018
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811561
dc.description.abstract: Reproducibility is the cornerstone of science, and we are in the midst of a reproducibility crisis. Simply sharing the code and data used to obtain results is often insufficient for reproducibility; in fact, we show that 85.6% of the thousands of R programs published on Dataverse since 2015 cannot be run. Moreover, our finding that the failure rate of these published R programs holds constant regardless of their age implies that the errors are caused by incorrect code, not by age-related incompatibility. We contribute to the reproducibility of R-based research by building tools that both automatically correct common errors found in published code/data archives and package those archives to guarantee future reproducibility. We motivate these tools with analyses showing that only three types of mistakes caused more than 70% of all the errors we observed, and that automatically correcting these mistakes frequently revealed a more fundamental error: many datasets were simply missing the data used for analysis. This highlights the need for a better system of documenting and including research-code dependencies. We provide an example of such a system by building containR, a web application that combines our automatic error-correcting code with existing dependency-detection tools to create easily executable, platform-agnostic archives of R-based research.
dc.format.mimetype: application/pdf
dc.language.iso: en
dash.license: LAA
dc.subject: Computer Science
dc.subject: Statistics
dc.title: Coding BetteR: Assessing and Improving the Reproducibility of R-Based Research With containR
dc.type: Thesis or Dissertation
dash.depositing.author: Chen, Christopher
dc.date.available: 2019-03-26T11:07:53Z
thesis.degree.date: 2018
thesis.degree.grantor: Harvard College
thesis.degree.level: Undergraduate
thesis.degree.name: AB
dc.type.material: text
thesis.degree.department: Computer Science
dash.identifier.vireo: http://etds.lib.harvard.edu/college/admin/view/295
dash.author.email: chris.chen796@gmail.com

