Coding BetteR: Assessing and Improving the Reproducibility of R-Based Research With containR

Chen, Christopher

dc.contributor.author	Chen, Christopher
dc.date.accessioned	2019-03-26T11:07:53Z
dc.date.created	2018-05
dc.date.issued	2018-06-29
dc.date.submitted	2018
dc.identifier.uri	http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811561	*
dc.description.abstract	Reproducibility is the cornerstone of science, and we are in the midst of a reproducibility crisis. Simply sharing the code and data used for obtaining results is often insufficient for reproducibility; in fact, we show that 85.6% of the thousands of R programs published on Dataverse since 2015 cannot be run. Moreover, our finding that the failure rate of these published R programs holds constant regardless of their age implies that errors are caused by code incorrectness, not age-related incompatibility. We contribute to the reproducibility of R-based research by building tools to both automatically correct common errors found in published code/data archives and package the archives to guarantee future reproducibility. We motivate developing these tools with analyses showing that only three types of mistakes caused more than 70% of all the errors we observed, and that automatically correcting these mistakes frequently revealed a more fundamental error: many datasets were simply missing the data used for analysis, highlighting the need for a better system of documenting and including research-code dependencies. We provide an example of such a system by building containR, a web application which combines our automatic error-correcting code and existing dependency detection tools to create easily-executable and platform-agnostic archives of R-based research.
dc.format.mimetype	application/pdf
dc.language.iso	en
dash.license	LAA
dc.subject	Computer Science
dc.subject	Statistics
dc.title	Coding BetteR: Assessing and Improving the Reproducibility of R-Based Research With containR
dc.type	Thesis or Dissertation
dash.depositing.author	Chen, Christopher
dc.date.available	2019-03-26T11:07:53Z
thesis.degree.date	2018
thesis.degree.grantor	Harvard College
thesis.degree.level	Undergraduate
thesis.degree.name	AB
dc.type.material	text
thesis.degree.department	Computer Science
dash.identifier.vireo	http://etds.lib.harvard.edu/college/admin/view/295
dash.author.email	chris.chen796@gmail.com

Files in this item

Name: CHEN-SENIORTHESIS-2018.pdf
Size: 1.501Mb
Format: PDF

This item appears in the following Collection(s)

FAS Theses and Dissertations [6136]
