Automatically Determining Versions of Scholarly Articles

Title: Automatically Determining Versions of Scholarly Articles
Author: Rothchild, Daniel Hugo; Shieber, Stuart Merrill ORCID 0000-0002-7733-8195

Note: Order does not necessarily reflect citation order of authors.

Citation: Rothchild, Daniel, and Stuart Shieber. 2017. “Automatically Determining Versions of Scholarly Articles.” Scholarly and Research Communication 8 (1) (March 22). doi:10.22230/src.2017v8n1a268.
Abstract: Background: Repositories of scholarly articles should provide authoritative information about the materials they distribute and should distribute those materials in keeping with pertinent laws. To do so, it is important to have accurate information about the versions of articles in a collection.
Analysis: This article presents a simple statistical model to classify articles as author manuscripts or versions of record, with parameters trained on a collection of articles that have been hand-annotated for version. The algorithm achieves about 94 percent accuracy on average (cross-validated).
Conclusion and implications: The average pairwise annotator agreement among a group of experts was 94 percent, showing that the method developed in this article displays performance competitive with human experts.
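Illustration: the agreement figure above is an average over all pairs of annotators. The sketch below shows one plausible way to compute such a number; it is not taken from the article. The function name, the label strings "AM" and "VoR", and the sample annotations are all hypothetical, and the article's exact procedure may differ.

```python
from itertools import combinations


def average_pairwise_agreement(annotations):
    """Mean, over all annotator pairs, of the fraction of items labeled identically.

    `annotations` maps an annotator name to a list of labels, one per item,
    with every annotator labeling the same items in the same order.
    """
    names = list(annotations)
    if len(names) < 2:
        raise ValueError("need at least two annotators")

    pair_scores = []
    for a, b in combinations(names, 2):
        labels_a, labels_b = annotations[a], annotations[b]
        if not labels_a or len(labels_a) != len(labels_b):
            raise ValueError("annotators must label the same non-empty item list")
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        pair_scores.append(matches / len(labels_a))

    return sum(pair_scores) / len(pair_scores)


# Hypothetical labels: "AM" = author manuscript, "VoR" = version of record.
example = {
    "annotator_1": ["AM", "VoR", "AM", "VoR"],
    "annotator_2": ["AM", "VoR", "VoR", "VoR"],
    "annotator_3": ["AM", "VoR", "AM", "AM"],
}
score = average_pairwise_agreement(example)
```

A classifier's accuracy against the hand-annotated labels can be read on the same scale, which is why the abstract compares the two figures directly.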
Published Version: doi:10.22230/src.2017v8n1a268
Terms of Use: This article is made available under the terms and conditions applicable to Open Access Policy Articles, as set forth at