Publication:

Automatically Determining Versions of Scholarly Articles

Date

2017

Publisher

Canadian Journal of Communication
The Harvard community has made this article openly available.

Citation

Rothchild, Daniel, and Stuart Shieber. 2017. “Automatically Determining Versions of Scholarly Articles.” Scholarly and Research Communication 8 (1) (March 22). doi:10.22230/src.2017v8n1a268.

Abstract

Background: Repositories of scholarly articles should provide authoritative information about the materials they distribute and should distribute those materials in keeping with pertinent laws. To do so, it is important to have accurate information about the versions of articles in a collection.

Analysis: This article presents a simple statistical model to classify articles as author manuscripts or versions of record, with parameters trained on a collection of articles that have been hand-annotated for version. The algorithm achieves about 94 percent accuracy on average (cross-validated).

Conclusion and implications: The average pairwise annotator agreement among a group of experts was 94 percent, showing that the method developed in this article displays performance competitive with human experts.
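
The human baseline quoted in the abstract is an average pairwise annotator agreement. As a minimal sketch of how such a figure is commonly computed (the article's exact procedure is not given on this page, so this is an assumption), the fraction of identically labelled articles is taken for every pair of annotators and then averaged over the pairs. The function name, the label strings "AM" and "VoR", and the toy data below are hypothetical and are not taken from the article.

```python
from itertools import combinations


def average_pairwise_agreement(annotations):
    """Mean, over all annotator pairs, of the fraction of items labelled identically.

    `annotations` maps an annotator name to a list of labels, one per article,
    with every list in the same article order. Hypothetical helper for illustration.
    """
    label_lists = list(annotations.values())
    if len(label_lists) < 2:
        raise ValueError("need at least two annotators")

    per_pair = []
    for a, b in combinations(label_lists, 2):
        if len(a) != len(b) or not a:
            raise ValueError("annotators must label the same non-empty set of articles")
        matches = sum(x == y for x, y in zip(a, b))
        per_pair.append(matches / len(a))

    return sum(per_pair) / len(per_pair)


# Made-up labels: "AM" = author manuscript, "VoR" = version of record.
toy_annotations = {
    "annotator_1": ["AM", "VoR", "AM", "VoR"],
    "annotator_2": ["AM", "VoR", "VoR", "VoR"],
    "annotator_3": ["AM", "VoR", "AM", "AM"],
}

agreement = average_pairwise_agreement(toy_annotations)
print(f"average pairwise agreement: {agreement:.3f}")
```

Under this definition, a classifier's accuracy against hand-annotated labels is measured on the same scale as the agreement between two human annotators, which is what makes the abstract's comparison meaningful.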

Terms of Use

This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth at Terms of Service
