Archived postprints should identify themselves
SPARC Open Access Newsletter, issue #85
May 2, 2005
by Peter Suber
Some publishers worry that OA archiving tends to erase their brand.  I'd like to suggest that authors, readers, and the friends of OA should worry about this too.

Most postprints in most OA repositories use metadata to identify the journal in which an article was published.  This is very helpful but not sufficient.  While these metadata are readable by special tools, they are not always readable by the human readers who open the files. 

Authors who archive the texts of their published articles (postprints) should identify somewhere on the postprint, in a clear and conspicuous way for human readers, the journal in which the article was published.  This practice would help *all* the stakeholders.  In particular, it would help the following:

(1) Authors.  When an article has been accepted by a peer-reviewed journal, authors benefit from revealing this fact.  This is elementary:  peer-reviewed articles have more credibility than unrefereed preprints.  That's why authors submit their work to peer-reviewed journals instead of posting it directly online and bypassing peer review.  Readers who are too busy to look at everything on a subject have a justified preference for peer-reviewed literature.  Authors deserve the boost in credibility and the boost in audience that come from identifying their work as peer-reviewed.  If the journal is esteemed, then identifying it gives the author an extra boost on both fronts.

(2) Readers.  Readers not only prefer refereed articles to unrefereed articles when they are in a hurry.  They also use a journal's name, orientation, and reputation as clues to an article's quality or methodology.  This isn't the place to debate whether these clues support accurate inferences more often than marketing programs and self-fulfilling opinions already circulating within the discipline.  What matters here is that readers would rather have these clues than not have them, and authors benefit by attracting readers.

(3) Journals and publishers.  This is where we started.  Journals and publishers want to preserve their brand.  What they have to offer above everything else is their selectivity, quality, and editorial standards.  Journals work hard on these and deserve to be identified.  Journals that perform copy editing, fact checking, and other services deserve credit for these forms of added value.

(If you tuned in late, I acknowledge that journals add value.  It's a myth that OA wants to dispense with these valuable services, although sometimes OA journals must choose between the more essential and the less essential services.  The true bone of contention is not whether these services are valuable but how to pay for the most essential services without creating access barriers for readers.)

(4) OA proponents in general.   About 80% of surveyed journals already permit postprint archiving.  We don't really know what's holding back the remaining 20%, and there are bound to be many variables in the mix.  One variable, surely, is branding.  If journals in the 20% knew that postprint archiving would preserve their brand, then they would be more likely to permit postprint archiving (i.e. more likely to shift from gray to green).  Conversely, if postprint archiving routinely preserved the journal's brand, then journals in the 80% would be less likely to rescind their permission (i.e. less likely to shift from green to gray). 

Journals know or should know that OA increases an article's citation count by 50-300%, even after we restrict the comparison to non-OA articles from the same journal and year.  This boost in citation impact benefits journals as much as it benefits authors.  All journals have this interest in encouraging OA archiving, not merely permitting it.  However, non-OA journals have countervailing interests as well, which explains why very few go beyond permission to encouragement.  We don't know how close the balance of pros and cons is at a given journal, but every added benefit can help tilt the balance toward active encouragement.  Preserving the journal's identification can provide this kind of help.

For the evidence that OA increases citation impact, see the studies collected in Steve Hitchcock's bibliography, The effect of open access and downloads ('hits') on citation impact.

(5) All who want to solve the version control problem.  If a work has been accepted by the peer-reviewed Journal of Yada Yada and says so in some evident way, then readers will know they are reading a postprint, not a preprint --apart from what they infer about the paper from JYY's reputation for quality.  If the identification is more expansive, then readers might be able to tell whether they are reading a postprint that has gone through peer review but not yet copy editing or one that has gone through both. 

In my view, the version control problem is troublesome but not urgent.  I'd like to find a solution, but I'm only interested in solutions that don't deter or delay self-archiving.  Branding alone won't solve all aspects of the version control problem, but as long as authors have already decided to preserve branding, from self-interest, it meets my criterion that version-control solutions should not hinder OA.

Since self-archiving is done by authors (or by helpers acting on behalf of authors), the bottom-line recommendation has to go to authors.  Identifying the journals in which your articles are published helps you, not just your readers and publishers.  It's not just another burden on you or another gift to publishers.  It is true common ground.  It's already beneficial and easy.  Let's make it the norm.

*How* should archived postprints identify the journals in which they were published?  I don't want to drift toward false precision or a premature standard, and in fact I'd be happy with just about any kind of identification that effectively conveyed this information to readers who are looking for it.  But even this informal standard requires that the identification should be visible text, not invisible metadata.  All the benefits I spelled out above are enhanced if the journal identification is human-readable.  (Of course, they are also enhanced if the identification is machine-readable.  So I'm not at all arguing against embedded metadata.) 

Authors who take this recommendation will have to add one more small extra step to the steps already required for deposit in an OA repository.  In general, I believe it's harmful to make the self-archiving process more time-consuming or complicated.  But this step is justified because it helps authors realize the goals that led them to self-archive in the first place.  It attracts readers and therefore increases impact.

Les Carr and Stevan Harnad have found that for the average author self-archiving one article takes 6-10 minutes.  If authors took the additional step I'm recommending, then self-archiving might take 6.1-10.1 minutes.  If that starts to deter authors, then I'd regret it as much as anyone and consider withdrawing my recommendation.  One alternative is for archiving software to eliminate even this pebble in the road by taking the journal identification from the metadata and copying it to the top of the text file, perhaps after the user clicks "yes, do that".
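
For readers who maintain archiving software, here is a minimal sketch of that alternative in Python.  It is only an illustration under stated assumptions, not a description of how any existing repository package works.  The metadata keys ("journal", "volume", "issue", "year"), the wording of the notice, and the function names are all hypothetical, and the `confirmed` flag stands in for the user clicking "yes, do that".

```python
def build_notice(metadata):
    """Return a one-line, human-readable journal notice, or None.

    The keys used here are assumptions for illustration; a real
    repository would map its own metadata schema onto them.
    """
    journal = str(metadata.get("journal") or "").strip()
    if not journal:
        return None  # nothing to identify, so add nothing

    parts = [journal]
    for key, label in (("volume", "vol. "), ("issue", "no. "), ("year", "")):
        value = metadata.get(key)
        if value is not None and str(value).strip():
            parts.append(label + str(value).strip())
    return "Published in: " + ", ".join(parts)


def add_notice(text, metadata, confirmed):
    """Copy the journal identification to the top of the postprint text.

    Does nothing unless the depositor confirmed, and never adds the
    same notice twice.
    """
    notice = build_notice(metadata)
    if not confirmed or notice is None:
        return text
    if text.startswith(notice):
        return text
    return notice + "\n\n" + text


if __name__ == "__main__":
    meta = {"journal": "Journal of Yada Yada", "volume": 12, "year": 2005}
    print(add_notice("Text of the postprint...", meta, confirmed=True))
```

The point of the sketch is that the visible notice is derived from metadata the depositor has already entered, so the author's extra effort shrinks to a single confirmation.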

As long as authors are adding citations, should they add anything else?  Here we risk enlarging the task until authors decide they can't be bothered.  But if we're thinking about what could be useful, there are at least two possibilities.

(1) A version number.  This would take us even further toward solving the version-control problem.  If there were a convention for identifying postprints that have been peer-reviewed but not copy-edited, postprints that have been both peer-reviewed and copy-edited, and postprints that have been revised or corrected since publication, then version numbers would be even more useful.  There is much potential utility here to offset the negligible cost in effort.  But for now this is potential ahead of the demand, or at least ahead of the convention.

(2) A link to the publisher's edition.  On the plus side, this would encourage even more publishers to support postprint archiving and do more to solve the version-control problem.  On the minus side, it does nothing to boost author credibility or readership, it may suggest that the OA edition is inferior to the publisher edition, and it adds another step to the self-archiving process.  It's not clear how these considerations net out.  But I can observe that linking to the toll-access publisher copy doesn't help authors, and that complicating the self-archiving process without compensation for authors is not a good recipe for promoting OA.  Publishers who want to make this step as easy for self-archiving authors as possible might want to make sure that authors have the URL or, even better, the published file with the URL already embedded.  But since authors want to self-archive their postprints immediately upon publication, journals would have to provide these without an embargo.

* Postscript.  I've been talking about postprints because they already have publishers.  But the same argument applies to preprints that have already been accepted for publication.  Think about the difference between your response to an unlabeled preprint and one that says it's forthcoming from the Journal of Yada Yada.


