Trojan horse eprints
SPARC Open Access Newsletter, issue #85
May 2, 2005
by Peter Suber
Some publishers worry that self-archiving will create copies whose download counts they can no longer monitor.  We could increase publisher support for self-archiving, or reduce publisher opposition, if we could solve this problem. 

Unfortunately, the problem may be intrinsic to OA. 
http://dash.harvard.edu/bitstream/handle/1/3997174/suber_news69.html#manycopy

Or at least the only solution on the horizon is a bad one.

A Canadian company called Remote Approach is working on executable scripts embedded in PDF files that will report back to their creators whenever the files are opened, even after they have been copied and redistributed.  That would help publishers keep accurate traffic data, whether the copies in circulation were authorized or unauthorized.  You can tell what demand Remote Approach is trying to meet.

Even though this technology would likely increase publisher support for postprint archiving, I am very suspicious of executable scripts in PDF files.  The problem is not just preserving reader privacy.  If that were all, we might be able to ensure that the scripts only collected anonymized traffic data.  The deeper problem is that once we allow scripts in document files for benign purposes, it will be very hard to detect, let alone block, scripts for malign purposes.  Malign scripts could subvert fair-use rights and open access.

The Associated Press reported on March 31 that "Remote Approach is also working on a feature that would let a company block a document from being read if there's no Internet connection."

Imagine downloading a copy of a self-archived PDF to your personal hard drive on Monday.  On Tuesday, when you want to read it offline, you discover that it is unreadable.  The publisher had remotely set it to deactivate when taken offline. 

Imagine a new and "improved" script that can deactivate a file even for online reading.  Imagine self-archiving such a PDF file on Wednesday.  On Thursday, the publisher remotely deactivates it so that nobody can read it, even though it is still online.

As soon as publishers can remotely disable PDFs so that users can't read them offline or from certain addresses online, PDFs will be unsuitable for disseminating science and scholarship, especially in OA repositories. They won't be suitable again until we have trustworthy tools for scrubbing them clean of the remote-deactivation code.

As soon as Remote Approach delivers the remote-deactivation scripts, authors will have to make a choice.  The publisher's PDF is normally the preferred edition for self-archiving.  Should authors archive the publisher's PDF, if they have permission to do so?  Or should they reduce risk and archive a different format that cannot contain mischievous code? 

For me, the answer is clear.  Even when I'd like to archive the published edition, I would not knowingly archive any article in a package that could render it useless to me and my readers.  If the purpose of self-archiving is to regain control of scholarly communication and provide open access, then it's perverse to archive a version that puts the access decision back in the hands of a publisher who might choose to turn it off.

Unfortunately, it wouldn't be enough for publishers to disavow the Remote Approach technology.  For true peace of mind, we need to know the state of a file, not the state of a publisher's scripting policy.  For this, we'll need tools to scrub files clean.  It's possible that scrubbing utilities could distinguish good scripts from bad, but I doubt it.  To be assured of safety we may have to scrub out all executable scripts.

Today most publishers that allow postprint archiving forbid authors to archive the published PDFs.  But a significant minority, including the New England Journal of Medicine and California Law Review, forbid authors to archive anything else.  When PDFs can contain malign scripts, publishers in the latter category will be in a hard spot.  Users won't know whether the publishers have a hidden agenda for pushing the PDFs. 

Publishers may think that using Trojan Horse PDFs will deter self-archiving, an outcome that many would welcome.  But in practice it will only deter authors from archiving the PDF edition, aggravating the version-control problem, an outcome that most publishers would regret.  Publishers who see this far ahead may start to shun the PDF format, especially if they cannot assure users that clean files are really clean.

Scholarly publishing is a small part of the overall publishing industry, and it's probably a small part of Adobe's PDF business.  But if other users object as much as scholarly users to the prospect of malign scripts in PDFs, then this prospect could kill the format.  Adobe can avert this risk by giving users an effective OFF switch.

Associated Press, Company develops system to track PDF documents, March 31, 2005
http://business.bostonherald.com/technologyNews/view.bg?articleid=75921
http://www.earlham.edu/~peters/fos/2005_04_03_fosblogarchive.html#a111279578635241864

Joe Brockmeier, Unexpected features in Acrobat 7, LWN.net, March 30, 2005.
http://lwn.net/Articles/129729/
http://www.earlham.edu/~peters/fos/2005_04_17_fosblogarchive.html#a111377805428476765

Robyn Weisman, Remote Approach Launches PDF Tracking Service, PDF Zone, March 15, 2005.
http://www.pdfzone.com/article2/0,1759,1776382,00.asp
http://www.earlham.edu/~peters/fos/2005_04_03_fosblogarchive.html#a111279578635241864

Remote Approach (the company)
http://www.remoteapproach.com/


----------

Read this issue online
http://dash.harvard.edu/bitstream/handle/1/3997158/suber_news85.html

SOAN is published and sponsored by the Scholarly Publishing and Academic Resources Coalition (SPARC).
http://www.arl.org/sparc/

Additional support is provided by Data Conversion Laboratory (DCL), experts in converting research documents to XML.
http://www.dclab.com/public_access.asp


==========

This is the SPARC Open Access Newsletter (ISSN 1546-7821), written by Peter Suber and published by SPARC.  The views I express in this newsletter are my own and do not necessarily reflect those of SPARC or other sponsors.

To unsubscribe, send any message (from the subscribed address) to <SPARC-OANews-off@arl.org>.

Please feel free to forward any issue of the newsletter to interested colleagues.  If you are reading a forwarded copy, see the instructions for subscribing at either of the next two sites below.

SPARC home page for the Open Access Newsletter and Open Access Forum
http://www.arl.org/sparc/publications/soan

Peter Suber's page of related information, including the newsletter editorial position
http://www.earlham.edu/~peters/fos/index.htm

Newsletter, archived back issues
http://www.earlham.edu/~peters/fos/newsletter/archive.htm

Forum, archived postings
https://mx2.arl.org/Lists/SOA-Forum/List.html

Conferences Related to the Open Access Movement
http://www.earlham.edu/~peters/fos/conf.htm

Timeline of the Open Access Movement
http://www.earlham.edu/~peters/fos/timeline.htm

Open Access Overview
http://www.earlham.edu/~peters/fos/overview.htm

Open Access News blog
http://www.earlham.edu/~peters/fos/fosblog.html

Peter Suber
http://www.earlham.edu/~peters
peter.suber@earlham.edu

SOAN is licensed under a Creative Commons Attribution 3.0 United States License.
http://creativecommons.org/licenses/by/3.0/us/

