If we want to make a digital file OA, and we already have an OA repository, then we face just two hurdles.  We need a copy of the file and we need permission.  We can call these the custody and copyright conditions.  "Custody" here doesn't mean ownership of the rights, just possession of a copy.  If we have possession and permission, then we don't need ownership.

The OA movement has given far more attention to the copyright or permission problem than to the custody or possession problem.  This may have the effect of sweeping a difficult problem under the rug.  We often have permission when we lack custody, and often find that solving the permission problem is easier than solving the custody problem.  Here are some examples of what could be called permission success and custody failure.

(1) You've published an article in a TA journal which allows green OA or self-archiving.  But the journal only allows deposit of the final version of the author's peer-reviewed manuscript, not the published version.  You're fine with that and eager to make the manuscript OA.  But you can't put your hands on the version you're allowed to deposit.  You think it's on your hard drive somewhere, or in your email archive.  But you're not sure.  You haven't had time to look, or you've looked and found six versions.  You don't have time to figure out which one, if any, is the deposit-eligible, peer-reviewed manuscript, or you've taken the time and you're still unsure.  Or you have the version you submitted to the journal, and all the correspondence with the editor, but you don't have time to reconstruct the version approved by peer review.  Or you might have deleted the relevant version in a fit of spring cleaning, as a superseded version not worth saving, or you might have failed to copy it over from your last computer when you upgraded.  With enough detective work you could find out, but you don't know how much time it would take and you're pretty sure it would take more than you have.

Once this problem arises, there are no solutions except to do the necessary detective work.  The best solution is prevention.  Authors aiming at green OA should understand that permission for deposit is often limited to the final version of their peer-reviewed manuscript.  Institutions aiming at green OA should help authors understand this.  The message is:  you can simplify your life and foster OA at the same time by depositing your peer-reviewed manuscripts as soon as they're accepted for publication.  That gets the manuscripts out of your hair and into the repository before they're lost, deleted, forgotten, modified, or superseded.  You may need version control, bookkeeping, and detective work for other jobs, but not for this job.

(2) You've been funded by the Wellcome Trust, NIH, or one of the other funders requiring grantees to retain the right to authorize OA.  Or you're employed by Harvard, MIT, or one of the universities where faculty grant the institution standing permission to make their future articles OA.  In the former case, you're happy to fulfill your funding agreement.  In the latter case, you have a right to opt out but you're happy with the OA default.  So far, so good.  That solves the permission problem.  But these OA policies also require deposit at the time of acceptance, which should solve the possession or custody problem.  Now it's time to deliver.  If you can't put your hands on the version you're allowed to deposit, then this reduces to case #1.  But perhaps you have a good idea where the file can be found.  You plan to deposit it "real soon now".  However, you're preoccupied with a new article, and don't want to slow down new work to tie up loose ends on old work.  In case #1, you want to find the relevant version but can't.  In case #2, your funder or employer wants you to find the relevant version, and you're hoping it will leave you alone, if only for a couple of weeks while you clear your desk of urgent business.  Or a couple of months.  Come to think of it, you can't remember a time when your desk was cleared of urgent business. 

Many universities solve this problem by paying librarians or student workers to make deposits on behalf of faculty.  When libraries can afford this, I recommend it.  I've long argued that successful OA mandates are implemented through expectations, education, incentives, and assistance, not coercion.  Proxy deposit is the critical kind of assistance. 

Perhaps proxy deposit shouldn't be necessary.  But if it solves the custody problem, when lack of assistance leaves it unsolved, then the benefits are significant and the only question is about cost.  My experience is that trained student workers can do this job well, and we can save librarians to do what non-librarians can't.

Note that assistance helps more with #2 problems than #1 problems.  If you can't find the relevant file, then you won't be able to pass it to your student assistant.  Hence, even with assistance we'll still need to educate authors about depositing peer-reviewed manuscripts, or passing them to assistants, as soon as they're accepted for publication.

To solve #1 and #2 problems, many funders with green OA mandates work with publishers to deposit articles on behalf of authors.  When the NIH started doing this in 2006, I had mixed feelings.  It ensured deposit, but it let publishers choose the length of the embargo period when that was supposed to be the author's decision.

But now my feelings are less mixed.  I support the idea.  The custody problem is serious.  It's probably the major obstacle to faster growth in the compliance rates for OA mandates.  It's a problem worth solving, and publisher deposit is one solution.  (In any case, if authors made their own deposits and decided the length of the embargo, publishers could use other means at their disposal to pressure them to choose the maximum permissible period.)

The next frontier is for publishers to deposit into institutional repositories, not just funder repositories like PubMed Central.  BioMed Central is one publisher already doing this.  The service is free of charge for many institutions (members of BMC and those using the enhanced version of BMC's Open Repository service), but not for all.

Back in 2008, Nature said it might be willing to deposit into institutional repositories as well.

Last month I asked NPG whether it had started IR deposits, and received this reply from Grace Baynes, head of Corporate Public Relations for NPG (quoted with permission):  "Depositing in institutional repositories in an efficient way proved more challenging than NPG anticipated at first.  Through our experiences of the PEER project we learnt that in many cases we would need an intermediary to pass files and metadata to repositories.  We are now in active discussions about a pilot project to deposit into institutional repositories, and hope to be able to announce more detail soon." 

I look forward to the results of NPG's pilot project.  It could boost the volume of green OA in institutional repositories as well as the OA-archiving culture at those institutions.  And of course it could lead other publishers to follow suit.

Approaching the custody problem in new ways, developers have created a range of tools to automate or semi-automate repository deposits.  For example:

JISC's RePosit Project uses the Symplectic publications management system as a repository deposit interface.

BibApp will discover new publications eligible for deposit in an institutional repository and deposit them directly into the repository.

The same folks who brought you SWORD (Simple Web-service Offering Repository Deposit) offer an accompanying EasyDeposit tool, and support third-party tools using SWORD to facilitate deposit.

The University of Rochester's IR-Plus repository software allows deposit directly from the tools authors may use for writing.

In Germany, the University of Bielefeld and University of Kassel are both developing tools to facilitate deposits directly from the author's personal publication list.

I'm sure I'm overlooking some existing tools, just as I'm sure we'll see many more tools in the future.

Finally, when publishers are willing, they can help institutions solve the custody problem by allowing deposit of the published edition, not just the final version of the author's peer-reviewed manuscript.  This counts as a solution because authors or repository managers can usually grab a copy of the published edition more easily than any particular earlier version.  But while it would solve the custody problem for green OA, and benefit authors and readers for other reasons as well, many publishers think it would harm them.  So don't expect the practice to grow soon.  I say "grow" rather than "arise" or "appear" because many publishers already allow deposit of the published editions of their articles.

See the SHERPA list of 181 publishers (not journals) allowing green OA for the published editions of their articles with no embargo.  The SHERPA list also includes one publisher allowing deposit after three months, 20 after six months, 14 after 12 months, 2 after 18 months, and 11 with longer embargo periods.

Some publishers not only allow deposit of the published editions, but will make the deposits themselves.  See the NIH list of more than 1,100 journals (not publishers) depositing the published editions of all their articles based on NIH-funded research....
...and the NIH list of hybrid publishers depositing the published editions of selected articles based on NIH-funded research.

Some of the journals and publishers on these lists are OA and some are not.  Permission from the OA publishers to allow green OA for the published edition is welcome but not surprising.  But permission from the non-OA publishers is welcome and notable.  If TA publishers thought this voluntary action triggered cancellations, they'd stop.  How many TA publishers not on these lists realize that a natural experiment is showing that the practice is harmless?  How many publishers lobbying Congress to weaken or repeal the NIH policy mention the TA publishers who voluntarily do more than NIH-funded authors ask them to do?

Cases #1 and #2 show how the custody problem can slow down green OA, even mandated green OA.  Here are two families of examples showing how it can slow down OA for digitization projects.

(3) In your library's special collections room, you find an unpublished letter from Ben Franklin to Thomas Jefferson calling for open access to publicly-funded research.  (Hunch:  Ben Franklin would only need an hour to understand digital text, half an hour to understand the internet, and fifteen minutes to call for OA.) 

The letter is in the public domain (PD) but hasn't been digitized.  You want to make it OA.  This one is fairly easy.  Its PD status solves the permission problem.  You have custody of the analog original but not of a digital copy.  All you have to do is digitize the letter and deposit the digital copy in your institutional repository. 

However, many variations on this theme are more difficult.  Suppose the original letter is not in your library, but in a library across the globe.  You could probably digitize it with everyone's consent and cooperation if you showed up in person with the right equipment.  But you're time-strapped and cash-strapped.  So is the other library.

Or the original is in a private collection behind lock and key.  Whether the owner is jealous or generous is a crap shoot, and you may be unlucky.  The fact that the work is PD only solves the permission problem for reproduction, not the permission problem for standing in front of the original with a camera.  You still need permission to pass through the locked door, and the obstacles to that may be harder to surmount than mere resistance to reproduction.

Or the original is in a museum which makes money selling print copies.  The museum acknowledges that the letter is PD, but is not interested in helping anyone make an OA copy that would undermine its sales.  This lock-and-key problem may be created by a publicly-funded institution, not a private collector.  But it's still a lock-and-key problem.

Or the original is held by a government agency which is required by law to sell copies, for cost-recovery, rather than give copies away.  (Mandated cost-recovery policies at public agencies are in decline as governments commit to transparency, open data, and OA.  But they're still far from rare.)  Or the document is not a letter from Ben Franklin but a government report.  It may sit behind a low custody barrier, like an FOI request, or a high custody barrier, like a top-secret classification.

Or the original is not a single letter, but 5,000 fragile, disintegrating manuscripts.  You may have custody of the analog originals, permission (because they're PD), and even the digitization equipment.  But you don't have custody of digital copies of the analog originals.  With time and care you could get custody of digital copies.  But the mountain of demanding, time-consuming effort is a serious custody barrier.  If it weren't, all PD literature, art, and music would already be OA.

Or the work is in a private collection and the owner has already digitized it.  But you don't have a copy of the digital file and the owner won't give you one.  Or the owner has publicly released a thumbnail or low-res copy but not a high-res copy. 

Or the originals are already gratis OA, and you want to download them to your hard drive for text mining.  When PubMed Central has permission from the copyright holders, it makes articles libre OA and allows bulk downloading.  It calls this the Open Access Subset of PMC.

But only 10% of PMC belongs in the libre OA subset.  The other 90% is gratis OA, not libre, and PMC is obliged by the rights-holders to block bulk downloading.  (Thanks to PMC's Ed Sequeira for these details.)

Because BioMed Central offers libre OA to its whole corpus, it can offer its whole corpus for bulk downloading.

In this sense, libre OA removes custody barriers that gratis OA may leave in place.  The difference between gratis and libre OA isn't limited to permission barriers; permission barriers can create downstream possession and custody barriers.

Note that when we want to go beyond gratis OA to libre, and beyond online digital copies to bulk downloading for purposes like text mining, we've shifted from cases in which we want custody in order to provide access to cases in which we want access in order to have custody.

(4) Here's a variation on the previous theme that deserves a section to itself.  In May 2009 the Cornell University Library lifted restrictions on Cornell-digitized PD books.  Previously it made the digital copies gratis OA and required permission for redistributing copies.  When it lifted these restrictions, it issued an exemplary public statement explaining its new policy:

In a dramatic change of practice, Cornell University Library...will no longer require its users to seek permission to publish public domain items duplicated from its collections....The Library, as the producer of digital reproductions made from its collections, has in the past licensed the use of those reproductions. Individuals and corporations that failed to secure permission to repurpose these reproductions violated their agreement with the Library. "The threat of legal action, however," noted Anne R. Kenney, Carl A. Kroch University Librarian, "does little to stop bad actors while at the same time limits the good uses that can be made of digital surrogates. We decided it was more important to encourage the use of the public domain materials in our holdings than to impose roadblocks." ...Institutional restrictions on the use of public domain work, sometimes labeled "copyfraud," have been the subject of much scholarly criticism. The Cornell initiative goes further than many other recent attempts to open access to public domain material by removing restrictions on both commercial and non-commercial use....

Part of the background here is that ordinary digitization lacks the creativity needed for copyright.  Hence, digitizing an analog PD work creates a digital PD work, not a digital copyrighted work.  (In Europe but not the US there is a "database right" beyond copyright for protecting digitized collections that required substantial investment, even if the digital files are all PD.)  Another part of the background is that custodians of PD works may, if they wish, take advantage of lock-and-key barriers.  They needn't release their holdings at all, or they may release them on certain conditions when the conditions are based on contract, not copyright.  Cornell knew that its digital copies of PD originals were themselves PD.  It couldn't ground its restrictions in copyright without copyfraud (false claim of copyright) and it no longer wanted to take advantage of lock-and-key or contract barriers.  On the contrary, it decided not to "impose roadblocks" or limit "good uses". 

Since Cornell took this step, the underlying principle has been widely endorsed:

* In February 2010, the EU Culture and Education Committee unanimously supported the principle that when PD works are digitized, the digital copies should be considered PD as well.

* In May 2010, the Europeana Public Domain Charter asserted that "[d]igitisation of Public Domain content does not create new rights over it...."

* In December 2010, version 1.0 of the Open Access Principles for Australian Collecting Institutions asserted that "as a matter of open access best practice copyright should not normally be asserted over verbatim copies of public domain resources which do not have independent originality."

* In January 2011 the EU's Reflection Group or Comité des Sages concluded that "mere digitisation process should not generate any new rights."

Just last month, Yale became the latest and largest university to take the Cornell step.  "The goal of the new policy is to make high quality digital images of Yale's vast cultural heritage collections in the public domain openly and freely available.  As works in these collections become digitized, the museums and libraries will make those images that are in the public domain freely accessible.  In a departure from established convention, no license will be required for the transmission of the images and no limitations will be imposed on their use...."

To solve the custody problem for PD works, we need the good will of the custodians, like Cornell and Yale.  Only the custodians can eliminate lock-and-key restrictions through digitization and online hosting.  (They can digitize and host on their own or allow others to do so.)  And while no one can assert copyright restrictions on PD works, only the custodians are in a position to lift or prevent contract restrictions.

Note that when the custodians of PD works use clickwrap licenses or other contracts to limit the use of PD digital copies, the restrictions only apply to users who enter the contract.  A clickthrough user who leaks a PD digital copy to the wide-open internet may be liable for breach of contract.  But all the other internet users around the world are as free to use the PD digital copy as they are to use any other PD work.

* Postscript.  For further discussion of #3 and #4 problems, see "OA for digitization projects", originally published in the July 2009 SOAN, and slightly revised in Karl Grandin (ed.), _Going Digital: Evolutionary and Revolutionary Aspects of Digitization_, Nobel Foundation, Royal Swedish Academy of Sciences, April 2011.


