Self-archiving diary
SPARC Open Access Newsletter, issue #150
October 2, 2010
by Peter Suber
I have a confession to make.  For as long as I've urged scholars to support OA, I've urged them to self-archive.  But I wasn't systematic about doing it myself until last year. 

I'm not as guilty as I look.  I did make all my work OA, with the exception of one 1998 book for which I don't yet have permission.  In fact, I started making my work OA in the mid-1990s, about five years before I started serious OA activism.  But I made these works OA on my personal web site.  For most of the past 15 years, I knew that making them OA through a repository would be better, and I urged other scholars to make their work OA through repositories. 

The work on my personal web site was bona fide green OA.  It was even libre green OA, since I used a custom-written open license.  (Creative Commons didn't launch until 2002.) 

But personal web sites at universities tend to disappear when people change jobs or die.  Personal web sites from GoDaddy and the like tend to disappear the day you forget to renew them.  Then the domain names are bought up by pornographers who use your accumulated Google rank to boost their traffic.  Most of the surprised visitors will blame you for the surprise, and they'd be half-right to do so. 

I knew the vulnerability of personal web sites from firsthand experience.  After I spent about two years coding HTML copies of my print publications for OA through my Earlham home page, a student worker with the best of intentions reorganized the whole college web site in 1997 without consulting users.  He changed the URL of every page and broke all incoming links. 

Repositories don't do that (or shouldn't).  They use persistent URLs (or should).  They take steps toward long-term preservation (or should).  If you put your work in a repository, it should not only be OA, and indexed by search engines, but should stay put even if you move to a different institution, stop paying your bills, or drift from the first to the second half of your institution's publish or perish policy.

I knew that, and I made the case for repositories again and again. 

(I probably made the case most extensively in this article from May 2004.)
http://dash.harvard.edu/bitstream/handle/1/3997172/suber_news73.html#oai-google

* My cobbler problem

Why didn't the cobbler's children have shoes?  Was it overwork?  Too many children?  Priority for paying customers?  Hypocrisy?  Secret disdain for shoes?  Someone should investigate.  

In my case, I would certainly have used a repository if I could have.  Or I hope I would have.

When I started making my work OA on my web site in the mid-1990s, OA repositories were scarce, non-interoperable, and hard to search.  (The Open Archives Initiative didn't launch until 1999.)  Apart from my personal plans for my own work on my own site, OA had barely crossed my radar and I didn't know about the pioneering early repositories.  But when I did learn about them, I joined Stevan Harnad and others in making the case for self-archiving.  (Stevan first proposed OA self-archiving in 1994.) 

The snag was that Earlham didn't have an institutional repository, and there weren't suitable subject repositories in either of my two subject areas, philosophy and OA.

In March 2004, I persuaded E-LIS and dLIST, two OA repositories for library and information science, to accept articles about OA. 
http://www.earlham.edu/~peters/fos/2004/03/providing-oa-to-articles-about-oa.html

That's when I lost my first excuse.  I did deposit a couple of pieces in E-LIS, but I didn't systematically deposit my OA pieces in E-LIS.  I kept thinking:  "I'd prefer to put all my pieces in one repository.  So I'll wait for Earlham to launch one.  Meantime, the pieces are still OA through my personal web site."  It wasn't a bad excuse, but in retrospect I find it inadequate.  I'm glad I didn't die before working out my present solution.  (Full disclosure:  I have other reasons to be glad I didn't die.)

There were two small subject repositories for philosophy fairly early in the game, PhilSci and Sammelpunkt.  But both specialized in kinds of philosophy other than mine.  The first general OA repository for philosophy was PhilPapers.  But PhilPapers launched its public beta in January 2009, almost 15 years after I started making my work OA through my personal web site, and the same year that I found my current solution and started systematic self-archiving at Harvard's institutional repository, DASH (Digital Access to Scholarship at Harvard).

What other options did I have before 2009? 

In 2004, Brewster Kahle approved my plan to launch a universal repository at the Internet Archive.  Motivated in part by my own quandary, the idea was to build a repository to accept deposits from scholars who didn't have repositories in their institutions or fields.  An ambitious second layer was to mirror and preserve all the willing OA repositories in the world.  I still like the idea, and I'm sorry it never got off the ground.  (Long story.)

Within a few years several repositories began offering "universal repository" options to scholars with nowhere else to turn, not even counting those that were universal for scholars in a given field.  For example, Tampub was primarily for Finnish scholars, and Ad Astra for Romanian scholars, but both offered refuge to homeless foreigners.  The British Library Research Archive was universal for UK scholars, and Israel Scholar Works was universal for Jewish scholars.  Bepress' ResearchNow was universal for preprints.  And there was a slew of repositories like DocStoc, Egnyte, Google Knol, Scribd, Twidox, Wikimedia Commons, and Wikisource, so universal that they didn't limit themselves to academic or research literature. 

The universal repository options available today are more attractive than any of these.  But like PhilPapers, they were too late for me.  They're functional and even elegant.  But by the time they launched and I had confidence that they'd work for me, I'd already solved my problem through DASH.

Depot launched for UK scholars in April 2007.  Despite the initial limitation to the UK, it had exactly the right idea.  It would take deposits from any UK scholar and either redirect them to the scholar's own institutional repository or host them itself.  When it faced the loss of its UK funding in 2009, I was among those recommending that it become international, or truly universal, and seek funding from outside the UK.  It succeeded in late 2009, and just re-opened for business this fall under the new name of OpenDepot. 
http://opendepot.org/

OpenAire is a Depot-like service for European scholars, but it didn't launch until February 2010.
http://www.openaire.eu/

Academia.edu and Mendeley both launched in 2008, and were operational and universal before OpenDepot, before OpenAire, and before I had the DASH option.  I liked them both, and explored them as potential solutions to my problem. 
http://www.academia.edu/
http://www.mendeley.com/

Although I had a slight preference for institutional repositories, my chief reservation was that I felt obliged to wait for some evidence that they would survive.  I've felt this caution about every new app or service I've wanted to use since the mid-1980s, when I had to migrate all my academic writings from WordStar.  Not even the small world of universal OA repositories for academic research was immune to start-up failures.  Scholas was a promising universal repository, but it died in 2009, the same year it launched. 
http://schol.as/

Perhaps a failed repository would let me harvest my deposited papers for redeposit elsewhere.  But perhaps it wouldn't.  It would depend on how it failed.  The significant labor of depositing my backlog of existing publications made me extra cautious.

Today, however, there are four universal academic repositories worth your attention:  OpenDepot, OpenAire, Academia, and Mendeley.  If you're in the situation I was formerly in, without a repository in your field or institution, check them out.  I'd look at OpenDepot and OpenAire first, since they rest on the worldwide network of interoperable institutional repositories.  By redirecting deposits to IRs, when IRs exist, they help support institutional cultures of OA archiving.  By hosting deposits themselves, when IRs don't exist, they provide a valuable interim solution while IRs continue to spread.

In that sense, my preference for OpenDepot and OpenAire derives from my preference for institutional archiving.  Here's how I made that case in February 2009:
http://dash.harvard.edu/bitstream/handle/1/3716772/suber_news130.html#choicepoints

There are great advantages in having authors deposit in their own institutional repository.  It helps institutions share, analyze, and evaluate their own research output.  It adds local incentives to funder mandates to prod and reward author participation.  It adds robustness to preservation, on the LOCKSS principle, by distributing the literature around a large network.  It ensures that the system will scale with the growth of published research, simply from the fact that distributed networks are more capacious than any individual node.  Above all, it nurtures local cultures of self-archiving at every university, which will benefit non-funded research and research funded by non-mandating funders. 

But in the same article I argued that the stakes are low in the choice between institutional and disciplinary repositories, and that your repository choice needn't be exclusive.  Now that my papers are flowing into DASH (the process is ongoing), I'm happy for them to be on deposit elsewhere as well, especially if it takes no extra work.  PhilPapers, for example, is willing to populate its repository by harvesting institutional repositories for philosophy papers, using its own home-grown subject-matter criteria to decide what counts as a philosophy paper.  Just last month, I sicced it on DASH and it should soon host its own set of my philosophy papers. 

(It should.  As of yesterday, however, it had harvested 100 philosophy papers from DASH, none of them mine.  Either it's still harvesting or my papers slip through its criteria.) 

I'm very willing to do the same with Academia and Mendeley, for example, when they can harvest from DASH or from my DASH RSS feed.  Academia and Mendeley both offer automated methods of deposit, but they don't yet work for me, unfortunately, in part because they're limited to PDF.  Note to repository managers:  Supporting PDF alongside other formats like HTML and XML is a feature; supporting PDF-only is a bug.
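
As a rough illustration of what harvesting from a feed involves, here is a minimal Python sketch that lists the title and link of each item in an RSS 2.0 feed.  The sample feed, its URLs, and the function name list_deposits are invented for the example.  It assumes a plain RSS 2.0 layout, which may not match what DASH actually serves, and a real harvester would also have to fetch the feed over the network and then retrieve each full text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for illustration only.
# It is NOT copied from DASH; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example repository feed</title>
    <item>
      <title>First example deposit</title>
      <link>http://repository.example.org/handle/1/1001</link>
    </item>
    <item>
      <title>Second example deposit</title>
      <link>http://repository.example.org/handle/1/1002</link>
    </item>
  </channel>
</rss>"""


def list_deposits(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    deposits = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items that carry no link at all
            deposits.append((title, link))
    return deposits


if __name__ == "__main__":
    for title, link in list_deposits(SAMPLE_FEED):
        print(title, link)
```

A harvester built this way depends only on the feed's links, not on the format of the files behind them, so it would not need to be limited to PDF.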

* Finally a solution

I became a Berkman Fellow at Harvard in July 2009.  That fall I learned I was eligible to deposit in DASH, and the deposits began early in 2010.

Reaching this stage was a huge relief.  I moved from a vulnerable personal web site to a durable institutional repository.  And I moved from making my own work OA, in isolation, to doing my part to populate an OA repository and nurture a culture of OA archiving at the hosting institution.  These were big steps for me, in part because I'd been advocating them for so many years.  My work was safer, but I was also in harmony with my long-standing public advice to fellow scholars.

While working toward tenure at Earlham in the 1980s, I mentally compared the process to whitewater kayaking.  Tenure itself felt like an eddy turn into a still pool.  You're still in your boat and still in the rapids.  But you're parked behind a rock protecting you from the current.  You can pause, take a breath, look around, register your progress, and think about how to navigate the visible part of the river below.  The best way for me to describe the relief of systematic self-archiving in an institutional repository is that it felt like another eddy turn downstream.  It felt like Tenure 2.0.  The turbulence from which I'd escaped was the insecurity of my personal web site and the dissonance of falling short of my honest recommendations. 

For most researchers, the relief should be even greater because self-archiving marks the even greater transition from TA to OA.  It makes work available, for the first time, to everyone who could make use of it, apply it, cite it, or build on it.  It bypasses the access barriers which keep you from finding readers and keep readers from finding you.  I missed out on this most significant layer of relief, or realized it incrementally over the past 15 years, because everything I'm putting into DASH was already OA elsewhere. 

* What was easy

Depositing my philosophy articles and preprints was probably much like depositing anyone else's academic articles and preprints.  (I note a few exceptions below.)  But after 2001 or so, I was publishing much more on OA than philosophy, and for a variety of reasons depositing my work on OA was less typical.  So let me make a few more confessions, in the spirit of acknowledging the ways in which I was an easy case.

1.  My newsletter articles, and most of my other OA pieces, were published under CC-BY licenses.  I already had explicit, indisputable permission to deposit them.  I didn't have to wonder about it, didn't have to risk proceeding without it, didn't have to waste time requesting it, didn't have to pay a dime to secure it, and didn't have to make dark deposits while I worked on the problem of how to open them up.  Permissions can be a big headache for some authors and some publications. 

There are good ways to solve these problems for future publications (ask me about them!), but no particularly good ways to solve them for retroactive deposits.

2.  For the same reason, I had permission to deposit the published editions.  I didn't have to deposit inferior versions and I didn't have to hunt down earlier versions which I might not have kept or kept unaltered.  Laying hands on the version one is permitted to deposit can be a big problem for some authors and some publications.

3.  I was strongly motivated and already understood the issues.  "Strongly motivated" is an understatement.  I was avid; I was guilty with impatience.  I didn't need a mandate from a funder or employer.  I didn't even need encouragement. 

In practice, as I've often argued, OA mandates are implemented through expectations, education, incentives, and assistance, not coercion.  But I didn't even need expectations, education, or incentives.  I did receive assistance, however, which in gratitude I will treat separately:

4.  Harvard pays a cadre of student workers, Open Access Fellows or OAFs, to deposit work in DASH.  The Office for Scholarly Communication put a hard-working and meticulous OAF on my case.  (Thank you OSC, and thank you JC Guest.)

I hope I would have begun systematic self-archiving as soon as I learned that I was eligible to deposit in DASH, even without assistance.  But I admit that I'll never know for sure, since I never faced the question in that form.  I had OAF assistance as soon as I had deposit rights.  I'll also admit that the assistance made a huge difference in my plan for retroactive deposits, which was many times more daunting than my plan for prospective deposits. 

* Bumps and decisions

I knew that my experience would be easier than most in at least those four ways.  So I took careful note of the bumps in the road, large or small, and the decisions we had to make along the way that didn't rise to the level of bumps.  Here's a round number of them in the spirit of acknowledging that even easy cases can face unexpected complexities. 

1.  I was the first DASH author to want to deposit HTML rather than PDF.  This was no problem for DASH, technically or administratively.  But we did have to ask to be sure.  This decision even made life easier for my OAF, since she didn't have to create PDFs.  The HTML also allowed her to edit links to my other publications, changing them from relative URLs pointing to Earlham copies into absolute URLs pointing to DASH copies. 

2.  Converting those URLs turned out to be a bump.  When she deposited a piece with no such links to convert, the whole process took 15-20 minutes per paper.  When she had to convert links, it took about 40 minutes.  For the first 200 deposits or so, she converted all the links needing conversion.  For subsequent deposits, she didn't, but she took notes on which pieces had unconverted links so that we could go back and modify them later.

3.  When I created the OA edition of my 1990 book, The Paradox of Self-Amendment, I put separate chapters into separate files.  We could easily deposit each file into DASH, but we couldn't easily show that they were connected to one another.  This is precisely the problem to be solved by OAI-ORE (Open Archives Initiative - Object Reuse and Exchange), the latest iteration of the OAI protocol.  DSpace doesn't yet support OAI-ORE by default, but Harvard is studying ways to add OAI-ORE support.  Meantime, we're improvising another solution.  In addition to depositing the separate chapter files, we're depositing a table of contents file with links to each of the chapters.  When I link to "the book", I'll link to the table of contents file.  (This is still in the works, so you won't find it on DASH just yet.) 

I'm also working on a single-file PDF version with a reader who likes PDF better than I do.  (Just about everybody likes PDF better than I do.)  I'll deposit it when it's ready, but it's not ready yet.

4.  A handful of my philosophy papers were written for publication, but for different reasons I never submitted them and no longer plan to submit them.  The word "forthcoming" in the metadata would be misleading.  What term should we use instead?  This one is still under discussion.  It should come up with other DASH depositors but came up with me first.

5.  I wanted to deposit my newsletter articles as stand-alone articles, but I also wanted to deposit my newsletter issues.  Should we deposit just the issues, with internal links to the articles?  (This is what I do at my personal web site.)  Or should we deposit the two separately?  We decided to do the latter.  The main reason is that DSpace wants a separate file for each record.  I don't object to this solution, but I do note that it was driven by the software, not by considering what would be easiest or clearest for users.

6.  DASH links to the published editions of deposited articles, a practice I support.  But what is the published edition of my newsletter articles?  The version at the SPARC distribution list (without internal links to internal sections)?  The version at my Earlham home page (with internal links to the internal sections)?  Or should we simply link to SPARC itself, as the publisher?  After some discussion we settled on the second of these, although in time we may add links to the SPARC copies as well.

7.  Some of my newsletter articles were subsequently published elsewhere as well.  DASH has no problem listing the other versions.  But what should we do when the second version is a translation, or contains notable revisions, and I don't have permission to deposit it?  In these cases we decided to deposit the newsletter edition alone and link to the other editions. 

8.  When I find a typo in an older newsletter article, I fix it in the edition posted to my personal web site.  I know that violates archival purity.  But I tell users that I do it, and I limit the fixes to typos.  Can I do the same with DASH deposits?  At the moment the answer is no.  I have permission to deposit but not to revise existing deposits.  I can ask someone higher in the chain of command to make a revision, but I won't want to do that very often.

9.  Some of my philosophy publications use logic notation, and I posted them online before HTML 4 gave us a good way to represent logic symbols as text characters.  In those days we had to use small images of the symbols.  (I feel like an old codger talking this way.)  I could either convert the files to PDFs, which would integrate the text and images, or I could edit each file to replace the images with text characters from special symbol fonts introduced with HTML 4.  I'm choosing the second path, though I haven't finished doing the work yet.  This is definitely a bump in the road, and one I trace to my decision to favor HTML, or to offer HTML in addition to PDF whenever I can.

10.  A few of my publications have actual images other than logic symbols.  In these cases I caved and decided to convert the files to PDF.  When DASH supports OAI-ORE, I may go back and deposit the HTML text and images separately, and use ORE methods to aggregate the separate files.

11.  In a few cases I never got around to depositing the full text at my Earlham web site.  The text was OA at the publisher's site and I linked to the publisher's copy.  (I don't approve of this practice and can't remember why I felt constrained to follow it.  Either I was very overstretched at the time or I didn't have permission to do anything else, both easy for me to believe.)  However, in nearly all of these cases, by the time I got around to self-archiving the pieces, the links were dead!  I had to find working URLs, which I was able to do.  This was a good lesson in the benefits of persistent URLs, and in the benefits of going beyond repository links to repository storage.

12.  A couple of my works are print-only.  I published them before the net and web, never had digital editions, and never rekeyed them.  In these cases we'll deposit image scans.  We'll deposit OCR'd text versions as well, though we don't know yet whether we'll have time to deposit them at the same time as the image scans or whether we'll have to add them later on.

13.  When I pulled the plug on Open Access News in April 2010, my DASH archiving was already in process.  I asked whether I could deposit the now-complete blog archive.  The content was entirely acceptable to the DASH managers, even desirable.  The problem was that in its raw form it consisted of more than 400 separate archive files (18,000+ posts over 8+ years).  Charles Bailey was good enough to consolidate all the files into one zipped file about half a GB in size.  The unzipped files would be unwieldy but searchable, and the zipped file would be the reverse.  Which would be better?  While we were mulling this over, an elegant solution emerged from a different quarter.

The Harvard University Archives, separate from DASH, agreed to harvest and preserve my entire Earlham web site, including the unzipped blog archive, in its forthcoming "H-Sites" series.  It will even re-harvest periodically to capture my continuing updates and rewrite my links to Earlham pages so that they point to archived copies of those pages.  This too is still in process but should be ready soon.

Meantime, the blog archive is still up and searchable at its original location...
http://www.earlham.edu/~peters/fos/fosblog.html

...and Charles Bailey's 475 MB zipped version of the blog archive is available from his site.
http://www.digital-scholarship.com/other/OpenAccessNews.zip

14.  We decided to omit a lot.  In particular we're omitting interviews where I'm the subject rather than the author and don't hold the copyright.  I could seek permission, but for now I don't have time.  We're omitting my course hand-outs, although I was scrupulous about making them all OA when I was teaching.  But I'm no longer teaching.  One day I'll want them in the repository, and DASH will accept them; but they're not a priority for now.

15.  Finally there were the problems I feel lucky to face.  Should I replace the Earlham copies of these works with redirects to the DASH copies?  (Pro:  It would steer people and search engines to the persistent URLs.  Con:  It would sacrifice the Google juice on the Earlham copies.)  Should I keep the Earlham copies up but just add links to the DASH copies?  For now, I'm taking the latter course, or I will when I have time.  But one day, especially if Earlham wants to save server space when I perish, I'd like redirects.
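
For readers facing a link-conversion job like the one in items 1 and 2 above, here is a minimal Python sketch of the idea:  resolve each relative link against the page's old URL, swap in the repository URL when one is known, and keep a list of the links that still need attention.  It is an illustration only, not a description of how the DASH deposits were actually edited.  The function name rewrite_links, the example.edu and example.org URLs, and the lookup table are all hypothetical, and the regular expression is a deliberate simplification that only sees double-quoted href attributes.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Simplification: matches only double-quoted href attributes.
HREF = re.compile(r'href="([^"]*)"')


def rewrite_links(html, page_url, new_copies):
    """Point relative links at repository copies where a copy is known.

    page_url   -- old URL of the page being deposited (hypothetical here)
    new_copies -- dict mapping old absolute URLs to repository URLs
    Returns the edited HTML plus a list of old URLs left unconverted.
    """
    unconverted = []

    def replace(match):
        href = match.group(1)
        # Leave in-page anchors and already-absolute links untouched.
        if not href or href.startswith("#") or urlsplit(href).scheme:
            return match.group(0)
        old_url, fragment = urldefrag(urljoin(page_url, href))
        new_url = new_copies.get(old_url)
        if new_url is None:
            # No repository copy yet: note it so it can be fixed later.
            unconverted.append(old_url)
            return match.group(0)
        if fragment:
            new_url = new_url + "#" + fragment
        return 'href="%s"' % new_url

    return HREF.sub(replace, html), unconverted


if __name__ == "__main__":
    page = "http://old.example.edu/papers/first.htm"
    copies = {
        "http://old.example.edu/papers/second.htm":
            "http://repository.example.org/handle/1/2002",
    }
    sample = '<a href="second.htm">two</a> <a href="third.htm">three</a>'
    edited, todo = rewrite_links(sample, page, copies)
    print(edited)
    print(todo)
```

The list of unconverted links plays the role of the notes mentioned in item 2:  it records which links to revisit once the pieces they point to have been deposited.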

* Here are a few links: 

My home page at Earlham, where I've posted all my work to date (minus one book for which I lack permission)
http://www.earlham.edu/~peters/

My section within DASH (still growing)
http://bit.ly/dash-suber

RSS feed for my section within DASH
http://bit.ly/dash-rss-suber

DASH (Digital Access to Scholarship at Harvard)
http://dash.harvard.edu/


----------

Read this issue online
http://dash.harvard.edu/bitstream/handle/1/4551999/suber_news150.htm

SOAN is published and sponsored by the Scholarly Publishing and Academic Resources Coalition (SPARC).
http://www.arl.org/sparc/

Additional support is provided by Data Conversion Laboratory (DCL), experts in converting research documents to XML.
http://www.dclab.com/public_access.asp


==========

This is the SPARC Open Access Newsletter (ISSN 1546-7821), written by Peter Suber and published by SPARC.  The views I express in this newsletter are my own and do not necessarily reflect those of SPARC or other sponsors.

To unsubscribe, send any message (from the subscribed address) to <SPARC-OANews-off@arl.org>.

Please feel free to forward any issue of the newsletter to interested colleagues.  If you are reading a forwarded copy, see the instructions for subscribing at either of the next two sites below.

SPARC home page for the Open Access Newsletter and Open Access Forum
http://www.arl.org/sparc/publications/soan

Peter Suber's page of related information, including the newsletter editorial position
http://www.earlham.edu/~peters/fos/index.htm

Newsletter, archived back issues
http://www.earlham.edu/~peters/fos/newsletter/archive.htm

Forum, archived postings
https://mx2.arl.org/Lists/SOA-Forum/List.html

Conferences Related to the Open Access Movement
http://www.earlham.edu/~peters/fos/conf.htm

Timeline of the Open Access Movement
http://www.earlham.edu/~peters/fos/timeline.htm

Open Access Overview
http://www.earlham.edu/~peters/fos/overview.htm

Open Access News blog
http://www.earlham.edu/~peters/fos/fosblog.html

Peter Suber
http://www.earlham.edu/~peters
peter.suber@earlham.edu

SOAN is licensed under a Creative Commons Attribution 3.0 United States License.
http://creativecommons.org/licenses/by/3.0/us/

