Welcome to the Free Online Scholarship (FOS) Newsletter
     March 25, 2002

Article summarizing software

In the last issue, I asked for examples of text summarizing software applied to scholarly content.  I got these helpful leads in reply.

Cornell University's Big Ear is software that reads a half-dozen law-related mailing lists.  It doesn't summarize the discussions, but it does notice when a posting announces a new document, web site, or product.  Big Ear then extracts the announcement and posts it to its own list of such announcements.  The result is a tightly focused list of announcements of interest to wired lawyers.
(Thanks to Steven Perkins.)

There are many free and commercial text-summarizing programs.  Extractor, however, is the only one I've seen that puts a button on your browser's toolbar.  Surf to an online essay and click the button.  You get a list of keywords from the essay and links to deeper analysis.  Click for more and you get a bulleted list of the essay's major propositions and radio buttons to instruct the software on which keywords are more central than others.  The software will also recommend similar articles (through an Alta Vista search on Extractor's keyword list), translate the essay to or from five languages (through Alta Vista's BabelFish), or read the text aloud (by running it through Bell Labs' text-to-speech software [FOSN for 10/5/01] and streaming the result to the Windows Media Player).  Extractor is produced by the National Research Council of Canada.
(Thanks to Stephen Downes.)

Pertinence is text summarizing software with a free web-based demo.  Give it some text (by local filename, by URL, or by cutting/pasting into a web form) and it will mark every sentence in a different shade of blue, according to the importance or centrality of the sentence within the essay.  The darker the blue, the more important.  You can ask to see the top 10% of sentences by importance, the top 20%, and so on down to the full 100%.  In my trial run, Extractor did a better job than Pertinence in finding the central propositions of one of my online essays.  (Free registration required.)
(Thanks to Joshua Schachter.)
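
To make the idea of ranking sentences by importance concrete, here is a minimal sketch in Python.  It is not the method used by Pertinence or Extractor, neither of which is documented here.  It assumes a crude word-frequency score, a tiny stop-word list, and a naive sentence splitter, and the function names (split_sentences, score_sentences, top_fraction) are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny stop-word list; an assumption made only for this sketch.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "for", "on"}


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    """Lowercased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def score_sentences(text):
    """Score each sentence by the average document-wide frequency of its words."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))
    scores = []
    for s in sentences:
        ws = words(s)
        scores.append(sum(freq[w] for w in ws) / len(ws) if ws else 0.0)
    return list(zip(sentences, scores))


def top_fraction(text, fraction):
    """Return the top `fraction` of sentences by score, kept in original order."""
    scored = score_sentences(text)
    keep = max(1, math.ceil(len(scored) * fraction))
    ranked = sorted(range(len(scored)), key=lambda i: scored[i][1], reverse=True)[:keep]
    return [scored[i][0] for i in sorted(ranked)]
```

Calling top_fraction(essay, 0.1) would return roughly the top 10% of sentences, and top_fraction(essay, 1.0) the whole essay, which mirrors the 10%-to-100% slider described above.  A real summarizer would need a far better notion of importance than raw word frequency.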

Péter Jacsó's column in the February _Information Today_ reviews several free and affordable text summarizing programs.  This is especially helpful because the other reviews and collections I found online were out of date.

my.OAI, a new search engine for OAI-compliant archives, promises automatic text summaries in a future release.  See the my.OAI story below, under Developments.


Internet filtering and online scholarship

Internet filters are relevant to FOS because they limit the content that users can find and read, even if the content exists freely online.  There are two big filtering stories this week:  the release of the ICRAfilter and the opening day of the CIPA trial.

* The Internet Content Ratings Association (ICRA) offers a new approach to internet filtering, or censorware.  Traditional filtering software consults a list of taboo sites.  The ICRA software consults descriptive tags embedded in a site's HTML code.  Site authors insert their own tags.  ICRAfilter users decide which tags should trigger blocking action.
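
The difference between the two approaches can be sketched in a few lines of Python.  This is only an illustration under stated assumptions:  the label names are invented rather than taken from the real ICRA vocabulary, a site's labels are passed in as a plain list rather than parsed from HTML, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def blocked_by_blacklist(url, blacklist):
    """Traditional approach: block when the host appears on a vendor's list."""
    host = urlparse(url).hostname or ""
    return host in blacklist


def blocked_by_labels(site_labels, user_blocked_labels):
    """Label approach: block when the author's self-applied labels
    overlap the labels that whoever configured the filter chose to block."""
    return bool(set(site_labels) & set(user_blocked_labels))
```

In the first function the blocking decision lives in a list the user never sees.  In the second it lives in the user_blocked_labels argument, so everything depends on who supplies that argument.  The sketch also lets unlabelled sites through, which is a policy choice and not a claim about how ICRAfilter behaves.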

ICRA is proud of the fact that the site tags are voluntary rather than compulsory, chosen by authors rather than imposed by censors, and neutral descriptions rather than moral judgments.  This is a step forward, and if it works this way, it will be an advance on filters that consult blacklists using unknown criteria or known criteria crudely implemented.

Are there any worries?  Seth Finkelstein points out that movie ratings started out as voluntary and self-imposed but have become quasi-governmental edicts in some countries and outright government edicts in others.

I worry that the site author is only one party whose voluntariness matters.  The other party is the surfer.  I don't object strongly when parents clamp filters on the browsers of their young children.  But when students, employees, and patrons of public libraries are forced to use filters imposed by schools, employers, or governments, that is censorship and it cannot be defended by calling it voluntary.

Karen Schneider, editor of the Librarians' Index to the Internet, put it this way:  parents who want filters for their children "never want to stop there, they want to have it in libraries and schools, they want to have it for other people."

We know that many public schools in the U.S. already use web filters produced by the Religious Right (FOSN for 3/18/02).  What would improve if they switched to the ICRA method?  If the same school administrators who bought the blacklists produced by the Religious Right configured the ICRAfilter, nothing of importance would change.  Students would be involuntarily blocked from sites that were voluntarily labelled.

ICRA has gathered $1.15 million from the European Union and endorsements from Microsoft, AOL, IBM, Yahoo, and many other major tech firms.

General coverage of the ICRAfilter launch

Seth Finkelstein's critique
(Thanks to C-FIT.)

ICRA press release

ICRA home page

* The U.S. Children's Internet Protection Act (CIPA) requires libraries receiving federal funds to use web filters on machines accessible to the public (see FOSN for 10/5/01, 11/2/01).  The ACLU has challenged its constitutionality on First Amendment grounds, and the case goes to trial today.

The CIPA is in court because makers of censorware and blacklists want to block all content of a certain kind:  illegal obscenity plus anything else that is "harmful to children".  The only way to do so is to err on the side of overbreadth.  On the other side, grown-ups and the First Amendment demand that no lawful or harmless content be blocked, which requires erring on the side of underbreadth.  There won't be a solution that pleases everyone until filters can perfectly block all and only unlawful and harmful content.  But we know that will never happen, because the set of blocked sites is precise while the categories of illegal and harmful content are inherently fuzzy and contested.

Let me elaborate on this point for a second.  We can never prove the congruity of a precise set with a fuzzy and contested one.  That's why we can't prove the congruity of politicians elected and politicians worthy of election, dollars spent and value gained, or degrees earned and wisdom learned.  (Academic digression:  this is also why we can't prove the congruity of the set of Turing-computable functions and the set of effective methods, and hence why we accept the indemonstrability of the Church-Turing Thesis.)

If the advertising promise of the filtering industry is that technical advances or voluntary labels will enable us to attain congruity between the set of blocked sites and the set of illegal obscenity and content harmful to minors, then it is hallucinatory.  Users would be deceived and manufacturers would be deceivers, unless they were both guilty of self-deception.

The ICRA approach doesn't solve this problem at all.  It lets users make the overbreadth/underbreadth decision themselves, which is a step forward.  But this step forward only applies to users who configure the software for themselves, not users for whom configuration decisions are made by others (children, students, employees, library patrons).

Just as mathematicians admit the indemonstrability of the Church-Turing Thesis and move on, let's admit the unattainability of perfect blocking and discuss the issues we then face:  whether to err on the side of overbreadth or underbreadth, and how to make filtering voluntary for every surfer beyond infancy.  The CIPA trial should settle the first question.  We haven't yet started to address the second.

The CIPA trial starts today


Censoring search engines

The Church of Scientology (CoS) believes that many pages hosted at Xenu.net violate its copyright.  In addition to pursuing Xenu.net directly, the CoS asked Google to stop returning links to those pages in searches.  Google complied.  The DMCA shields web sites from liability for hosting or linking to infringing sites, but only if they remove the hosted content or links promptly when notified.  The legal pressure on Google to comply arose both from the DMCA requirement of promptness, which eliminates the opportunity to ascertain whether a real infringement has occurred, and from the DeCSS precedent (FOSN for 12/5/01), which prohibits links to infringing content just as much as infringement itself.

Google has a procedure to reinstate links deleted in this way, but it can only be applied after the content is blocked, and for some its complexity may require the assistance of a lawyer.  Other search engines face the same pressures and respond in essentially the same way.  The problem lies with the DMCA, not with Google.

Here's one reason why this case is relevant to online scholarship.  The DMCA creates pressures that distort search engine return lists.  It's one thing when their omissions correspond to illegal content.  But this case shows that many omissions correspond to nothing more than lawyers' threatening letters.  Insofar as search engines make sites visible, the DMCA has given copyright holders a veto on a site's visibility, a veto they can apply without going to court and proving infringement.

Some online critics are saying that the CoS is merely suppressing criticism, not enforcing its copyrights.  It's true Xenu is a critic of scientology, and it's also true that scientologists have a history of trying to silence critics through copyright lawsuits.  But in fact the author behind Xenu, Andreas Heldal-Lund, admits that he has put full texts from the CoS online.  His reason is to reveal the nature of scientology in a way that cannot be accused of quoting out of context.

Xenu's pages are still online; most are simply not listed in the Google index.  Moreover, the CoS has not yet proved any copyright infringement.  Here's another way to see why the case is relevant to online scholarship:  given these facts, Google was guilty of nothing more than accuracy in showing which sites were online and relevant to certain queries.  The DeCSS case showed that accurate linking can violate the DMCA; now we know that an accurate search index can also violate the DMCA, or at least that lawyers can cite the DMCA when threatening a lawsuit to prevent accuracy.  Accuracy has become an offense.

General coverage of the story

How the scientologists got one of Xenu's upstream providers kicked off its ISP

Heldal-Lund will not challenge the Google action because it would subject him to U.S. jurisdiction

Xenu.net home page
(Some friends of Xenu are "Google bombing" by linking to the site in order to cause its Google page rank to rise.)

Google's letter to Xenu.net
(Contains (1) a Google admission that it faced DMCA liability "regardless of [the] merits" of the CoS complaint, (2) an outline of the procedure Heldal-Lund would have to follow to reinstate his links in Google's index, and (3) a list of the offensive URLs, in case you want to read what the CoS doesn't want you to read.)

Dave Touretzky's template for responding to an ISP that deletes your content because it was threatened with DMCA liability
(Thanks to C-FIT.)

* Postscript.  The CoS could have avoided the bad press of objecting to Google's accuracy in pointing to online content relevant to scientology.  All it had to do was go to court, prove infringement, and use the verdict to force the infringing pages from the web.  Then Google's active links would disappear and, if any threat was needed, it would only apply to the Google cache.  CoS may have worried that a copyright lawsuit would increase Xenu traffic, just as suing for defamation publicizes the defamation.  This worry may have been justified, but it ignores two facts.  First, once CoS had a verdict in its favor, curious readers would find it much more difficult (though not impossible) to read the infringing pages.  Second, infringement is a routine, low-profile news story, while censoring Google is big.  It's as if the CoS legal department were more concerned to show the need to amend the DMCA than to protect church secrets.



Developments

* François Schiettecatte has created my.OAI, a new search engine for OAI-compliant archives.  It doesn't cover all OAI archives, but will cover any combination of seven major ones.  It includes a flexible web form to limit a search to given metadata fields, dates, or archives.  Registered users can set preferences, store documents in customized folders, and store any search for reuse.  A future version will offer automatic text summaries and links to similar documents.  The search engine is online and working, though not all the functions are fully implemented yet.  Schiettecatte welcomes comments and suggestions from users as he finishes coding the feature set.

* The British Library announces that over 100 publishers have agreed to deposit their electronic publications in its repository to promote their long-term preservation.  The archive now contains over 800 ebooks and 850 ejournals.  Currently publisher deposits are voluntary, but the UK Department of Culture, Media and Sport is using the experience to refine legislation to make future deposits compulsory.  Rules about who can access the archive and on what terms are still being worked out.

* SPARC has announced its partnership with the _Journal of Vegetation Science_ (JVS).  JVS was the first journal I know of to "declare independence" from a commercial publisher on the ground that its subscription price was exorbitant (FOSN for 11/2/01, 11/9/01).  Its editors left Kluwer's _Vegetatio_ in 1989 and launched JVS at an affordable price with Opulus Press.  JVS now exceeds _Vegetatio_ in impact factor, at nearly one-seventh the price.  The new partnership with SPARC will widen its subscription base among libraries worldwide.

Details on the JVS and other journal "declarations of independence"

* Wolters Kluwer has announced its intention to sell Kluwer Academic Publishers, its scientific publishing division.  It could go for 400-500 million Euro.  Stock analysts welcomed the announcement.  (PS:  Why was this good news to stock analysts?  Is it because the scientific publications were losing money?  If so, is the reason related to the rising competition from free and affordable journals?  See the JVS story above.)
(Thanks to LibLicense.)

Kluwer's press release (PS:  The public reason for the sale is to allow Wolters to focus on medical publications.  But it doesn't seem that profitable scientific publications would be a distraction from this focus.)

* The International Consortium for the Advancement of Academic Publishing (ICAAP) has announced its prices for designing, managing, preparing and hosting electronic journals.  After the initial setup fee, the price is $400/year, and $30/article for markup (Canadian dollars).  Compare these prices, for example, to services charging $500/article.  ICAAP believes its prices are low enough to create an alternative to expensive commercial presses and to allow editors to make their journals free or affordable for readers.  ICAAP also announces that it now offers similar services for print journals.
(Full disclosure:  I'm on the ICAAP board, but ICAAP is a non-profit organization and I have no financial interest in it.)
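
To see how the two pricing models compare, here is a small worked calculation in Python.  It uses only the figures quoted above and leaves out ICAAP's initial setup fee, whose amount is not given here.  The figure of 40 articles per year is an assumption chosen for illustration, the function names are hypothetical, and the ICAAP prices are in Canadian dollars while the currency of the $500/article figure is not stated.

```python
def icaap_annual_cost(articles, yearly_fee=400, markup_per_article=30):
    """Yearly ICAAP cost after setup: flat fee plus per-article markup."""
    return yearly_fee + markup_per_article * articles


def per_article_service_cost(articles, fee_per_article=500):
    """Yearly cost of a service that charges a flat fee per article."""
    return fee_per_article * articles


articles_per_year = 40  # hypothetical journal size
icaap_total = icaap_annual_cost(articles_per_year)          # 400 + 30 * 40
other_total = per_article_service_cost(articles_per_year)   # 500 * 40
```

For the hypothetical 40-article journal, icaap_total is 1,600 and other_total is 20,000.  Even for a single article the comparison is 430 against 500, so on these figures alone the flat yearly fee never outweighs the per-article difference.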

* Next month _Science_ magazine plans to publish the genome sequence for rice, as deciphered by Syngenta.  But the journal will not require that the underlying data be publicly accessible.  This has triggered a letter of protest from 20 prominent genome scientists, including two Nobel laureates.  Their letter argues that "accepted norms of the field" require data to be accessible in public domain databases.  The concern is that the genetic data on rice, the most important food plant in the developing world, will be privately owned.  Quoting Alex Wijeratna for ActionAid:  "The corporations are leading a charge to privatise the staple crops.  There could be serious implications for poor farmers in developing countries."

* DNA patents are difficult to obtain and only last 17 years.  But what if a DNA sequence were coded as music and then copyrighted?  Not only are copyrights easier to obtain than patents (they are essentially automatic), thanks to Mickey Mouse they can last more than five times longer than patents.  Maxygen is exploring the possibility of locking up its genomic discoveries in copyrighted music, distributing them as MP3 files, and selling software to convert the music files back to readable data.  Facts cannot be copyrighted, so any independent discoverer of the same DNA sequence would be free to exploit it.  But the music files could be copyrighted and to that extent protected against illegal copying during distribution.  Purchasers of the decoding software would have an interest in keeping the decoded data to themselves.  (PS:  Of course all data are convertible to music, just as all are convertible to texts or images.  While the Maxygen plan seems to be about evasive legal scheming at its worst, it points out a way that algorithms can collapse an important part of the distinction between patents and copyrights.  I suspect it will therefore trigger a change in the law.  But I worry whenever legislators nowadays start revising IP law.)
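
The point that any data can be converted to music and back is easy to illustrate.  The sketch below is in Python and is not Maxygen's scheme, which is not described here.  It assumes an arbitrary one-note-per-base mapping, and the names BASE_TO_NOTE, dna_to_notes, and notes_to_dna are hypothetical.

```python
# Arbitrary mapping from the four DNA bases to note names (an assumption).
BASE_TO_NOTE = {"A": "C4", "C": "D4", "G": "E4", "T": "G4"}
NOTE_TO_BASE = {note: base for base, note in BASE_TO_NOTE.items()}


def dna_to_notes(sequence):
    """Encode a DNA string as a list of note names, one note per base."""
    return [BASE_TO_NOTE[base] for base in sequence.upper()]


def notes_to_dna(notes):
    """Decode a list of note names back into the DNA string."""
    return "".join(NOTE_TO_BASE[note] for note in notes)
```

Because the mapping is one-to-one, notes_to_dna(dna_to_notes(s)) returns the original sequence in upper case.  Any base outside A, C, G, and T raises a KeyError, since this sketch makes no attempt to handle ambiguity codes.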

* The 2002 EPPIE awards for (commercial) electronic publishing have been announced.

* Ingenta has been named one of the 100 most visionary companies in the UK.

* Wisconsin has repealed a 130-year-old statute requiring public libraries in the state to offer their services to patrons free of charge.  The repeal doesn't require libraries to charge fees, but it allows them to do so if they wish.
(Thanks to LIS News.)


New on the net

* David Rumsey, a private collector, is putting his collection of 150,000 rare 18th and 19th century maps online for all to use free of charge.
(Thanks to El.pub Weekly.)

* Oxford Reference Online has now launched.  It will eventually contain 100 of Oxford's full-text reference texts (over 130 million words), but has only a portion of that online today.  Individual users must pay $250/year for access, the same price as schools.
(Thanks to Research Buzz.)

Oxford Reference Online

* OCLC has released SiteSearch, open source software to manage distributed library content on the web.  The Java code is now available for downloading, for non-commercial users only.

* The UK's ubiquitous Joint Information Systems Committee (JISC) has launched an online magazine, _JISC inform_, to help us keep track of its many FOS projects and other initiatives.  The new magazine replaces _JISC News_.  The first issue contains stories on RDN, digital preservation, and the research grid.  (Separate stories do not have separate URLs.)


Share your thoughts

* The Resource Discovery Network has created an online survey on the features users would like to see in academic subject portals.  It will accept responses until April 12.
(Thanks to the Manchester Metropolitan University Library.)

* The Senate Judiciary Committee held a hearing last week on the SSSCA (FOSN for 3/18/02), now called the CBDTPA.  Its web site has a form for citizen comments on the issues.  A 3/18 posting to LibLicense pointed out that none of the comments raised library or fair-use issues.  The number of comments has grown significantly since then, but the observation still seems to hold.  Moreover, I haven't seen any comment raise scientific or scholarly research issues.  Let the committee know your thoughts.

* If you would like to comment on President Bush's plan to secure cyberspace, then take the government-commissioned survey created by the SANS Institute.  Responses will be accepted until April 20.
(Thanks to C-FIT.)


In other publications

* In the March 22 _News.com_, Lisa Bowman interviews Joe Kraus, founder of the anti-DMCA, anti-SSSCA, anti-CBDTPA DigitalConsumer.com (see FOSN for 3/18/02).  Quoting Kraus:  "I happen to believe that we are entering a world where the personal-use rights [to make certain kinds of copy] that consumers have are being taken away by media companies under the guise of preventing illegal copying, but [in] reality [companies are] trying to establish new business models."

* In the March 21 _SearchDay_, Chris Sherman evaluates FindArticles, a free database of full-text articles from journals and magazines.  What sets it apart is that it offers free online access to articles that may not be available from the home sites of the original journals and magazines.  (PS:  This is a great advantage, but it's not clear how FindArticles does it.  If it has licensed the content for free distribution from the original periodical, then why doesn't the original periodical offer free access to the same content?)

* In the March 21 _Free Pint_, Paul Harwood asks whether scholarly publishing is undergoing evolution or revolution.  He outlines the interests of publishers, subscription agents, and librarians, and describes a good number of the recent FOS initiatives.  Although Harwood is a Regional Director for Swets Blackwell, he admits that he finds Stevan Harnad's FOS arguments persuasive.  In the end he concludes that deep change is occurring, and that "a revolution cannot be discounted".
(Thanks to Gary Price's VASND.)

* In the March issue of _Syllabus_, I have an article on Noesis, software for searching and organizing online content that I am developing with a partner.  The last section of the article shows how the software will serve the FOS movement.

* The March issue of _D-Lib Magazine_ is now online.  The theme for this issue is Digital Technologies and Indigenous Communities.  In addition to the theme articles, FOSN readers will be interested in the following short notes.

Y. Kathy Kwan, LinkOut --Explore beyond PubMed and Entrez

Kat Hagedorn, Launch of OAIster Project

Christine Lafon, Physicists Gain Online Research Tool That Will Save Thousands of Hours Yearly [namely, the NASA Astrophysics Data System]

* The first 2002 issue of the _Journal of Information Law and Technology_ is now online.  The following articles might be of interest to FOSN readers.

Debra Tuomey, Weathering the Commercial Storm:  Why Everyone Should Steer Clear of UCITA

Fernando Galindo, A Code of Practice for the Globalisation of Electronic Commerce and Government

Lee Marshall has a long letter to the editor endorsing the WIPOUT alternate essay contest (see FOSN for 9/6/01, 1/30/02)

* In a recent but undated article in _AtNewYork_, Erin Joyce reports on the state of the debate over whether online newspapers, magazines, greeting cards, and other non-academic content should be free or priced.  This would not normally be relevant to FOS.  But the article includes this obviously true, and transferable, quotation from Charlie Fink, president of AmericanGreetings.com:  "One of the metrics that's going to judge [sic] whether or not your business can charge for content is how free are the alternatives.  The free alternatives in your category are going to dramatically affect your ability to control pricing."


Following up

To see past coverage of these stories in FOSN, use the search engine at the FOSN archive.

* More on the DMCA

AOL was cleared of copyright infringement for hosting illegally copied novels of Harlan Ellison.  If you think of the DMCA as the draconian copyright statute, this may be a surprise.  But the DMCA explicitly releases ISPs from liability for hosting illegal content if they remove it when notified of its existence.  (See the Google story above.)  The good news is that this provision was upheld in court.  The bad news is that a zealous plaintiff hauled an ISP into court despite the statute.

Doug Isenberg on the DMCA exemption for ISPs

* More on the SSSCA (now CBDTPA)

Senator Fritz Hollings has finally introduced the bill, now with a new name, the Consumer Broadband and Digital Television Promotion Act (CBDTPA)

Hollings' public statement on introducing the bill (arguing for it and explaining how it differs from the previous draft)

Text of the CBDTPA

Section by section summary of the CBDTPA

EFF's page on the SSSCA/CBDTPA

Statements by Jack Valenti and Hilary Rosen in support of the CBDTPA

Statements by various opponents of the CBDTPA

The _Christian Science Monitor_ has an anti-SSSCA/CBDTPA editorial in the March 19 issue.
("Copyright protection is a legitimate concern...But if mechanisms are built into digital devices to restrict copying, they could easily interfere with individuals' legitimate and full-range use of their equipment.")

A researcher at Microsoft Research has concluded that copyright security requires hardware protection, not just software protection.  While this supports the premise of the CBDTPA, it doesn't follow that the CBDTPA should be adopted.  Quoting Bruce Schneier of Counterpane Internet Security:  "If the only thing you want to do in your life is protect the content of the record companies and Hollywood, then the [CBDTPA] is a great thing.  If you put everybody in a box and locked them all in, then you wouldn't have murder either....For the entertainment industry to put this forward just shows how much of the economy they are willing to sacrifice for their ends."

To send a FAX to your Congressional delegation opposing the CBDTPA (thanks to DigitalConsumer)

* More on the deletion of web content to keep it from terrorists and citizens

Recent purges

* More on the analogy of the FOS revolution to the 1840 mail revolution

In 1840, nations began adopting postage stamps and the "sender pays" rule for mail, lifting the cost from recipients.  I just learned that Sonia Arrison has been advocating for some time that we curb spam by charging email users for the privilege of sending email to people they don't know.  That would make email funding similar to p-mail funding.  Should we go there?


Catching up (old news I should have discovered earlier)

* The full text of the anthology of essays, _The Transition from Paper:  Where Are We Going and How Will We Get There?_ (ed. R. Stephen Berry and Ann Simon Moffat) is online without charge.  Many of the papers have a strong FOS connection.  See especially those by Andrew Odlyzko, Steven Bachrach, Paul Ginsparg, Ann Okerson, Martin Blume, and R. Stephen Berry.
(Thanks to LibLicense.)

* CyberCemetery is a free online archive of content from defunct government web sites.  This is all "public grade" information, not a back-up of content deleted to keep it from terrorists.
(Thanks to LII Week.)



Conferences

If you plan to attend one of the following conferences, please share your observations with us through our discussion forum.

* Association of Information and Dissemination Centers (ASIDIC) Spring 2002 Meeting
St. Augustine, Florida, March 24-26

* OCLC Institute. Steering by Standards.  (A series of satellite videoconferences.)
Cyberspace.  OAI, March 26.  OAIS, April 19.  Metadata standards in the future, May 29.

* WebSearch University
San Francisco, March 25-26; Stamford CT, April 30 - May 1; Washington DC, September 23-24; Chicago, October 22-23; Dallas, November 19-20.

* European Colloquium on Information Retrieval Research
Glasgow, March 25-27

* e-Content:  Discovering and Delivering Value
Toronto, March 25-27

* New Developments in Digital Libraries
Ciudad Real, Spain, April 2-3

* The New Information Order and the Future of the Archive
Edinburgh, March 20-23

* Copyright Management in Higher Education:  Ownership, Access and Control
Adelphi, Maryland, April 4-5

* Global Knowledge Partnership Annual Meeting
Addis Ababa, April 4-5

* What Scholars Need to Know to Publish Today:  Digital Writing and Access for Readers
Albany, New York, April 8

* International Conference on Information Technology: Coding and Computing
Las Vegas, April 8-10

* NetLab and Friends:  10 Years of Digital Library Development
Lund, April 10-12

* E-Content 2002 (on ebooks)
London, April 11

* Censorship and Free Access to Information in Libraries and on the Internet
Copenhagen, April 11

* International Learned Journals Seminar:  We Can't Go On Like This:  The Future of Journals
London, April 12

* SIAM International Conference on Data Mining
Arlington, Virginia, April 11-13

* Creating access to information:  EBLIDA workshop on getting a better deal from your information licences
The Hague, April 12

* Licensing Electronic Resources to Libraries
Philadelphia, April 15

* United Kingdom Serials Group Annual Conference and Exhibition
University of Warwick, April 15-17

* Conference on Computers, Freedom, and Privacy
San Francisco, April 16-19

* EDUCAUSE Networking 2002
Washington, D.C., April 17-18

* Museums and the Web 2002
Boston, April 17-20

* Legal Guidelines for Use of Intellectual Property in Higher Education
Oneonta, NY, April 19

* Information, Knowledges and Society: Challenges of A New Era
Havana, April 22-26

* DAI Institute on The State of Digital Preservation:  An International Perspective
Washington, D.C., April 24-25

* CLIR Sponsors' Symposium:  New Challenges, New Solutions:  Libraries for the Future
Washington, D.C., April 26

* The European Library:  The Gate to Europe's Knowledge:  Milestone Conference
Frankfurt am Main, April 29-30


The Free Online Scholarship Newsletter is supported by a grant from the Open Society Institute.


This is the Free Online Scholarship Newsletter (ISSN 1535-7848).

Please feel free to forward any issue of the newsletter to interested colleagues.  If you are reading a forwarded copy of this issue, you may subscribe by signing up at the FOS home page.

FOS home page, general information, subscriptions, editorial position

FOS Newsletter, subscriptions, back issues

FOS Discussion Forum, subscriptions, postings

Guide to the FOS Movement

Sources for the FOS Newsletter

Peter Suber

Copyright (c) 2002, Peter Suber
