dc.contributor.author: Nelken, Rani
dc.contributor.author: Yamangil, Elif
dc.date.accessioned: 2013-11-20T18:50:29Z
dc.date.issued: 2008
dc.identifier: Quick submit: 2013-09-10T20:03:31-04:00
dc.identifier.citation: Nelken, Rani, and Elif Yamangil. 2008. "Mining Wikipedia's article revision history for training computational linguistics algorithms." In Wikipedia and Artificial Intelligence: An Evolving Synergy: Papers from the 2008 AAAI Workshop (Chicago, Illinois, July 13, 2008), ed. Razvan Bunescu. Technical Report WS-08-15, pp. 31-36. Menlo Park, CA: AAAI Press. [en_US]
dc.identifier.isbn: 978-1-57735-383-6 [en_US]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11326227
dc.description.abstract: We present a novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia's article revision history. By comparing adjacent versions of the same article, we extract voluminous training data for tasks for which data is usually scarce or costly to obtain. We illustrate this paradigm by applying it to three separate text processing tasks at various levels of linguistic granularity. We first apply this approach to the collection of textual errors and their correction, focusing on the specific type of lexical errors known as "eggcorns". Second, moving up to the sentential level, we show how to mine Wikipedia revisions for training sentence compression algorithms. By dramatically increasing the size of the available training data, we are able to create more discerning lexicalized models, providing improved compression results. Finally, moving up to the document level, we present some preliminary ideas on how to use the Wikipedia data to bootstrap text summarization systems. We propose to use a sentence's persistence throughout a document's evolution as an indicator of its fitness as part of an extractive summary. [en_US]
dc.description.sponsorship: Engineering and Applied Sciences [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: AAAI Press [en_US]
dc.relation.isversionof: http://www.aaai.org/Library/Workshops/2008/ws08-15-006.php [en_US]
dash.license: LAA
dc.title: Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms [en_US]
dc.type: Conference Paper [en_US]
dc.date.updated: 2013-09-11T00:04:08Z
dc.description.version: Accepted Manuscript [en_US]
dc.rights.holder: Rani Nelken, Elif Yamangil
dash.depositing.author: Yamangil, Elif
dc.date.available: 2013-11-20T18:50:29Z
dc.relation.book: Wikipedia and Artificial Intelligence: An Evolving Synergy: Papers from the 2008 AAAI Workshop [en_US]
workflow.legacycomments: Likely rights to post this manuscript; deciding to commit. ~RLC [en_US]
dash.contributor.affiliated: Yamangil, Elif

