Addressing Underspecified Lineage Queries on Provenance

Citation
Margo, Daniel, Peter Macko, and Margo Seltzer. Addressing Underspecified Lineage Queries on Provenance. Harvard Computer Science Group Technical Report TR-01-12.
Abstract
State-of-the-art provenance systems accumulate data over time, creating deep lineage trees. When queried for the lineage of an object, these systems can return excessive results due to the longevity and depth of their provenance. Such a query is underspecified: it does not constrain its result to a finite span of history. Unfortunately, specifying queries correctly often requires in-depth knowledge of the data set. We address the problem of underspecified lineage queries on provenance with techniques inspired by Web search. We present two metrics, SubRank and ProvRank, that measure the frequency of a particular result across the space of all possible lineage queries. We then use these metrics to define a subset of the lineage with which to respond to a query. These metric-defined result sets closely approximate a user’s conceptual view of relevant history. We evaluate our techniques on diverse workflows ranging from Wikipedia revision data to fMRI processing.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:23017257