Publication: Self-Identifying Data for Fair Use
Date
2015
Publisher
Association for Computing Machinery (ACM)
Citation
Chong, Stephen, Christian Skalka, and Jeffrey A. Vaughan. 2015. “Self-Identifying Data for Fair Use.” J. Data and Information Quality 5 (3) (March 2): 1–30. doi:10.1145/2687422.
Abstract
Public-use earth science datasets are a useful resource with the unfortunate feature that their provenance is easily disconnected from their content. “Fair-use policies” typically associated with these datasets require appropriate attribution of providers by users, but sound and complete attribution is difficult if provenance information is lost. To address this we introduce a technique to directly associate provenance information with sensor datasets. Our technique is similar to traditional watermarking but is intended for application to unstructured time-series datasets. Our approach is potentially imperceptible given sufficient margins of error in datasets, and is robust to a number of benign but likely transformations including truncation, rounding, bit-flipping, sampling, and reordering. We provide algorithms for both one-bit and blind mark checking, and show how our system can be adapted to various data representation types. Our algorithms are probabilistic in nature and are characterized by both combinatorial and empirical analyses. Mark embedding can be applied at any point in the data lifecycle, allowing adaptation of our scheme to social or scientific concerns.
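To make the idea of one-bit mark checking concrete, here is a minimal sketch of a keyed, per-value watermark for numeric time series. It is not the authors' algorithm, which the abstract does not specify. It is a generic low-order-bit scheme, and every name in it (`embed`, `match_fraction`, `is_marked`, `STEP`, the 4-bit split, the 0.9 threshold) is an assumption made for illustration. Each value carries its own mark, so the check is unaffected by reordering, sampling, or truncating the dataset. It does not attempt robustness to rounding or bit-flipping.

```python
import hashlib
import hmac

STEP = 0.01  # assumed margin of error of the sensor data (hypothetical)


def _keyed_bit(key: bytes, coarse: int) -> int:
    """Pseudo-random bit derived from the secret key and the high-order part of a value."""
    digest = hmac.new(key, str(coarse).encode(), hashlib.sha256).digest()
    return digest[0] & 1


def embed(values, key, step=STEP):
    """Return a marked copy of values; each value moves by at most 1.5 * step."""
    marked = []
    for v in values:
        q = round(v / step)   # value expressed in units of step
        coarse = q >> 4       # high-order part, unchanged by the embedding
        if (q & 1) != _keyed_bit(key, coarse):
            q ^= 1            # flip the lowest bit so its parity matches the keyed bit
        marked.append(q * step)
    return marked


def match_fraction(values, key, step=STEP):
    """Fraction of values whose lowest bit agrees with the keyed bit."""
    hits = 0
    for v in values:
        q = round(v / step)
        hits += (q & 1) == _keyed_bit(key, q >> 4)
    return hits / len(values)


def is_marked(values, key, threshold=0.9, step=STEP):
    """One-bit check: True if the data appears to carry the mark for this key."""
    return match_fraction(values, key, step) >= threshold
```

Unmarked data, or data checked with the wrong key, should agree with the keyed bit on roughly half of its values, so the decision is probabilistic and grows more reliable as the number of values increases.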
Keywords
Provenance, Self-identifying data
Terms of Use
This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth in the Terms of Service.