Publication:

Fast Manhattan sketches in data streams

Loading...
Thumbnail Image

Date

2010

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

ACM
The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Nelson, Jelani, and David P. Woodruff. 2010. "Fast Manhattan Sketches in Data Streams." Proceedings of the 29th Annual ACM Symposium on Principles of Database Systems (PODS), 2010. doi:10.1145/1807085.1807101

Abstract

The ℓ1-distance, also known as the Manhattan or taxicab distance, between two vectors x, y in R n is Pn |xi − yi|. i=1 Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with O ∗ (ε −2) space and O ∗ (1) update time. The O ∗ notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(ε −3) space or Ω(ε −2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O ∗ (1) factors.

Description

Other Available Sources

Research Data

Keywords

streaming, sketching, clustering, data mining

Terms of Use

Metadata Only

Endorsement

Review

Supplemented By

Related Stories