Publication: Fast Manhattan sketches in data streams
Date
Authors
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
The ℓ1-distance, also known as the Manhattan or taxicab distance, between two vectors x, y in R n is Pn |xi − yi|. i=1 Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with O ∗ (ε −2) space and O ∗ (1) update time. The O ∗ notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(ε −3) space or Ω(ε −2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O ∗ (1) factors.