Optimal Las Vegas Approximate Near Neighbors in L_p
CitationWei, Alexander. 2020. Optimal Las Vegas Approximate Near Neighbors in L_p. Bachelor's thesis, Harvard College.
AbstractIn the approximate near neighbor search problem, one is given a database D of n points and asked to construct a data structure which, given a query point q, returns a "near" neighbor of q in D. As a fundamental algorithmic primitive for processing high-dimensional data, approximate near neighbor search has found a wide variety of applications, both within computer science and beyond, including to problems in machine learning, computer vision, data mining, and genomics.
We show that high-dimensional approximate near neighbor search can be solved without false negatives while matching the performance of the state-of-the-art algorithms (which can produce false negatives). Our results hold for approximate near neighbor search for Euclidean space as well as all l_p spaces with 1 <= p < 2. We achieve this result in two ways: In the data-independent setting, in which algorithms do not adapt to the dataset, we show that the optimal ball-carving LSH of Andoni and Indyk (FOCS 2006) can be adapted into an algorithm without false negatives. In the data-dependent setting, in which algorithms can adapt to structure found in the dataset, we modify the data structure of Andoni et al. (SODA 2017) to produce no false negatives by constructing a new "Las Vegas" locality-sensitive filter family for the unit sphere.
Our data-independent construction improves on the recent Las Vegas data structure of Ahle (FOCS 2017) for l_p when 1 < p <= 2. Our data-dependent construction does even better for l_p for all p in [1, 2] and is the first Las Vegas approximate near neighbors data structure to make use of data-dependent approaches. We also answer open questions of Indyk (SODA 2000), Pagh (SODA 2016), and Ahle by showing that for approximate near neighbors, Las Vegas data structures can match state-of-the-art Monte Carlo data structures in performance for both the data-independent and data-dependent settings and across space-time tradeoffs.
Citable link to this pagehttps://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364767
- FAS Theses and Dissertations