What is it about?

Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee. The insight of the LCCS search framework is that close data objects have a longer LCCS than the far-apart ones with high probability. The LCCS-LSH scheme is LSH-family-independent, and it supports c-ANNS with different kinds of distance metrics. We also introduce a multi-probe version of LCCS-LSH named MP-LCCS-LSH to save memory. We study the performance of LCCS-LSH and MP-LCCS-LSH over five real-life large-scale datasets under two famous distance metrics: Euclidean distance and Angular distance. Extensive experiments show that LCCS-LSH outperforms several state-of-the-art LSH schemes, especially under the Angular distance. Code is available at https://github.com/HuangQiang/lccs-lsh.

Featured Image

Why is it important?

The popular LSH schemes such as E2LSH, FALCONN, C2LSH, and QALSH often use the static concatenating search framework or the dynamic collision counting framework to determine the nearest neighbours for the queries. In this paper, we define a new concept of LCCS and propose a new k-LCCS search framework that leverages the circular arrays to reuse the LSH functions, which makes the LCCS-LSH more efficient and accurate under the same number of LSH functions budget. We also design a new data structure named Circular Shift Array (CSA) for k-LCCS search. CSA is potential of the separate interest for other fields of computer science.

Perspectives

I hope this paper could bring new thought to people who design basic data structures for NNS problem or other fundamental similarity search problems.

Dr Qiang Huang
National University of Singapore

Read the Original

This page is a summary of: Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring, May 2020, ACM (Association for Computing Machinery),
DOI: 10.1145/3318464.3389778.
You can read the full text:

Read

Resources

Contributors

The following have contributed to this page