What is it about?
Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee. The insight of the LCCS search framework is that close data objects have a longer LCCS than the far-apart ones with high probability. The LCCS-LSH scheme is LSH-family-independent, and it supports c-ANNS with different kinds of distance metrics. We also introduce a multi-probe version of LCCS-LSH named MP-LCCS-LSH to save memory. We study the performance of LCCS-LSH and MP-LCCS-LSH over five real-life large-scale datasets under two famous distance metrics: Euclidean distance and Angular distance. Extensive experiments show that LCCS-LSH outperforms several state-of-the-art LSH schemes, especially under the Angular distance. Code is available at https://github.com/HuangQiang/lccs-lsh.
Featured Image
Photo by Suraj Patil on Unsplash
Why is it important?
The popular LSH schemes such as E2LSH, FALCONN, C2LSH, and QALSH often use the static concatenating search framework or the dynamic collision counting framework to determine the nearest neighbours for the queries. In this paper, we define a new concept of LCCS and propose a new k-LCCS search framework that leverages the circular arrays to reuse the LSH functions, which makes the LCCS-LSH more efficient and accurate under the same number of LSH functions budget. We also design a new data structure named Circular Shift Array (CSA) for k-LCCS search. CSA is potential of the separate interest for other fields of computer science.
Perspectives
Read the Original
This page is a summary of: Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring, May 2020, ACM (Association for Computing Machinery),
DOI: 10.1145/3318464.3389778.
You can read the full text:
Resources
Source Code and Data for LCCS-LSH
If you are interested in our paper about LCCS-LSH, you could download the source code and the datasets from our GitHub.
Slides for LCCS-LSH
The slides can be found on my homepage, which is made by Dr Lei Yifan. The long version of LCCS-LSH (with more proof details and more experiments) can be found from the arXiv link on my homepage.
Contributors
The following have contributed to this page