TY - GEN
T1 - Locality Sensitive Hashing revisited
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
AU - Wang, Hongya
AU - Cao, Jiao
AU - Shu, Lih Chyun
AU - Rafiei, Davood
PY - 2013
Y1 - 2013
N2 - Locality Sensitive Hashing (LSH) is widely recognized as one of the most promising approaches to similarity search in high-dimensional spaces. Based on LSH, a considerable number of nearest neighbor search algorithms have been proposed in the past, with some of them having been used in many real-life applications. Apart from their demonstrated superior performance in practice, the popularity of the LSH algorithms is mainly due to their provable performance bounds on query cost, space consumption and failure probability. In this paper, we show that a surprising gap exists between the LSH theory and widely practiced algorithm analysis techniques. In particular, we discover that a critical assumption made in the classical LSH algorithm analysis does not hold in practice, which suggests that using the existing methods to analyze the performance of practical LSH algorithms is a conceptual mismatch. To address this problem, a novel analysis model is developed that bridges the gap between the LSH theory and the method for analyzing the LSH algorithm performance. With the help of this model, we identify some important flaws in the commonly used analysis methods in the LSH literature. The validity of this model is verified through extensive experiments with real datasets.
AB - Locality Sensitive Hashing (LSH) is widely recognized as one of the most promising approaches to similarity search in high-dimensional spaces. Based on LSH, a considerable number of nearest neighbor search algorithms have been proposed in the past, with some of them having been used in many real-life applications. Apart from their demonstrated superior performance in practice, the popularity of the LSH algorithms is mainly due to their provable performance bounds on query cost, space consumption and failure probability. In this paper, we show that a surprising gap exists between the LSH theory and widely practiced algorithm analysis techniques. In particular, we discover that a critical assumption made in the classical LSH algorithm analysis does not hold in practice, which suggests that using the existing methods to analyze the performance of practical LSH algorithms is a conceptual mismatch. To address this problem, a novel analysis model is developed that bridges the gap between the LSH theory and the method for analyzing the LSH algorithm performance. With the help of this model, we identify some important flaws in the commonly used analysis methods in the LSH literature. The validity of this model is verified through extensive experiments with real datasets.
UR - http://www.scopus.com/inward/record.url?scp=84889589639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889589639&partnerID=8YFLogxK
U2 - 10.1145/2505515.2505765
DO - 10.1145/2505515.2505765
M3 - Conference contribution
AN - SCOPUS:84889589639
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1969
EP - 1978
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Y2 - 27 October 2013 through 1 November 2013
ER -