Authors:
Meng Chen、Kai Zhang、Zhenying He、Yinan Jing、X. Sean Wang
Paper:
https://arxiv.org/abs/2408.08933
Introduction
Approximate Nearest Neighbor Search (ANNS) is a critical component in various applications, such as recommendation systems and large language model-based applications. With the rise of multimodal neural models, cross-modal ANNS has become essential for retrieving similar items across different modalities (e.g., using text to find similar images). However, existing ANNS approaches struggle with cross-modal queries due to the inherent distribution gap between embeddings from different modalities. This paper introduces RoarGraph, a projected bipartite graph designed to address these inefficiencies and significantly improve cross-modal ANNS performance.
Related Work
Background on ANNS
ANNS aims to find the approximate nearest neighbors in a dataset for a given query, trading off between search speed and accuracy. Traditional methods include partition-based, quantization-based, and hashing-based approaches, with graph-based methods representing the state-of-the-art performance.
Out-of-Distribution ANNS
Cross-modal ANNS queries are often out-of-distribution (OOD) relative to the base data, leading to inefficiencies in existing ANNS methods. The distribution gap between different modalities causes OOD queries to deviate significantly from the base data, making it challenging for traditional ANNS indexes to perform efficiently.
Limitations of Existing Approaches
Existing ANNS methods, such as HNSW and IVF, are designed for single-modal scenarios and perform poorly with OOD queries. RobustVamana, a graph index designed for OOD-ANNS, improves performance but still falls short in handling the distribution gap effectively.
Research Methodology
Key Insights from OOD Workloads
The paper identifies that OOD queries are spatially distant from the base data, and their k-nearest neighbors are also widely distributed. This breaks the assumptions of existing ANNS approaches, which expect queries to be close to the base data and their nearest neighbors to be clustered.
Proposed Solution: RoarGraph
RoarGraph is designed to map distributed vectors that are nearest neighbors to queries into closely connected neighbors within a graph index. The construction process involves:
1. Building a bipartite graph by mapping the similarity relationship between queries and base data.
2. Projecting the bipartite graph onto the base data using neighborhood-aware projection.
3. Enhancing the graph’s connectivity to ensure reachability and efficient search paths.
Experimental Design
Datasets
The experiments use three modern cross-modal datasets: Text-to-Image, LAION, and WebVid. These datasets include text, image, and video frame embeddings, providing a comprehensive evaluation of RoarGraph’s performance.
Evaluation Metrics
The performance is evaluated using recall@k and queries per second (QPS). Recall measures the accuracy of retrieval, while QPS assesses the search speed.
Baseline Methods
The paper compares RoarGraph with state-of-the-art graph-based ANNS methods, including HNSW, NSG, τ-MNG, and RobustVamana.
Results and Analysis
Search Speed vs. Recall
RoarGraph consistently outperforms existing methods across all datasets, achieving up to 3.56× faster search speed at a 90% recall rate for OOD queries. This demonstrates RoarGraph’s efficiency in handling cross-modal ANNS tasks.
Routing Hops vs. Recall
RoarGraph requires significantly fewer hops during the search, reducing the length of the search path and improving efficiency.
Ablation Study
The study validates the effectiveness of RoarGraph’s components, showing that the projected graph and connectivity enhancement techniques contribute to its superior performance.
Effects of Query Set Size for Indexing
RoarGraph maintains high performance even with a reduced query set size, making it flexible and practical for various scenarios.
Robustness to In-Distribution Query
RoarGraph also performs well with in-distribution queries, ensuring its applicability to different query types.
Index Size and Construction Overhead
RoarGraph’s index size is comparable to other methods, and its construction time is reasonable, especially when using a smaller query set.
Overall Conclusion
RoarGraph addresses the inefficiencies of existing ANNS approaches in handling cross-modal queries by leveraging query distribution to guide graph construction. Extensive experiments demonstrate its superior performance, making it a valuable solution for cross-modal ANNS tasks. The proposed method not only improves search speed and accuracy but also maintains robustness to different query types, offering a flexible and practical approach for real-world applications.