Skip to content

sjyouuuuug/filterbenchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Filtered Approximate Nearest Neighbor Search: A Unified Benchmark and Systematic Experimental Study [Experiment, Analysis & Benchmark]

【Doing】We are continually cleaning our codebases and improving documentation.

For our main datasets, please refer to Hugging Face.

Since the dataset is quite large, uploading the complete dataset takes a lot of time. However, once the dataset has been organized, we will make it directly available on Hugging Face.

UNG-dev & Brute Force

build

cd UNG-dev/

mkdir build

cd build

cmake -DCMAKE_BUILD_TYPE=Release ../codes/

make -j

Usage

Each dataset is available in two raw data formats: dataset_name.fvecs and dataset_name.bin. We offer these options to ensure compatibility with a variety of tools and workflows. If you need to convert the data, please consult the detailed instructions in UNG-dev/README.md. Also, refer to https://github.com/YZ-Cai/Unified-Navigating-Graph for detailed environment setup.

We provide a small dataset on github to facilitate testing and experimentation.

Brute Force

For containment scenario:

./build/tools/compute_groundtruth \
    --data_type float --dist_fn L2 \
    --base_bin_file ../data/words/words_base.bin --query_bin_file ../data/words/words_query_and.bin \
    --base_label_file ../data/words/label_base.txt --query_label_file ../data/t/words_query_and.txt \
    --gt_file ../data/words/words_gt_and.bin --scenario containment --K 10 --num_threads 16

For overlap scenario:

./build/tools/compute_groundtruth \
    --data_type float --dist_fn L2 \
    --base_bin_file ../data/words/words_base.bin --query_bin_file ../data/words/words_query_or.bin \
    --base_label_file ../data/words/label_base.txt --query_label_file ../data/words/words_query_or.txt \
    --gt_file ../data/words/words_gt_or.bin --scenario overlap --K 10 --num_threads 16

The result will be saved in the specified ground truth files (words_gt_and.bin and words_gt_or.bin) and the statistics will be printed to the console.

UNG Parameter Tuning Workflow

To facilitate parameter tuning for the UNG model, we provide a suite of scripts located in the bash/ directory. This workflow is designed to be straightforward and can be adapted for other models with minor modifications.

  1. Step 1: Explore the Parameter Space

This script systematically traverses the defined parameter space for UNG, building a small index for each parameter combination to enable initial analysis.

cd bash
python3 ./traverse_param_space.py
  1. Step 2: Perform Subspace Search

This script executes a hybrid search across the parameter subspace generated in the previous step. You can customize the search scenarios by modifying the search_in_subspace.py file.

cd bash
python3 ./search_in_subspace.py
  1. Step 3: Aggregate Search Results

This script combines the results from the various scenario searches for each parameter set and calculates the overall performance metrics.

cd bash
python3 ./combine_search_result.py
  1. Step 4: Select Representative Samples

From the aggregated results, this script identifies and selects a set of representative samples from each subspace. The final selection is saved to a CSV file for review.

cd bash
python3 ./select_representative.py
  1. Step 5: Build Representative Graphs

Using the samples selected in the previous step, this script constructs the final representative graphs.

cd bash
python3 ./build_representative_graphs.py
  1. Step 6: Run Customized Evaluations
python search_basic_exp.py
python search_base_exp.py
python search_NHQ_exp.py
python search_selectivity_exp.py
# ... and other evaluation scripts

Post-filter HNSW/IVFPQ

The post-filter HNSW and post-filter IVFPQ

build

CAPS

build

Please refer to CAPS/README.md to generate libfaiss.a and OpenBLAS/libopenblas.a

make index
make query

We only modified the query part to support multi-thread querying, make sure to use our modified version and install openmp.

Usage

python3 ./bash/build_index.py
python3 ./bash/search_index.py

ACORN-1 and ACORN-gamma

build

Our code is primarily concentrated in three files: build_gamma_index, search_acorn_index, and search_acorn_index_parse.

cd ACORN/

cmake -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -B build

make -C build -j faiss
make -C build utils # ignoring error of undifined reference to main
make -C build build_gamma_index # build ACORN-gamma and ACORN-1 index
make -C build search_acorn_index # for ACORN-1
make -C build search_acorn_index_parse # for ACORN-gamma

Usage

All scripts are located in the bash\ folder. The functionality and name of the scripts is almost the same as described in the UNG part. The only difference is the suffixes "_1" and "_gamma" which indicate whether it's ACORN-1 or ACORN-gamma.

If you want to run on your own datasets, convert the ground-truth files from .bin to .txt before running the scripts. Use the following command and update the paths in the script accordingly:

python3 ./bash/convert_gt.py

About

code base for Filtered Approximate Nearest Neighbor Search: A Systematic Survey and Benchmark: [Experiment, Analysis \& Benchmark]

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors