fix(sindi): fix error when neither doc nor query is relevant (#1589) by inabao · Pull Request #1609 · antgroup/vsag

inabao · 2026-02-27T04:09:50Z

Cherry-pick of #1589 to 0.16.
Link: #1581

Signed-off-by: jinjiabao.jjb <jinjiabao.jjb@antgroup.com>

gemini-code-assist · 2026-02-27T04:10:10Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the SINDI algorithm's sparse term data handling, specifically addressing an error that occurred when neither the document nor the query yielded relevant results. The changes introduce a more robust mechanism for tracking the total number of inserted vectors and refine the distance filtering logic, ensuring that the KnnSearch operation and its serialization behave correctly under various relevance conditions.

Highlights

- Corrected Distance Filtering Logic: Modified the SparseTermDataCell::InsertHeapByDists method to change the condition for considering a distance relevant from dists[id] != 0 to dists[id] < 0. This change specifically addresses an error scenario where neither the document nor the query was relevant, ensuring that only truly relevant distances are processed.
- Introduced Total Vector Count: Added a new total_count_ member variable to the SparseTermDataCell class and ensured it is incremented every time a vector is inserted. This provides an accurate count of all vectors processed by the cell.
- Improved Loop Iteration Accuracy: Updated the iteration bounds in SparseTermDataCell::InsertHeapByDists from dists_size to the newly introduced total_count_, ensuring that loops correctly process all relevant elements.
- Adjusted Test Case Validation: Modified a test case in src/algorithm/sindi/sindi_test.cpp to use another_result->GetDim() instead of a fixed k value when iterating through search results, improving the robustness and accuracy of test validation after serialization.

Changelog

src/algorithm/sindi/sindi_test.cpp
- Updated the loop condition in the KnnSearch serialization test to use another_result->GetDim() for accurate iteration.
src/data_cell/sparse_term_datacell.cpp
- Modified the distance relevance check from dists[id] != 0 to dists[id] < 0 within InsertHeapByDists.
- Replaced dists_size with total_count_ in iteration loops within InsertHeapByDists.
- Added an increment to total_count_ when a vector is inserted into the SparseTermDataCell.
src/data_cell/sparse_term_datacell.h
- Declared a new int64_t total_count_ member variable to track the number of inserted vectors.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The pull request effectively addresses an error scenario in the SINDI algorithm where search results might be incorrect or lead to out-of-bounds access. The changes correctly adjust loop bounds in SparseTermDataCell::InsertHeapByDists to use total_count_ instead of dists_size, ensuring iteration over the actual number of elements. The logic for identifying valid distances has been refined from dists[id] != 0 to dists[id] < 0, which aligns with the typical representation of inner product distances. A new total_count_ member variable was introduced and is properly maintained, contributing to the overall correctness of the index. The test case was also updated to reflect these corrections, improving the robustness of the testing framework.

gemini-code-assist · 2026-02-27T04:11:43Z

src/data_cell/sparse_term_datacell.cpp

        if (heap.size() < n_candidate) {
-            for (; id < dists_size; id++) {
-                if (dists[id] != 0) {
+            for (; id < total_count_; id++) {


The change from dists_size to total_count_ is a critical fix. Using dists_size could lead to iterating over uninitialized memory or incorrect data if dists_size is larger than the actual number of elements, potentially causing crashes or incorrect search results. total_count_ accurately reflects the number of valid elements.

gemini-code-assist · 2026-02-27T04:11:43Z

src/algorithm/sindi/sindi_test.cpp

        // test serialize
        auto another_result = another_index->KnnSearch(query, k, search_param_str, nullptr);
-        for (int j = 0; j < k; j++) {
+        for (int j = 0; j < another_result->GetDim(); j++) {


Changing the loop condition from j < k to j < another_result->GetDim() is a crucial correction. This prevents potential out-of-bounds access if another_result->GetDim() returns fewer elements than k, ensuring the test accurately reflects the actual number of results returned by the search.
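
The point of this comment can be illustrated with a small, self-contained sketch. `ResultSketch`, `Count()` and `SameIds` are hypothetical names introduced here for illustration; they are not the vsag Dataset API. In the test itself, `GetDim()` appears to play the role of the returned-result count.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a search result; NOT the vsag Dataset API.
struct ResultSketch {
    std::vector<int64_t> ids;

    // Number of results actually returned, which may be smaller than k.
    int64_t
    Count() const {
        return static_cast<int64_t>(ids.size());
    }
};

// Compare two results element by element. The loop is bounded by the number
// of results that came back, not by the requested k, so it never reads past
// the end when a search returns fewer than k hits.
inline bool
SameIds(const ResultSketch& expected, const ResultSketch& actual) {
    if (expected.Count() != actual.Count()) {
        return false;
    }
    for (int64_t j = 0; j < actual.Count(); j++) {
        if (expected.ids[j] != actual.ids[j]) {
            return false;
        }
    }
    return true;
}
```

A loop written as `j < k` would index past the end of `ids` whenever fewer than k documents match the query.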

gemini-code-assist · 2026-02-27T04:11:43Z

src/data_cell/sparse_term_datacell.cpp

-            for (; id < dists_size; id++) {
-                if (dists[id] != 0) {
+            for (; id < total_count_; id++) {
+                if (dists[id] < 0) {


Changing the condition from dists[id] != 0 to dists[id] < 0 is a significant logical correction. Given that inner product distances are often stored as negative values (e.g., dist_ip = -1 * query * base), checking for dists[id] < 0 correctly identifies valid, processed distances, preventing false positives or negatives in the heap insertion logic.
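
As a rough sketch of the filtering described in this comment, the following assumes that distances are accumulated as negated inner products, so that only strictly negative entries belong to documents that scored against the query. `InsertHeapByDistsSketch`, `Candidate` and `MaxHeap` are hypothetical names, and the single merged loop is a simplification of the two-phase loop visible in the diff, not the actual vsag implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical candidate heap: a max-heap on distance, so the worst (largest)
// distance among the kept candidates sits on top.
using Candidate = std::pair<float, int64_t>;  // (distance, id)
using MaxHeap = std::priority_queue<Candidate>;

// Sketch only: keep the n_candidate smallest distances among the first
// total_count entries. Entries that are zero or positive are skipped, under
// the assumption that a relevant document always has a negative distance.
inline void
InsertHeapByDistsSketch(const std::vector<float>& dists,
                        int64_t total_count,
                        std::size_t n_candidate,
                        MaxHeap& heap) {
    assert(total_count >= 0 && static_cast<std::size_t>(total_count) <= dists.size());
    if (n_candidate == 0) {
        return;
    }
    for (int64_t id = 0; id < total_count; id++) {
        if (dists[id] >= 0) {
            continue;  // not a relevant entry under the stated assumption
        }
        if (heap.size() < n_candidate) {
            heap.emplace(dists[id], id);
        } else if (dists[id] < heap.top().first) {
            heap.pop();  // evict the current worst candidate
            heap.emplace(dists[id], id);
        }
    }
}
```

With a `!= 0` check, a positive entry would be treated as a candidate; the `< 0` check excludes it along with untouched zero entries.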

gemini-code-assist · 2026-02-27T04:11:43Z

src/data_cell/sparse_term_datacell.cpp

    }

-    for (; id < dists_size; id++) {
+    for (; id < total_count_; id++) {


Similar to the previous instance, updating the loop bound from dists_size to total_count_ here is essential for correctness. It ensures that the iteration for inserting into the heap does not go beyond the actual number of elements, preventing potential memory access issues and ensuring accurate processing.

gemini-code-assist · 2026-02-27T04:11:43Z

src/data_cell/sparse_term_datacell.cpp

        term_datas_[term].push_back(val);
        term_sizes_[term] += 1;
    }
+    total_count_++;


Incrementing total_count_ after inserting a vector is a necessary addition. This ensures that the total_count_ member accurately reflects the number of vectors stored, which is then correctly used as the iteration limit in search functions like InsertHeapByDists.
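
A minimal sketch of this bookkeeping follows, using a simplified and hypothetical cell (`TermCellSketch` and its members are illustration names, not the real SparseTermDataCell layout). It only shows the counter being advanced once per inserted vector, after the per-term loop, as the diff does.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a sparse term data cell.
struct TermCellSketch {
    // term id -> ids of the vectors that contain the term
    std::unordered_map<uint32_t, std::vector<int64_t>> term_ids;
    // term id -> stored values, parallel to term_ids
    std::unordered_map<uint32_t, std::vector<float>> term_datas;
    // number of vectors inserted so far
    int64_t total_count{0};

    void
    InsertVector(const std::vector<std::pair<uint32_t, float>>& sparse_vector) {
        const int64_t id = total_count;  // in this sketch, ids are assigned sequentially
        for (const auto& [term, val] : sparse_vector) {
            term_ids[term].push_back(id);
            term_datas[term].push_back(val);
        }
        total_count++;  // once per vector, not once per term
    }
};
```

Because the increment sits outside the loop, a vector with no terms still advances the count in this sketch, so the count stays aligned with the ids handed out.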

gemini-code-assist · 2026-02-27T04:11:43Z

src/data_cell/sparse_term_datacell.h


    Allocator* const allocator_{nullptr};
+
+    int64_t total_count_{0};


The introduction of total_count_ as a member variable is a necessary change to correctly track the number of elements. This variable is crucial for the updated logic in sparse_term_datacell.cpp where it is used to define accurate iteration bounds.

codecov · 2026-02-27T07:50:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             0.16    #1609      +/-   ##
==========================================
+ Coverage   92.21%   92.37%   +0.15%     
==========================================
  Files         295      295              
  Lines       15677    15685       +8     
==========================================
+ Hits        14457    14489      +32     
+ Misses       1220     1196      -24

Flag	Coverage Δ
cpp	`92.37% <100.00%> (+0.15%)`	⬆️

Components	Coverage Δ
common	`92.73% <ø> (+0.11%)`	⬆️
datacell	`92.49% <100.00%> (+0.59%)`	⬆️
index	`91.47% <66.66%> (-0.02%)`	⬇️
simd	`100.00% <ø> (ø)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 31735f3...6c25fff.

LHT129

LGTM

wxyucs

lgtm

fix(sindi): fix error when neither doc nor query is relevant (#1589)

6c25fff

Signed-off-by: jinjiabao.jjb <jinjiabao.jjb@antgroup.com>

inabao self-assigned this Feb 27, 2026

inabao requested review from LHT129, ShawnShawnYou and wxyucs as code owners February 27, 2026 04:09

inabao added the kind/bug and version/0.16 labels Feb 27, 2026

pull-request-size bot added the size/S label Feb 27, 2026

gemini-code-assist bot reviewed Feb 27, 2026

LHT129 approved these changes Feb 28, 2026

wxyucs approved these changes Feb 28, 2026

inabao merged commit 3dec5ba into 0.16 Feb 28, 2026
23 checks passed

inabao deleted the cherry-pick-488225a-fix branch February 28, 2026 06:17
