
fix(hnsw): fix concurrent access to max_level_ in HNSW search operations#1573

Merged
inabao merged 1 commit into main from fix/hnsw-concurrent-maxlevel on Feb 10, 2026


inabao (Collaborator) commented Feb 10, 2026

Fixed a race condition in which max_level_ was read outside the protection of max_level_mutex_ in searchKnn() and searchRange().

The issue occurred when:

  1. A search thread reads max_level_ without holding the lock
  2. An add thread concurrently increases max_level_ and adds new nodes
  3. The search thread may then combine the old enterpoint_node_ with the new, higher max_level_ and traverse uninitialized or invalid node IDs, causing 'cand error' exceptions
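The interleaving above can be replayed deterministically in a minimal C++ sketch. It assumes, as a simplification, that `max_level_mutex_` is a `std::shared_mutex` guarding both `enterpoint_node_` and `max_level_`, and that an add which raises the top level updates both fields under an exclusive lock. `ToyIndex`, `RaiseTopLevel`, `RacySnapshot`, and the `between_reads` callback (a stand-in for the concurrent add thread) are hypothetical names, not code from hnswalg.cpp.

```cpp
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical, simplified index state: only the two fields this PR is about.
struct ToyIndex {
    mutable std::shared_mutex max_level_mutex_;
    int enterpoint_node_ = 0;  // id of the current entry node
    int max_level_ = 0;        // top level of the graph (the entry node's level)

    // Add thread: a node on a new, higher level becomes the entry point.
    // Both fields change together under an exclusive lock.
    void RaiseTopLevel(int new_node, int new_level) {
        std::unique_lock<std::shared_mutex> lock(max_level_mutex_);
        enterpoint_node_ = new_node;
        max_level_ = new_level;
    }

    // Search thread, pre-fix pattern: only enterpoint_node_ is read under the lock.
    // `between_reads` simulates an add thread running after the lock is released.
    std::pair<int, int> RacySnapshot(const std::function<void()>& between_reads) const {
        int entry = 0;
        {
            std::shared_lock<std::shared_mutex> lock(max_level_mutex_);
            entry = enterpoint_node_;
        }
        between_reads();
        // Unprotected read: with a real concurrent writer this is a data race.
        int level = max_level_;
        return {entry, level};
    }
};
```

On a fresh `ToyIndex`, `index.RacySnapshot([&] { index.RaiseTopLevel(42, 3); })` returns `{0, 3}`: the old entry node 0 paired with the new top level 3. In this toy model node 0 only exists on level 0, so a greedy descent starting at level 3 from node 0 would read neighbor lists the node does not have, which is the kind of traversal this PR links to 'cand error'.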

Changes:

  • In searchKnn(): Read max_level_ inside the lock scope into a local copy
  • In searchRange(): Read max_level_ inside the lock scope into a local copy

This ensures that enterpoint_node_ and max_level_ are read together under the same lock, so each search starts from a mutually consistent entry point and top level.
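The fix can be sketched in C++ as below, assuming `max_level_mutex_` is a `std::shared_mutex` and that adds update `enterpoint_node_` and `max_level_` together under an exclusive lock. `ToyIndex`, `RaiseTopLevel`, `SearchSnapshotIsConsistent`, `CountInconsistentSnapshots`, and the `enterpoint_node_ == 100 * max_level_` invariant are hypothetical, chosen only so that a torn snapshot would be detectable.

```cpp
#include <atomic>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified index state (not the real hnswalg.cpp class).
// Invariant maintained by writers: enterpoint_node_ == 100 * max_level_.
struct ToyIndex {
    mutable std::shared_mutex max_level_mutex_;
    int enterpoint_node_ = 0;
    int max_level_ = 0;

    // Add thread: update both fields together under an exclusive lock.
    void RaiseTopLevel(int new_level) {
        std::unique_lock<std::shared_mutex> lock(max_level_mutex_);
        enterpoint_node_ = 100 * new_level;
        max_level_ = new_level;
    }

    // Post-fix pattern: copy both fields inside one lock scope, then use only the copies.
    bool SearchSnapshotIsConsistent() const {
        int entry = 0;
        int max_level_copy = 0;
        {
            std::shared_lock<std::shared_mutex> lock(max_level_mutex_);
            entry = enterpoint_node_;
            max_level_copy = max_level_;
        }
        // A real search would descend from `entry` through levels max_level_copy..1 here.
        return entry == 100 * max_level_copy;
    }
};

// One writer (the calling thread) raises the top level while `readers` threads
// take snapshots; returns how many inconsistent snapshots were observed.
int CountInconsistentSnapshots(int readers, int levels) {
    ToyIndex index;
    std::atomic<int> bad{0};
    std::atomic<bool> done{false};
    std::vector<std::thread> threads;
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&] {
            while (!done.load()) {
                if (!index.SearchSnapshotIsConsistent()) bad.fetch_add(1);
            }
        });
    }
    for (int level = 1; level <= levels; ++level) index.RaiseTopLevel(level);
    done.store(true);
    for (auto& t : threads) t.join();
    return bad.load();
}
```

Because every reader copies both fields inside one shared-lock scope, `CountInconsistentSnapshots(4, 10000)` should return 0 regardless of thread scheduling. A shared lock lets concurrent searches hold the lock at the same time while still excluding the add thread's exclusive update.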

Fixes: cand error in concurrent add/search scenarios

Signed-off-by: jinjiabao.jjb <jinjiabao.jjb@antgroup.com>
gemini-code-assist (Contributor)

Summary of Changes


This pull request resolves a critical concurrency issue within the HNSW algorithm's search functions. By ensuring that the max_level_ variable is read under a shared lock, it eliminates a race condition that could cause search operations to access uninitialized or invalid data when new elements are concurrently added. This change significantly improves the stability and correctness of the HNSW index under high-concurrency workloads.

Highlights

  • Race Condition Fix: Addressed a race condition in HNSW search operations where max_level_ was accessed without proper mutex protection, potentially leading to 'cand error' exceptions during concurrent add and search scenarios.
  • Atomic Access to max_level_: Ensured that max_level_ is read atomically within the max_level_mutex_ scope in both searchKnn() and searchRange() functions by introducing a local copy, thereby preventing inconsistent state.


Changelog
  • src/algorithm/hnswlib/hnswalg.cpp
    • Modified searchKnn() to read max_level_ into a local variable max_level_copy within the max_level_mutex_ lock scope, and subsequently use this local copy.
    • Modified searchRange() to read max_level_ into a local variable max_level_copy within the max_level_mutex_ lock scope, and subsequently use this local copy.

gemini-code-assist bot left a comment

Code Review

This pull request addresses a race condition in the searchKnn and searchRange functions of the HNSW implementation. Previously, max_level_ was accessed outside the protection of max_level_mutex_, which could lead to inconsistent state and exceptions during concurrent add and search operations. The fix involves reading max_level_ into a local variable within the same critical section where enterpoint_node_ is read. This ensures both values are retrieved atomically. The changes are correct, minimal, and effectively resolve the identified concurrency issue.

LHT129 (Collaborator) left a comment

LGTM

wxyucs (Collaborator) commented Feb 10, 2026

@copilot, please help to review this pull request

Copilot AI (Contributor) commented Feb 10, 2026

@wxyucs I've opened a new pull request, #1574, to work on those changes. Once the pull request is ready, I'll request review from you.

wxyucs (Collaborator) left a comment

lgtm

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1573      +/-   ##
==========================================
- Coverage   91.24%   91.01%   -0.23%     
==========================================
  Files         329      329              
  Lines       19396    19398       +2     
==========================================
- Hits        17697    17656      -41     
- Misses       1699     1742      +43     
Flag   Coverage Δ
cpp    91.01% <100.00%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.

Components   Coverage Δ
common       85.81% <ø> (ø)
datacell     91.70% <ø> (-2.08%) ⬇️
index        90.65% <100.00%> (+0.03%) ⬆️
simd         100.00% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3dd2923...321aecb.

inabao merged commit 765e6cb into main Feb 10, 2026
29 of 30 checks passed
inabao deleted the fix/hnsw-concurrent-maxlevel branch February 10, 2026 09:52
wxyucs pushed a commit that referenced this pull request Feb 11, 2026
…ons (#1573)

wxyucs pushed a commit that referenced this pull request Feb 27, 2026
…ons (#1573)

wxyucs pushed a commit that referenced this pull request Feb 27, 2026
…ons (#1573)

wxyucs pushed a commit that referenced this pull request Feb 27, 2026
…ons (#1573)


4 participants