fix(resize): ensure size is updated after successful expansion #1649
Move max_capacity_ update to after all expansion operations complete. This prevents inconsistent state where size is updated but memory allocation failed in subsequent resize operations.

Fixes resize ordering in:
- HGraph::resize
- GraphDataCell::Resize
- FlattenDataCell::Resize
- ExtraInfoDataCell::Resize
- SparseVectorDataCell::Resize

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: jinjiabao.jjb <jinjiabao.jjb@antgroup.com>
Summary of Changes
This pull request enhances the robustness of resize operations across various data structures by reordering the
Code Review
This pull request correctly addresses a potential state inconsistency issue by moving the update of max_capacity_ to after all resize and allocation operations have completed successfully, ensuring capacity is updated only when storage is successfully expanded. However, a potential integer truncation issue was identified in HGraph::resize where a 64-bit size is truncated to a 32-bit InnerIdType, which could lead to data corruption if the index grows beyond 4 billion elements. Additionally, there are concerns regarding exception safety in some complex resize operations, which could leave objects in an inconsistent state if a failure occurs midway.
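The two points in this summary can be reproduced in isolation. The sketch below assumes `InnerIdType` is `uint32_t`, as the review states; `ToyIndex` and `checked_capacity` are hypothetical names used only for illustration and are not part of the PR. It validates the requested size first, grows the storage second, and publishes the capacity last.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Assumption: InnerIdType is a 32-bit unsigned integer.
using InnerIdType = uint32_t;

// Hypothetical helper: reject sizes that do not fit instead of truncating.
inline InnerIdType checked_capacity(uint64_t new_size) {
    if (new_size > std::numeric_limits<InnerIdType>::max()) {
        throw std::invalid_argument("new_size exceeds maximum capacity of InnerIdType");
    }
    return static_cast<InnerIdType>(new_size);
}

// Hypothetical stand-in for an index with a single backing store.
struct ToyIndex {
    std::vector<uint8_t> storage;
    std::atomic<InnerIdType> max_capacity{0};

    void resize(uint64_t new_size) {
        // 1. Validate first, so nothing is modified on bad input.
        InnerIdType capacity = checked_capacity(new_size);
        // 2. Grow the storage; this may throw std::bad_alloc.
        storage.resize(capacity);
        // 3. Publish the new capacity only after the expansion succeeded.
        max_capacity.store(capacity);
    }
};
```

Without the check, a plain `static_cast<InnerIdType>` of a value such as `2^32 + 5` would silently keep only the low 32 bits.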
    if (this->extra_infos_ != nullptr) {
        this->extra_infos_->Resize(new_size_power_2);
    }
    this->max_capacity_.store(new_size_power_2);
A critical issue exists where the uint64_t variable new_size_power_2 is truncated when stored into this->max_capacity_ (an InnerIdType, 32-bit). If new_size_power_2 exceeds 2^32 - 1, this truncation will lead to an inconsistent state, incorrect capacity reporting, and potential ID collisions and data corruption. A check should be added to prevent this.

Furthermore, while the max_capacity_ update is better placed, this function lacks strong exception safety. If Resize calls (e.g., bottom_graph_->Resize) or std::make_shared throw exceptions, the HGraph object can be left in a partially modified, inconsistent state. To ensure atomicity, consider performing all fallible allocations and resizes on temporary objects and swapping them only upon success.
    if (new_size_power_2 > std::numeric_limits<InnerIdType>::max()) {
        throw VsagException(ErrorType::INVALID_ARGUMENT, "new_size exceeds maximum capacity of InnerIdType");
    }
    this->max_capacity_.store(static_cast<InnerIdType>(new_size_power_2));

          node_versions_.resize(new_size);
      }
    - this->max_capacity_ = new_size;
      uint64_t io_size = static_cast<uint64_t>(new_size) * static_cast<uint64_t>(code_line_size_);
      uint8_t end_flag =
          127;  // the value is meaningless, only to occupy the position for io allocate
      this->io_->Write(&end_flag, 1, io_size);
    + this->max_capacity_ = new_size;
Similar to other Resize functions in this PR, moving the max_capacity_ update is a good fix. However, the function is still not fully exception-safe. If this->io_->Write fails after node_versions_.resize(new_size) has succeeded, the GraphDataCell object will be left in an inconsistent state (node_versions_ is resized but the underlying storage via io_ is not).
For stronger exception safety, you could write to a new I/O target and then switch to it, or find a way to roll back the node_versions_ resize on I/O failure.
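The rollback option mentioned above could look roughly like the following sketch. `ToyIo` and `ToyGraphCell` are hypothetical stand-ins, since the real `io_` interface is not shown in this PR; only the ordering and the `catch` block are the point.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical I/O target that only tracks how many bytes were reserved.
struct ToyIo {
    bool fail_next_write = false;
    uint64_t allocated = 0;

    void Write(const uint8_t* data, uint64_t size, uint64_t offset) {
        (void)data;  // the payload is irrelevant for this sketch
        if (fail_next_write) {
            throw std::runtime_error("write failed");
        }
        if (offset + size > allocated) {
            allocated = offset + size;
        }
    }
};

// Hypothetical cell mirroring the shape of the Resize shown in the diff.
struct ToyGraphCell {
    std::vector<uint32_t> node_versions;
    ToyIo io;
    uint64_t code_line_size = 8;
    uint32_t max_capacity = 0;

    void Resize(uint32_t new_size) {
        if (new_size <= max_capacity) {
            return;
        }
        const std::size_t old_size = node_versions.size();
        node_versions.resize(new_size);
        try {
            uint64_t io_size = static_cast<uint64_t>(new_size) * code_line_size;
            uint8_t end_flag = 127;  // placeholder byte, only reserves space
            io.Write(&end_flag, 1, io_size);
        } catch (...) {
            // Shrinking back does not allocate, so the rollback itself is safe.
            node_versions.resize(old_size);
            throw;
        }
        // Commit the capacity only after both steps succeeded.
        max_capacity = new_size;
    }
};
```

This gives the caller an all-or-nothing view of `node_versions` and `max_capacity`; it does not undo partial effects inside the I/O layer itself.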
    this->io_->Write(&end_flag, 1, io_size);
    this->offset_io_->Write(&end_flag, 1, new_capacity * sizeof(uint32_t));
    this->max_capacity_ = new_capacity;
While moving the max_capacity_ update to the end is correct, this Resize operation is not atomic. If the second write to this->offset_io_ fails, the main data this->io_ has already been modified. This leaves the SparseVectorDataCell in an inconsistent state.
To ensure atomicity, you could consider a two-phase commit-like approach where changes are prepared and then applied, or use a journal to be able to roll back on failure.
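One way to read the prepare-then-apply suggestion is to stage both writes on copies and swap them in only when both succeed. The sketch below assumes a hypothetical in-memory `MemIo` and a hypothetical `ToySparseCell`; the `io_size` parameter is also an assumption, since its computation is not shown in the diff. Copying the whole store is expensive, so this illustrates the control flow rather than a practical design for large data.

```cpp
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory I/O target.
struct MemIo {
    std::vector<uint8_t> bytes;
    bool fail_next_write = false;

    void Write(const uint8_t* data, uint64_t size, uint64_t offset) {
        if (fail_next_write) {
            throw std::runtime_error("write failed");
        }
        if (offset + size > bytes.size()) {
            bytes.resize(offset + size);
        }
        for (uint64_t i = 0; i < size; ++i) {
            bytes[offset + i] = data[i];
        }
    }
};

// Hypothetical cell with two stores that must grow together.
struct ToySparseCell {
    MemIo io;
    MemIo offset_io;
    uint32_t max_capacity = 0;

    void Resize(uint32_t new_capacity, uint64_t io_size) {
        if (new_capacity <= max_capacity) {
            return;
        }
        // Phase 1 (prepare): all fallible work happens on staged copies.
        MemIo staged_io = io;
        MemIo staged_offset_io = offset_io;
        uint8_t end_flag = 127;  // placeholder byte, only reserves space
        staged_io.Write(&end_flag, 1, io_size);
        staged_offset_io.Write(&end_flag, 1,
                               static_cast<uint64_t>(new_capacity) * sizeof(uint32_t));
        // Phase 2 (commit): swapping and the capacity update do not throw.
        std::swap(io, staged_io);
        std::swap(offset_io, staged_offset_io);
        max_capacity = new_capacity;
    }
};
```

If the second staged write throws, neither `io` nor `offset_io` has been touched, which is the property the review comment asks for.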
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
## 0.16 #1649 +/- ##
==========================================
- Coverage 92.31% 92.28% -0.03%
==========================================
Files 295 295
Lines 15686 15686
==========================================
- Hits 14480 14476 -4
- Misses 1206 1210 +4
cp #1642 to 0.16
link: #1643