Fix critical bugs and import organization issues from bot reviewer feedback#15
Conversation
Co-authored-by: JeffCarpenter <178424+JeffCarpenter@users.noreply.github.com>
Co-authored-by: JeffCarpenter <178424+JeffCarpenter@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
This PR addresses critical bugs and code organization issues identified by automated bot reviewers, focusing on two main areas: fixing data loss in chunking logic and preventing runtime errors in build file analysis.
- Fixed chunking logic that was dropping final chunks shorter than target length
- Fixed invalid path usage in starred embeddings that would cause runtime errors
- Reorganized import statements and modernized code patterns
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| github_to_sqlite/utils.py | Fixed build file path validation and added error handling for starred repositories |
| github_to_sqlite/simple_chunker.py | Fixed chunking logic to include final chunks and reorganized typing imports |
| github_to_sqlite/cli.py | Moved typing imports to module level following Python best practices |
| github_to_sqlite/build_files.py | Optimized loop pattern using list extend with generator expression |
| BOT_REVIEW_FIXES.md | Added documentation summarizing all fixes applied |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| for full in pathlib.Path(path).rglob(pattern) | ||
| if full.is_file() | ||
| ) | ||
| return _post_process_build_files(found, path) |
There was a problem hiding this comment.
The function still calls _post_process_build_files() but this import was removed from utils.py. This will cause an ImportError when find_build_files() is called.
There was a problem hiding this comment.
Fixed the ImportError by updating the test file to import _post_process_build_files directly from github_to_sqlite.build_files instead of github_to_sqlite.utils. The function was always defined in build_files.py but the test was importing it from the wrong module after the import cleanup. Commit 3d4c0d2.
Co-authored-by: JeffCarpenter <178424+JeffCarpenter@users.noreply.github.com>
b1f1805
into
hefgje-codex/plan-feature-to-infer-embeddings-for-starred-repos
This PR addresses critical bugs and code organization issues identified by automated bot reviewers (Copilot and Sourcery) in PR #13.
Critical Bug Fixes
1. Fixed chunking logic that dropped final chunks
The
SimpleChunker._chunk()method was incorrectly skipping the final chunk if it was shorter than the target length:This bug caused data loss when processing README content, as the final sentences would be completely omitted from embeddings.
2. Fixed invalid path usage in starred embeddings
The
generate_starred_embeddings()function was callingfind_build_files(repo["full_name"])whererepo["full_name"]is a GitHub identifier like"owner/repo", not a local filesystem path. This would cause runtime errors when processing starred repositories:The fix safely skips build file analysis for remote repositories while preserving functionality for local ones.
3. Fixed ImportError in test file
Updated test imports to correctly import
_post_process_build_filesfromgithub_to_sqlite.build_filesinstead ofgithub_to_sqlite.utilsafter import reorganization.Code Organization Improvements
from typing importstatements that were incorrectly placed inside functions incli.pyTYPE_CHECKINGimport to be grouped with other typing imports insimple_chunker.py.format()with f-string where flagged by bots_post_process_build_filesimport from utils.pyVerification
All changes have been verified with:
py_compileThese fixes eliminate potential data loss and runtime errors while improving code quality and following Python best practices.
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.