-
Notifications
You must be signed in to change notification settings - Fork 205
Open
Labels
kind/bugSomething isn't workingSomething isn't working
Description
Detail Bug Report
Summary
- Context: The
chunkedVectorDBdatabase implementation manages RAG document chunks with embeddings, using SQLite with a foreign key constraint for automatic cascade deletion. - Bug: Foreign key constraints are not enabled in SQLite, causing
ON DELETE CASCADEto silently fail and leave orphaned chunk records in the database. - Actual vs. expected: When deleting a file's metadata, orphaned chunks remain in the database instead of being automatically deleted; when re-indexing a file with fewer chunks, stale chunks from the previous version persist in the database.
- Impact: Search queries return incorrect, outdated content from previous file versions, and the database accumulates unbounded bloat as files are repeatedly re-indexed.
Code with bug
func newChunkedVectorDB(dbPath string, vectorDimensions int, strategyName string) (*chunkedVectorDB, error) {
if err := ensureDir(dbPath); err != nil {
return nil, fmt.Errorf("failed to create database directory: %w", err)
}
db, err := sql.Open("sqlite", dbPath+"?_pragma=busy_timeout(5000)&_pragma=journal_mode(WAL)")
if err != nil {
return nil, fmt.Errorf("failed to open database: %w", err)
}
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
db.SetConnMaxLifetime(0)
// <-- BUG 🔴 Missing: PRAGMA foreign_keys = ON
tablePrefix := sanitizeTableName(strategyName)
// ... rest of initialization
}Evidence (Example)
Consider re-indexing a file that shrinks from 5 to 3 chunks:
- Initial state:
/docs/example.mdhas 5 chunks (0–4); both files and chunks tables are populated. - Delete file metadata expecting cascade:
DELETE FROM files WHERE source_path = '/docs/example.md'.- Expected: all 5 chunks are removed via CASCADE.
- Actual: chunks remain because foreign keys are disabled.
- Re-index inserts the file row again and upserts chunks 0–2. Chunks 3–4 from the old version remain.
- Querying joins still return stale content because the
source_pathmatches the re-created file row:
SELECT c.chunk_index, c.content
FROM chunks c
JOIN files f ON c.source_path = f.source_path
ORDER BY c.chunk_index;Observed output (expected 3, got 5):
[0] v2_chunk_0
[1] v2_chunk_1
[2] v2_chunk_2
[3] v1_chunk_3
[4] v1_chunk_4
Recommended fix
Enable foreign key enforcement immediately after opening the SQLite connection so ON DELETE CASCADE takes effect:
func newChunkedVectorDB(dbPath string, vectorDimensions int, strategyName string) (*chunkedVectorDB, error) {
if err := ensureDir(dbPath); err != nil {
return nil, fmt.Errorf("failed to create database directory: %w", err)
}
db, err := sql.Open("sqlite", dbPath+"?_pragma=busy_timeout(5000)&_pragma=journal_mode(WAL)")
if err != nil {
return nil, fmt.Errorf("failed to open database: %w", err)
}
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
db.SetConnMaxLifetime(0)
// Enable foreign key constraints (required for CASCADE to work)
// <-- FIX 🟢
if _, err := db.Exec("PRAGMA foreign_keys = ON"); err != nil {
db.Close()
return nil, fmt.Errorf("failed to enable foreign keys: %w", err)
}
// ... rest of initialization
}Apply the same change wherever SQLite connections are created for this strategy, including pkg/rag/strategy/semantic_embeddings_database.go.
Metadata
Metadata
Assignees
Labels
kind/bugSomething isn't workingSomething isn't working